At Glassdoor, we provide a frank and honest look at jobs and companies for employees and employers, from salaries and reviews to interview questions and ratings of the best places to work. We collect a lot of data. We’re an intelligence-gathering business, of sorts.
One thing we’ve learned along the way is that data is great, but it is relatively useless if you can’t learn from it and act on it. We saw this first-hand recently when we were bringing up a new NoSQL cluster in the cloud and it just wasn’t performing the way we had hoped.
To us, it made no sense to buy the five physical machines needed to build out this cluster. It would have been too expensive, we’d have been limited in scope and scale, and we wanted the ability to bring up extra servers as needed without the delay of ordering new hardware. We’ve been a Rackspace customer for roughly five years, so we opted to run the cluster on Rackspace Next Generation Cloud Servers. Since this was new technology for us, we wanted to make sure we were doing it correctly.
We turned to Rackspace Cloud Intelligence, a new data visualization platform that analyzes our Cloud Monitoring data and gives us both insightful and actionable information about our servers to help us manage the deployment. We can use Cloud Intelligence to detect anomalies and recognize patterns by either comparing the same metric across multiple servers, such as average network receive rate, or by comparing multiple metrics across the same server.
We knew immediately that something was amiss: the cluster just wasn’t performing under load as we had hoped. To track down the issue, we used Cloud Intelligence to compare metrics across all five servers in the cluster. We started with CPU activity. It was consistent across all the servers, and none of them was being stressed. Next we looked at disk activity, and it too was consistent and reasonable. Finally, we looked at network traffic. It was consistent, but with much lower bandwidth than it should’ve been. The network I/O was not up to snuff.
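As a rough illustration of that kind of cross-server comparison, here is a minimal Python sketch. It is not how Cloud Intelligence works internally, and it does not call any Rackspace API. The function names, the thresholds (`max_spread`, `low_fraction`), the expected throughput, and every sample value are hypothetical, chosen only to show the pattern of a metric that is consistent across servers yet lower than expected.

```python
from statistics import mean, pstdev


def summarize(metric_by_server):
    """Return (cluster_mean, relative_spread) for one metric across servers.

    metric_by_server maps a server name to a list of samples.
    relative_spread is the standard deviation of the per-server
    averages divided by their mean (0.0 means perfectly even).
    """
    per_server = [mean(samples) for samples in metric_by_server.values()]
    cluster_mean = mean(per_server)
    spread = pstdev(per_server) / cluster_mean if cluster_mean else 0.0
    return cluster_mean, spread


def diagnose(metric_by_server, expected=None, max_spread=0.15, low_fraction=0.6):
    """Classify a metric as 'uneven', 'consistent but low', or 'consistent'.

    expected is the level we hoped to see (hypothetical); if omitted,
    only consistency across servers is checked.
    """
    cluster_mean, spread = summarize(metric_by_server)
    if spread > max_spread:
        return "uneven"
    if expected is not None and cluster_mean < low_fraction * expected:
        return "consistent but low"
    return "consistent"


# Made-up samples for a five-node cluster (illustrative only).
cpu_percent = {
    "node1": [31, 33, 32], "node2": [30, 34, 32], "node3": [33, 31, 32],
    "node4": [32, 32, 32], "node5": [31, 32, 33],
}
network_mbps = {
    "node1": [44, 46, 45], "node2": [45, 47, 46], "node3": [43, 45, 44],
    "node4": [46, 44, 45], "node5": [45, 45, 45],
}

print("cpu:", diagnose(cpu_percent))
print("network:", diagnose(network_mbps, expected=100))
```

With these invented numbers, CPU comes back as consistent, while network traffic is flagged as consistent but low, which mirrors the pattern described above: no single server stands out, yet the whole cluster falls short of what was expected.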
This was right before Rackspace Performance Cloud Servers became available. As soon as they launched, we migrated over to these higher-powered cloud servers and again used Cloud Intelligence to see how they performed. Instantly, we noticed that our network utilization had just about doubled.
But that wasn’t the only upside of the migration. With Performance Cloud Servers, we were able to run fewer CPUs and more memory at about the same cost. We were getting more bang for our buck.
Cloud Intelligence and Performance Cloud Servers have become critical components of our hybrid cloud architecture, which lets us keep our memory-intensive pieces on dedicated hardware while running our web app on a flexible and scalable cloud.
And we’re still able to keep tabs on CPU and network utilization, while combing through historical data to get a stronger understanding of our environment. Cloud Intelligence also gives us a great way to test the impact of launching a new application.
For us, Cloud Intelligence helped pinpoint the bottlenecks in our NoSQL cluster. It made it easy to get the right data when we needed it and to take immediate action. It was a huge timesaver. The alternative would’ve been to dump all of the data into a spreadsheet and generate charts and graphs, and none of us has time for that.
This is a guest post written and contributed by Barry Klawans, software architect with Glassdoor, a Rackspace customer. Glassdoor is a fast-growing jobs and career community that is leading the way in workplace transparency. Glassdoor helps people find jobs they love and helps employers find top talent.