Building The Cloud Monitoring Dashboard

This summer, we brought several interns aboard at our San Francisco office (SFO). In this blog series, these interns share stories from their time as Rackspace summer interns.

As my internship at Rackspace San Francisco nears its conclusion, so does one of the most insightful and fun summers I’ve ever had. To say my internship was a learning experience would be quite the understatement.

Learning The Technologies

In my first few days with the company, I dove into an ocean of technologies I had never worked with before. On top of that, I needed to get familiar with the massive codebase behind Cloud Monitoring before I could start on the projects headed my way over the next few months. Rackspace’s Cloud Monitoring product is a conglomerate of components that work together to create a robust, fault-tolerant system customers use to monitor their own products and services. Some components are written in JavaScript, others in Java and Python; even lesser-known languages like Erlang, Scala, and Lua make an appearance. Needless to say, I had my work cut out for me.

The dashboard project had me working primarily with JavaScript and the Node.js platform. It took me a little while to adjust to the myriad JavaScript libraries available for building the dashboard. Libraries such as Async, Underscore and Express (to name just a few) joined my rapidly growing list of tools to learn. Blueflood, a metrics aggregation system recently open-sourced by Rackspace, became critical to the operation of my project. I also spent some time with Glimpse, Rackspace’s open source graphing library built specifically for dashboards, to create the charts that display the metrics. Finally, I had to dabble in Chef, a configuration management tool.

Getting up to speed with the technology wasn’t the only challenge. Thinking through the new paradigms and problems before me was quite the endeavor too. Many of these problems stem from the distributed, highly fault-tolerant nature of Cloud Monitoring. The metrics that power the dashboard come from multiple servers in multiple datacenters, so retrieving them is not as simple as making a single request. Caching also became important: fetching and rolling up metrics on every web request adds enough computational overhead to make the page take far too long to load. Even UI design issues cropped up while designing the dashboard’s front end. Whatever problems came up, my mentor and other Rackers on my team and in the office were always there to help.
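The two problems above, gathering one metric from several datacenters and avoiding a fresh fetch on every page load, can be illustrated with a small sketch. Everything here is hypothetical: `Fetcher`, `fetchFromAllRegions` and `TtlCache` are names made up for this example, and the per-datacenter fetchers are synchronous stubs standing in for the asynchronous HTTP requests a real Node.js dashboard would make (the post mentions the Async library for that kind of work). It is not the dashboard's actual code.

```typescript
// One metric sample: a timestamp in milliseconds and a numeric value.
interface Point {
  timestamp: number;
  value: number;
}

// Hypothetical per-datacenter fetcher. In a real Node.js dashboard this would
// be an asynchronous HTTP request; a synchronous stub keeps the sketch small.
type Fetcher = (metric: string) => Point[];

// Ask every datacenter for the same metric and merge the answers into a
// single series ordered by timestamp.
function fetchFromAllRegions(metric: string, fetchers: Fetcher[]): Point[] {
  const merged: Point[] = [];
  for (const fetchRegion of fetchers) {
    for (const point of fetchRegion(metric)) {
      merged.push(point);
    }
  }
  return merged.sort((a, b) => a.timestamp - b.timestamp);
}

// Minimal time-to-live cache: a value is recomputed only once it is older
// than ttlMs, so repeated page loads reuse the same fetched series.
class TtlCache<V> {
  private entries: { [key: string]: { value: V; expiresAt: number } } = {};

  constructor(private ttlMs: number, private now: () => number = () => Date.now()) {}

  getOrCompute(key: string, compute: () => V): V {
    const hit = Object.prototype.hasOwnProperty.call(this.entries, key)
      ? this.entries[key]
      : undefined;
    if (hit !== undefined && hit.expiresAt > this.now()) {
      return hit.value; // still fresh: skip the expensive fan-out
    }
    const value = compute();
    this.entries[key] = { value, expiresAt: this.now() + this.ttlMs };
    return value;
  }
}
```

Wrapping a request as `cache.getOrCompute("cpu", () => fetchFromAllRegions("cpu", fetchers))` means only the first page load in each TTL window pays for contacting every datacenter; the TTL is the knob that trades freshness for page-load time.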

SFO Culture

The Rackers and the company culture here in San Francisco were certainly among the highlights of my experience. The San Francisco office is a melting pot of brilliant people trying to change the company through innovation, and their passion for the products and services they create shows. Our bi-weekly team status meetings, where team members flesh out MaaS features and design decisions, exemplify this passion. Often the team would enter a meeting with each person holding a different idea of how something should be done, but after thoroughly deliberating the problem and its implications for developers, operations and the customer, we always reached a general consensus, even if it took a follow-up meeting. For an intern, these meetings were a way to learn how Cloud Monitoring works and to gain valuable insight into distributed computing concepts and technologies that were new to me.

Eating Our Own Dog Food

Armed with five years of computer science education and a fledgling knowledge of the Cloud Monitoring project, I set out to create the Cloud Monitoring dashboard. The dashboard centered on the idea of dogfooding. In the software world, dogfooding means using the products you create (akin, in a strange sense, to eating the food you give your dog). For Cloud Monitoring, this means using our own monitoring product to monitor our monitoring product, a process we have dubbed “monitoring-ception.” By monitoring ourselves, we show our customers the robustness of our system and gain insight into areas that need improvement.

The dashboard relies on a few different components developed here at Rackspace. The Rackspace Monitoring Agent, a tool that customers (and Rackspace alike) can use to monitor CPU, memory, disk and network usage, can also emit custom metrics to Cloud Monitoring. Once these metrics pass through the system, Blueflood stores them and rolls them up. Together with a custom statsd backend that writes performance metrics to disk, these components let us emit performance characteristics from our Cloud Monitoring system back into itself using Blueflood: a second dose of dogfooding. Once the custom metrics data makes it into Blueflood, it can be queried and rolled up through the Rackspace public API. The dashboard queries the API for this data and graphs it over a few different time periods using Glimpse, Rackspace’s open source graphing library.
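What "rolling up" metrics means can be pictured with a small sketch: raw points are grouped into fixed-width time buckets, and each bucket is reduced to summary statistics. Blueflood's own rollups keep similar per-bucket statistics (such as min, max, average and count), but this is a simplified illustration written for this post, not Blueflood's code or its API; `rollUp` and `Rollup` are hypothetical names.

```typescript
// One raw metric sample: a timestamp in milliseconds and a numeric value.
interface Point {
  timestamp: number;
  value: number;
}

// Summary statistics kept for one time bucket.
interface Rollup {
  start: number; // bucket start time in ms
  count: number;
  min: number;
  max: number;
  average: number;
}

// Group raw points into fixed-width buckets (bucketMs wide, aligned to the
// epoch) and summarize each bucket; output is sorted by bucket start time.
function rollUp(points: Point[], bucketMs: number): Rollup[] {
  const buckets: { [start: string]: number[] } = {};
  const starts: number[] = [];
  for (const p of points) {
    const start = Math.floor(p.timestamp / bucketMs) * bucketMs;
    const key = String(start);
    if (!Object.prototype.hasOwnProperty.call(buckets, key)) {
      buckets[key] = [];
      starts.push(start);
    }
    buckets[key].push(p.value);
  }
  starts.sort((a, b) => a - b);
  return starts.map((start) => {
    const values = buckets[String(start)];
    let sum = 0;
    for (const v of values) {
      sum += v;
    }
    return {
      start,
      count: values.length,
      min: Math.min(...values),
      max: Math.max(...values),
      average: sum / values.length,
    };
  });
}

// Example: two points in the first minute, one in the second.
const example = rollUp(
  [
    { timestamp: 0, value: 1 },
    { timestamp: 30000, value: 3 },
    { timestamp: 60000, value: 10 },
  ],
  60000,
);
// example[0] → { start: 0, count: 2, min: 1, max: 3, average: 2 }
// example[1] → { start: 60000, count: 1, min: 10, max: 10, average: 10 }
```

Storing these small summaries instead of every raw point is what makes long time ranges cheap to graph: a month of data can be served from coarse buckets rather than recomputed from raw samples on each request.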

The dashboard, as seen above, lets customers view both the status of the Cloud Monitoring product and the performance metrics related to it. Clicking the tab for a relative time range plots the performance metrics over that period. The status fields of the various Cloud Monitoring systems change to alert users if something problematic is happening or if a deployment is in progress in that region. Links to documentation, feedback and the command line utility appear at the top of the page for easy access.
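The tabs and status fields can be sketched as two small functions: one turns a relative time-range tab into a query window plus a rollup granularity coarse enough to keep the chart readable, and one derives a region's status. The tab names, the 500-point limit, the granularity list (modeled loosely on Blueflood's rollup levels, with raw data assumed to arrive about once a minute) and the status rule are all assumptions for illustration, not the dashboard's actual code.

```typescript
// Hypothetical relative time ranges, one per dashboard tab.
const RANGES_MS: { [tab: string]: number } = {
  "1h": 60 * 60 * 1000,
  "1d": 24 * 60 * 60 * 1000,
  "7d": 7 * 24 * 60 * 60 * 1000,
  "30d": 30 * 24 * 60 * 60 * 1000,
};

// Assumed rollup widths in minutes, finest first. "1" stands in for raw
// data, assumed to arrive roughly once a minute.
const GRANULARITY_MIN: number[] = [1, 5, 20, 60, 240, 1440];

// Pick the finest granularity that keeps the chart at or below maxPoints.
function pickGranularity(rangeMs: number, maxPoints: number): number {
  for (const minutes of GRANULARITY_MIN) {
    if (rangeMs / (minutes * 60 * 1000) <= maxPoints) {
      return minutes;
    }
  }
  return GRANULARITY_MIN[GRANULARITY_MIN.length - 1]; // coarsest available
}

// Turn a tab into a query window ending at `now`.
function windowFor(tab: string, now: number, maxPoints = 500) {
  if (!Object.prototype.hasOwnProperty.call(RANGES_MS, tab)) {
    throw new Error("unknown time range: " + tab);
  }
  const rangeMs = RANGES_MS[tab];
  return { from: now - rangeMs, to: now, granularityMin: pickGranularity(rangeMs, maxPoints) };
}

type Status = "ok" | "deploying" | "degraded";

// Hypothetical status rule for one region: a problem outranks a deployment.
function regionStatus(deployInProgress: boolean, failingChecks: number): Status {
  if (failingChecks > 0) {
    return "degraded";
  }
  return deployInProgress ? "deploying" : "ok";
}
```

With these assumptions, the "1h" tab graphs raw data, "1d" uses 5-minute buckets and "7d" uses 60-minute buckets. Letting "degraded" win over "deploying" is a deliberate choice in this sketch: a problem that appears during a deployment is what users most need to see.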

Seeing the dashboard evolve from a simple mockup built as a hackday project into a full-fledged, data-driven application has been an amazing journey. I can’t say it hasn’t been arduous at times, but my team and mentor were always there when I hit a rough patch. It’s been a great summer here in San Francisco. I’ve learned a great deal, faced a lot of challenges and worked with some awesome Rackers (not to mention making a ton of connections). It’s the kind of summer everyone should have but few actually get, and I couldn’t be happier that I got to be here for it.

The Rackspace San Francisco Internship Program develops technical skills in interns while also supporting integration into the office culture. Want to join the team? Rackspace San Francisco is now accepting résumés for Summer 2014 Internships. Email your résumé to SFjobs@rackspace.com or join us at one of these career events: Oregon State University School of Electrical Engineering & Computer Science Senior Dinner October 23rd; Oregon State University Engineering Career Fair October 24th; UC Berkeley Engineering and Science Career Fair September 18th.

About the Author

This is a post written and contributed by Nathan Jordan.

Nathan Jordan is a summer intern at the San Francisco office working on the Cloud Monitoring product. He has a bachelor’s degree in computer science from the University of Nevada and is currently a second-year student in the master’s program in computer science there.


©2014 Rackspace, US Inc.