
Cloud Monitoring Adds Server Monitoring, Graphs And More


Last fall Rackspace announced the unlimited availability of Cloud Monitoring, our highly available API-driven monitoring system that is changing how we deliver Fanatical Support. Since then, we have been quietly adding features. Today we’re moving those features into unlimited availability too, and we’re unveiling even more.

Server Monitoring: Check The Internal State Of Your Servers

An outage on a web server isn’t necessarily a catastrophic event, but it is often caused by a problem that could have been detected by watching internal statistics like CPU or memory usage. To help detect these problems, we’ve launched a new feature of the Cloud Monitoring platform: Server Monitoring.

Server Monitoring can be enabled by installing the Cloud Monitoring Agent, which allows you to create checks on load average, CPU, memory, filesystem and network usage. More advanced checks are currently available via the API only, including a custom plugin check that runs a user-supplied script to gather any metrics you like. In addition to creating checks, which continuously gather data in the background, you can query certain information “live,” enabling visualizations on the server detail view in our Cloud Control Panel.

There is no shortage of monitoring tools available today; you may already be familiar with Nagios, Munin, collectd or other tools from the open source community and startups. Those tools can work well for users who are familiar with them. Our new Cloud Monitoring Agent is different. It was built based on your feedback and includes several key features that you’ve asked for.

You said you need redundancy across multiple datacenters, so you can be confident that your monitoring system is up and running. And we know that you’re concerned about performance, so we’ve engineered our monitoring agent to have a small memory footprint. You want first-class support for both Windows and Linux, so we built the agent on top of libuv, the same cross-platform library that powers Node.js. Flexibility is also important, so we open-sourced our agent and offer custom plug-ins. We utilized industry-leading security standards, so you can use it anywhere without having to worry about the security of your data, and we hope you find it easy to manage with the familiar Cloud Monitoring API used in our remote monitoring solution.

Server Monitoring will be free to try out until July 31, and we’re excited to get more of our customers using it! See below for more technical details, or install the agent to get started!

Graphs

Each time a check is run – whether a Remote check or, now, a Server Monitoring check – Cloud Monitoring stores the data it collects. Starting today, we’re giving you the ability to view the history of your Cloud Monitoring metrics with a new graphing feature in the Control Panel.

For each Cloud Monitoring check you configure, we’ll show you graphs of some of the most useful metrics. You’ll be able to find out whether your server responds more quickly from London or Chicago, how much filesystem space you have left, whether or not you run out of memory during heavy traffic and more.

To get started with graphs, all you need to do is create some checks via our Control Panel. Expect to see more features in the coming months that will help you get more insight into your infrastructure. In the meantime, if the Control Panel isn’t enough for you, you can use the Cloud Monitoring Metrics API to fetch, analyze or display this data on your own. Get started with the documentation today.

PagerDuty Integration

Cloud Monitoring can now route alerts to PagerDuty. PagerDuty is a popular incident management tool, which handles alerting (via phone, SMS, email or mobile push), on-call scheduling and automatic escalation of critical incidents.

We use PagerDuty ourselves and love it, so we’re excited to let customers use this feature. Currently the Control Panel doesn’t expose the ability to add the details of your PagerDuty account to Cloud Monitoring, so you’ll have to use the API for that step. But once you’ve got it set up you can begin using PagerDuty just by selecting the appropriate Notification Plan from the dropdown when creating checks or modifying alarms. See our post on the DevOps blog for details.

Integration With Managed Cloud

If you’re a Rackspace Managed Cloud customer, the agent will be automatically installed on your servers when they’re created. When you configure a check, you’ll be given the option to route alerts to your Rackspace support team so we can handle the problem while you focus on your business. If you’re not a Managed Cloud customer but want to be, sign up today!

Multi-Data Center Redundancy

We’ve made a big deal out of the fact that remote monitoring runs in multiple geographically separated data centers, so that an outage in one data center doesn’t affect our ability to continue checking your infrastructure or alert you to failures.

In a similar way, the Cloud Monitoring Agent connects to three Cloud Monitoring data centers but only requires a connection to one to operate correctly. Even if Chicago and Dallas are attacked by Godzilla, your agent will continue to send check data to London and you will continue to receive alarms.
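
The "any one of three" rule can be modeled in a few lines of Python. This is only an illustration of the behavior described above, not the agent's actual connection logic; the function name and the dictionary layout are assumptions made for the example.

```python
# Illustrative model only: the real agent's connection handling is not shown
# in this post. The data center names come from the Godzilla example above.
DATACENTERS = ("Chicago", "Dallas", "London")


def agent_can_report(reachable):
    """Return True if at least one monitoring data center is reachable.

    `reachable` maps a data center name to True/False; names that are
    missing from the mapping are treated as unreachable.
    """
    return any(reachable.get(dc, False) for dc in DATACENTERS)


# Godzilla scenario: Chicago and Dallas are down, London is still up.
godzilla = {"Chicago": False, "Dallas": False, "London": True}
print(agent_can_report(godzilla))
```

Only when all three connections are lost would the function return False, which matches the claim that a single surviving connection is enough.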

Small Memory Footprint

The agent is built to have a small memory footprint. We understand that a 512 megabyte Cloud Server is large enough to do many jobs, but if our agent were too large you wouldn’t be able to monitor these smaller machines.

To be specific, we built the agent to consume about 6 megabytes of RAM, or around one percent of the RAM on our smallest Cloud Server.

Excellent Security 

To protect your data, the agent uses TLS for all connections and a private Certificate Authority (CA). This means that even if some public Certificate Authority incorrectly issues a certificate for our domain to a third party, like the recent TURKTRUST incident with Google.com, our agent will refuse to connect due to an untrusted certificate chain.

The Linux binaries that we provide are signed with a GPG key that you can download from our API server over SSL. Meanwhile, the servers that build and sign those binaries are on an isolated network free from the dangers of the open Internet. Similarly, we will be using Authenticode when the Windows agent launches.

On top of that, the agent does not have the ability to execute arbitrary commands; it simply gathers and reports metrics.

Open Source Software

The agent is an Apache 2.0 licensed open source project. This gives the agent a few distinct advantages:

  1. You can audit the code: We are proud of the engineering and security stance of our agent, but don’t take our word for it; dig into the code yourself.
  2. You are free to compile the agent for your own distro: We provide a large list of supported binary packages for Windows, as well as Linux distributions such as Debian, Ubuntu, RHEL, etc. But we understand that we won’t be able to provide binaries for every distro and architecture. If we don’t support your platform, you’re free to compile the agent yourself. With time, we are certain that the agent will be ported to routers and phones.

We look forward to seeing your pull requests on GitHub.

Custom Plugins

A number of agent check types are available out of the box for common system statistics and applications. But these checks are only a starting point and can’t cover every conceivable use case. For example, say you want to get alerted when the number of rows in the session table of your database grows above 1,000. You can do this with a custom plugin for the agent.

The interface for custom plugins is simple and straightforward. The agent simply executes a script that you place in the plugins directory and gathers the metrics from standard out.

Say you write a script that queries your PostgreSQL database for the number of active sessions; let’s call it session_metrics.py. All this script needs to do is query the database and print its results in the following format:

status 235 active sessions in the database
metric sessions int64 235
metric oldest_session int64 50034
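
A minimal sketch of what session_metrics.py could look like follows. The database query is deliberately left as a placeholder: fetch_session_stats() is a hypothetical helper that returns the numbers from the sample output above, and you would replace it with a real query using the PostgreSQL driver of your choice. The function names are assumptions made for this example.

```python
#!/usr/bin/env python
# Hypothetical sketch of session_metrics.py. Only the output format is taken
# from the post; the helper names and the stubbed query are assumptions.


def fetch_session_stats():
    """Placeholder for the real PostgreSQL query.

    Returns (active_sessions, oldest_session). The values simply echo the
    sample output above; replace this with an actual database call.
    """
    return 235, 50034


def render_plugin_output(sessions, oldest_session):
    """Build the status line and metric lines the agent reads from stdout."""
    lines = [
        "status %d active sessions in the database" % sessions,
        "metric sessions int64 %d" % sessions,
        "metric oldest_session int64 %d" % oldest_session,
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_plugin_output(*fetch_session_stats()))
```

Keeping the formatting in its own function makes it easy to check the output without a database connection.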

Now you can write an alarm that fires when the number of sessions is over 9,000 and be prepared to scale your infrastructure for the additional load.

Familiar API

Entities, checks and alarms are the basic types in the Cloud Monitoring API. To fit into these types, the agent adds only one property to the entity, called agent_id. So, if you already have an entity for your webserver, you can link a new agent to that entity and start adding agent checks like the agent.plugin check right alongside your existing remote.ping and remote.http checks. Even alarm criteria are written in the same manner as usual.

if (metric['sessions'] > 9000) {
 return new AlarmStatus(CRITICAL, 'The database says the session table is over 9000!');
}
return new AlarmStatus(OK, 'The session table is not over 9000');
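
To see how a plugin metric flows into an alarm decision, here is a rough Python sketch that parses the plugin output shown earlier and applies the 9,000-session threshold from the text. It is an illustration only: Cloud Monitoring evaluates alarms in its own alarm language on the service side, and the function names here are invented for the example.

```python
# Illustration only: this is not how the service evaluates alarms. It mirrors
# the alarm criteria above in plain Python so the data flow is easy to follow.
SESSION_LIMIT = 9000


def parse_plugin_output(text):
    """Parse 'status <message>' and 'metric <name> <type> <value>' lines."""
    status, metrics = None, {}
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if parts[0] == "status":
            # Everything after the keyword is the free-form status message.
            status = line.split(None, 1)[1] if len(parts) > 1 else ""
        elif parts[0] == "metric" and len(parts) == 4:
            name, kind, value = parts[1:]
            # Integer types are converted; anything else is kept as text.
            metrics[name] = int(value) if kind.startswith("int") else value
    return status, metrics


def evaluate_alarm(metrics):
    """Return a (state, message) pair mirroring the alarm criteria above."""
    if metrics["sessions"] > SESSION_LIMIT:
        return ("CRITICAL", "The database says the session table is over 9000!")
    return ("OK", "The session table is not over 9000")


sample = """status 235 active sessions in the database
metric sessions int64 235
metric oldest_session int64 50034"""

status, metrics = parse_plugin_output(sample)
print(evaluate_alarm(metrics))
```

With the sample output, the sessions metric is 235, so the sketch reports the OK state.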

Try It Free

As we mentioned above, Server Monitoring is free to use until July 31, 2013. We look forward to getting your feedback and seeing cool uses of the custom plugins interface.

And as always, we are hiring.

About the Author

This is a post written and contributed by Russell Haering.

Russell Haering is the product manager of Rackspace Cloud Monitoring, and moonlights as an undercover engineer shipping new monitoring features. Since his days at Cloudkick he has made a mission of revolutionizing how sysadmins monitor, deploy and manage applications, whether in the server closet or the cloud.


1 Comment

Russell,

Thank you so much for the excellent information. What will be the advantage of using RS cloud monitoring when they are other cloud monitoring services such as New Relic that offers the same for free?

Thank you in advance,
Cesar Torres
Orlando Fl

avatar Cesar on June 4, 2013 | Reply
