by Fran Stephenson on December 18, 2009

Incident Overview:

On December 18, 2009, between 3:37 p.m. and 4:12 p.m. CST, Rackspace experienced network connectivity problems. The issues resulted from a problem with a router used for peering and backbone connectivity. The router is located outside the data center at a peering facility and handles approximately 20% of Rackspace’s Dallas traffic. The problems stemmed from a configuration and testing procedure performed at our new Chicago data center, which created a routing loop between the Chicago and Dallas data centers. This activity was part of the final preparation for network integration between the two data centers. The integration itself was scheduled to take place during the monthly maintenance window, outside normal business hours; today’s incident occurred during those final preparations.

Incident Timeline:

3:37 p.m. CST Rackspace detected customer-impacting connectivity issues; backbone engineering began troubleshooting procedures.

4:02 p.m. CST Backbone engineers confirmed the source of the issue as a Rackspace router housed at a peering location outside our data center facility.

4:12 p.m. CST Upon rerouting traffic away from this device, all systems were back online.

Current Status:

Rackspace is operating under normal conditions, and we do not anticipate any further connectivity interruptions.

This issue interrupted our customers’ businesses, and for this we apologize. If you have additional questions, please reach out to us.

Editor’s Update on Dec. 19, 2009 at 12:00 p.m.: An FAQ on this incident.


25 Responses to “Rackspace Network Status–Further Details”
Andre Marcelo-Tanner Says: December 19th, 2009 at 1:22 am

I think what everyone wants to know is how it will not happen again?

That’s a lot of issues this year… Each time my business and my customers were impacted. This did not happen in the past. I wonder what the root cause of all of these is.

Ryan Fanshaw Says: December 19th, 2009 at 1:26 pm

Why were we not able to reach any support services or get through on the phone to get answers? We pay enough money for solid web hosting that I would think the support system and/or the phone support would have been handled, so we’re not resorting to Twitter or Google for answers. Fortunately Google’s Twitter integration helped, but we shouldn’t need to look for answers elsewhere considering the money we pay for “fanatical” support.

communications international Says: December 19th, 2009 at 2:28 pm

Just wanted to say thank you for the quick Rackspace Twitter updates during the network problem on the 18th. I was on an important conference call so could not hang up, and when my email stopped working and I could not view websites, I was cut off from any ticket updates and also worried and stuck on the conference call. However, the quick Rackspace Twitter report let me know that there was a problem and that Rackspace was working on solving it. Rackspace’s quick Twitter response saved me from having to call Rackspace to inquire about the problem. Thank you! -k

I’d welcome a FAQ answer about what “final preparation” changes are generally (or were) made prior to maintenance windows.

Obviously any change can encounter a router OS bug, but the detail Rackspace released makes this sound more like a meaningful, and potentially service-affecting, routing change — adding an IGP advertisement, changing a route selection metric (ie, route map), or something similar.

I’d appreciate enough technical detail to evaluate this. What changes were made, and what considerations led to making them prior to the maint window?

This doesn’t really explain what happened at all. How did the entire cloud go down for 45 minutes?

The fact that Rackspace not only didn’t inform customers of this outage with anything but a general bulletin buried inside the Rackspace portal, but also SUPPRESSED ALL NOTIFICATIONS FOR OUR SERVER MONITORING SERVICES is absolutely outrageous! We had no support ticket, no email or no phone call notifying us of the outage. Instead, the outage was brought to our attention by one of our own customers! This is NOT ACCEPTABLE and a complete failure by Rackspace. Communication is KEY to handling these problems quickly and with the least amount of disruption, when is Rackspace going to learn that? Suppressing our server monitoring notifications IS NOT FANATICAL SUPPORT, it is in fact sabotaging our businesses!

Don’t try to minimize this issue. Our eCommerce website went down during the busiest day in our company’s history. While it wasn’t the entire internet, it was the most important part of it for the 21 people who depend on the site running to generate their paychecks.

Further, this is the third time this month that our site, or parts of our site, have been disabled by rackspace issues. For what rackspace charges, this should NEVER happen. But mistakes that happen at the most critical times for websites have actual dollars tied to them.

That is nothing to brush off as just a simple outage.

What sort of Service Management applications are deployed for RS datacenters?

What sort of FCAPS applications are running in RS datacenters?

What happened to redundancy and HA solutions?

2nd time this year!!

NMS/SML and Infrastructure need to get together and have a strategy.

You guys really need to work on your coordination over there. There has been one issue after another. How can you expect us developers to continue recommending you, when you’re either down, or we’re getting “No suitable nodes” errors?

Do you realize how that looks to a developer? Holy crap. “No suitable nodes”? On a cloud computing model that is supposed to be backed by fanatical support? You’re making yourselves look bad! And worse, you’re making me look bad!

Did anyone remind you guys it’s Christmas! People MUST have absolute confidence that you’re not just going to be online 100% of the time, but that you’re going to be ready for huge traffic spikes as well. Things like this don’t add to our confidence level.

I’m still a fan, I can see you’re trying. Please try harder. I want the stability back!

Hey, Rackers, good job on solving the problem. Nobody is perfect. Keep your heads up.

My question is where was change control and customer notification? Changes to production routers outside of an announced window. The routers should be isolated, changed and evaluated – not modified in production, even if just to insert but not commit a change.

[...] Yesterday afternoon, we reported that Rackspace had suffered a somewhat large-scale outage, leaving many sites down for the better part of 35 minutes (from 4:37 PM Eastern to 5:12 PM Eastern). In that report, we did not have much (if any) information with regard to the outage, but Rackspace has now explained the reason for outage in a blog post. [...]

Mike Eggermont Says: December 20th, 2009 at 9:21 am

30 minutes down – first time in 2 years as a client: well within 99.99% uptime. Still best service anywhere.
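As a rough check on the uptime figure in the comment above, the downtime allowed by an availability target can be worked out directly. This is only a sketch: the function name is made up for illustration, and the measurement window is an assumption, since nothing in the post says whether uptime is measured per month, per year, or over a client's whole tenure.

```python
def downtime_budget_minutes(uptime_pct: float, days: float) -> float:
    """Minutes of downtime permitted by an uptime target over a window.

    uptime_pct is a percentage such as 99.99; days is the length of the
    window. Both the window and the target are assumptions, not figures
    taken from any Rackspace SLA.
    """
    total_minutes = days * 24 * 60
    return total_minutes * (1 - uptime_pct / 100)


# Two years (730 days) versus a single 30-day month at 99.99%.
two_years = downtime_budget_minutes(99.99, 730)
one_month = downtime_budget_minutes(99.99, 30)
print(f"two years: {two_years:.2f} min, one month: {one_month:.2f} min")
```

Over two years, 99.99% leaves roughly 105 minutes of allowed downtime, so a 30 to 35 minute outage fits within it. Over a single 30-day month the same target leaves only about 4.3 minutes, so the conclusion depends heavily on the window chosen.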

Julian Dormon Says: December 20th, 2009 at 6:43 pm

Blah Blah Blah. Where’s my refund? and WTF are you going to do in the future to prevent this from happening again?

[...] out. Hi atomicwedgie, Are you still experiencing problems with time-out? Our host, RackSpace encountered some problems and hence that could have led to the downtime. If you still have problems, feel free [...]

[...] to outages, but a company like Rackspace needs to provide consistent and reliable service. The Official Rackspace Blog explains “On December 18, 2009 between 3:37 p.m. and 4:12 p.m. CST, Rackspace experienced network [...]

Are we going to get some real answers here? Is someone going to explain to us why the systems have gone down multiple times this year? I have hosting at places I pay $50 for the year and their systems are up more than yours. I’m very frustrated and want some real answers. What type of agreements do you have with your peering partners? Do they promise the same 995 uptime that you do? If they don’t, you had better adjust your promises.

Good to know that Rackspace can be almost completely brought down by one bad router configuration.

This is the last straw. My quest for a new hosting provider starts today.

If you had been running Junos by Juniper Networks, you could’ve run ‘commit confirmed’ and let it automatically roll back after X (10 by default) minutes of not working. Give time for route convergence, and your outage probably would’ve been 11-12 minutes instead of 35 (assuming you used the default 10 minutes for a rollback time). Time to ditch the Cisco junk and go with a real telco-class router platform.

I will never touch Cisco again. They make garbage equipment. Sorry if I’m jaded. I’ve had way too many issues with Cisco.
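The rollback behaviour described in that comment can be illustrated with a toy model. This is a minimal sketch, not Junos code or any vendor API: the `Router` class, its method names, and the `chicago_link` setting are all hypothetical, and time is passed in explicitly (in minutes) so the example stays deterministic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Router:
    """Toy router with a commit-confirmed style rollback (hypothetical)."""
    active: dict
    _previous: Optional[dict] = None
    _deadline: Optional[float] = None

    def commit_confirmed(self, candidate: dict, now: float,
                         timeout_min: float = 10) -> None:
        # Activate the candidate, but keep the old config and a deadline.
        self._previous = self.active
        self.active = dict(candidate)
        self._deadline = now + timeout_min

    def tick(self, now: float) -> None:
        # Called periodically: roll back if the deadline passed unconfirmed.
        if self._deadline is not None and now >= self._deadline:
            self.active = self._previous
            self._previous = None
            self._deadline = None

    def confirm(self, now: float) -> bool:
        # A confirmation only counts if it arrives before the deadline.
        self.tick(now)
        if self._deadline is None:
            return False
        self._previous = None
        self._deadline = None
        return True


router = Router(active={"chicago_link": "off"})
router.commit_confirmed({"chicago_link": "on"}, now=0)
router.tick(now=10)   # nobody confirmed within 10 minutes
print(router.active)  # {'chicago_link': 'off'}
```

The point of the design is that an operator who loses access after a bad change cannot confirm it, so the previous configuration comes back without anyone having to reach the device.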

Kevin Duffey hit the nail on the head -

“My question is where was change control and customer notification? Changes to production routers outside of an announced window. The routers should be isolated, changed and evaluated – not modified in production, even if just to insert but not commit a change.”

This is the kind of thing that’s supposed to separate Rackspace from lower tier providers. We were told Rackspace has processes in place to prevent this kind of thing from happening.

[...] PM EST. The problem was due to router issues. Rather than copy and paste what happened you can read this post and see what happened. Please feel free to contact support if you have any questions or [...]

Blah, Blah, Blah… at these prices… WTF Rackspace? Routers…Routers… Service Messages… We don’t need no stinkin’ routers… or customers… Looking for new diggs…

What absolutely bothers me the most about this was the fact that the new implementation was being done at a time when all eCommerce sites are at their busiest.

That kind of change should be done when there is the least impact predicted to customers, meaning slower months or slower times. Not only did this occur during the busiest month of the year for eCommerce sites, but also right in the middle of prime time on a Friday afternoon.

That is extremely poor planning on the part of Rackspace and brings into question just how much they are focused on customer support.

