Dallas-Fort Worth Data Center Update

_____________________________
Message from Rackspace CEO Lanham Napier
July 9, 2009

Rackspace Community,

Some of our customers have been directly affected by recent outages in a portion of our Dallas-Fort Worth Data Center. Others of you may have heard about it and are following it closely.  An interruption like this is not up to our Fanatical Support standards and we are working hard to prevent such incidents from occurring in the future.

On behalf of Rackspace, I sincerely apologize for these disruptions.  We know these failures negatively impacted the lives and businesses of our customers.  After the disruptions occurred we did our best to recover quickly and explain what happened in a transparent fashion.  Since we have seen erroneous and incomplete information on the web and in the media, we wanted to share with you the most up-to-date and accurate information.

WHAT HAPPENED

First, some context.  Our DFW Data Center has three phases, or sections, and the outages were caused by a malfunction in our power infrastructure in Phase 1.  We have redundancy in place, and this redundancy generally works as intended, but these outages show that we clearly have room for improvement.

While we take any outage seriously, it is important to know that this is not a pervasive issue across Rackspace. We operate nine facilities worldwide and the problems in DFW are not affecting our other data centers.  Unfortunately, these localized incidents in DFW have had a disproportionate impact on some customers.

Here’s a quick recap of the outages and near-term resolution activities:

  1. We had a power interruption on June 29, 2009 in Phase 1 of our DFW Data Center, and we moved some of our customers to generator power.  The generators then experienced a failure, which caused those customers to lose power to their servers for approximately 40 minutes.  We have since performed maintenance and upgrades to those generators, with the help of experts from companies like Cummins, GE and Eaton, and the generators are now stable.
  2. We experienced another power interruption on July 7, 2009.  Again, we moved customers to generator power.  During this outage we also suffered a loss of network connectivity due to the power disruption.  The part of the power infrastructure that failed (a “bus duct”) prevented proper operation of our UPS for that section, so some customers lost power to their servers for about 20 minutes before we could get them onto generator power.  We have since replaced the failed bus duct, and that section of the data center is back to normal and running on utility power.
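To put the two power-loss durations above in perspective, here is a rough availability calculation. It is only a sketch under stated assumptions: it counts just the approximate power-loss windows quoted in the recap (40 and 20 minutes), treats each calendar month separately, and ignores the additional time some customers needed to bring servers back online. The function name is illustrative and not part of any Rackspace tooling.

```python
def monthly_availability(downtime_minutes: float, days_in_month: int) -> float:
    """Return availability as a percentage of the minutes in one month."""
    total_minutes = days_in_month * 24 * 60
    return 100.0 * (total_minutes - downtime_minutes) / total_minutes


# Assumption: only the power-loss windows from the recap are counted.
june = monthly_availability(40, 30)  # June 29 outage, about 40 minutes
july = monthly_availability(20, 31)  # July 7 outage, about 20 minutes

print(f"June 2009: {june:.3f}%")
print(f"July 2009: {july:.3f}%")
```

Under those assumptions the figures come out to roughly 99.907% for June and 99.955% for July, for the affected section only. Actual customer-facing downtime was longer wherever servers had to be recovered manually.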

If you would like more detailed information on the June 29 interruption, please refer to the June 29 Incident Review.  The Incident Report for July 7th is forthcoming. I also wanted to speak to all of you in some way other than anonymous copy on a screen, so this morning, I recorded a video which follows this letter.

WHAT WE’RE DOING ABOUT IT

The above resolution steps address the near-term issues.  Now we are digging into the actions we need to take to prevent these types of outages in the future.  Let me be clear:  data centers will experience power interruptions, parts will break, and servers will go down.  No data center is completely risk-free.  But we can manage and mitigate the risk to acceptable levels, better than we have today, and we can make sure our recovery is as quick as physically possible. I have no doubt that we will get better and stronger from this situation.

Our main actions include the following steps:

  1. Put our best people on it, and bring in the experts. I am personally going to locate myself in our DFW data center until I am satisfied that our repairs and maintenance are complete.  We have assembled our best talent from the US and the UK to focus on the issues there.  And we have brought in top talent from our vendors, as well as knowledgeable outside consultants, to assist us.
  2. Assess the status of the infrastructure. We are combing through the power systems in DFW and assessing every link in the chain.  Based on the advice of our experts, we will update every piece that needs updating to ensure the performance we require.
    a.    Phase 1 has four zones within it.  At this point we have completed work on the major power systems for each zone by remediating known deficiencies at the generator and UPS levels.
    b.    We will continue our work through the smaller components of each zone including switches, breakers and ducting.  At this point we have completed all of the work on the smaller components for one of our zones and preventative maintenance on the other three zones is underway.
    c.    We will complete all of this work as soon as possible with minimal disruption to customers.
  3. Improve standard operating procedures. We are going to increase the frequency of our testing, monitoring and measurement programs within DFW.  Our maintenance schedules will change.  And the level of detail we review internally and share externally will increase.
  4. Invest. We will continue to invest in our infrastructure.  We have invested more than $50 million in DFW over the last two years.  We invested some of this money in expansion, some to improve our networking and cooling infrastructure, and now we will spend more to improve the capability of the power systems.  We will also invest in additional information systems as appropriate to support our new measuring and management procedures.


THE RACKSPACE FANATICAL SUPPORT APPROACH

I would also like to share our Fanatical Support philosophy regarding any downtime or outage situation.  Here’s what you should know about how we act:

  • Our first priority is getting customers back up.  This priority takes precedence over everything else.  Customer uptime is a core principle of Fanatical Support, and if you have much experience with us, you know that we take Fanatical Support very seriously.
  • We pledge to be transparent.  We will do our best to communicate what we know when we know it, and to keep customers and the broader Rackspace community informed.  We understand our role in running the Internet, and we know that any missteps ripple out beyond our customers.
  • We will fix the problems in a way that minimizes customer disruption.  When we experience a disruption or outage, our root cause analysis identifies fixes that improve redundancy and stability.  We then undertake these fixes during maintenance windows, or we utilize other ways to prevent customer impact (such as running on generator power during a utility fix).  Sometimes, as in the July 7th outage, we experience an additional outage before we have had a chance to completely diagnose and repair all parts of the infrastructure.  Note that in the case of DFW, we are confident we have stabilized the power infrastructure, although we will continue to be hyper-vigilant in monitoring and responding to any irregularity.
  • We will honor our Service Level Agreements.  We think we have the best SLAs in the industry, and we will not hesitate to make it right with our customers when there is a disruption.  We will stand by and honor our Fanatical Support Promise to our customers.

As always, your feedback is welcome.  Please be honest with us about your expectations and how we can do a better job for you.  Fanatical Support is in our blood, and times like these are character tests for us.  We will do our best to restore your trust in us.  I want to thank you, our customers, for standing by us, as well as our Rackers for their tireless efforts to deliver Fanatical Support.

Lanham Napier
CEO, Rackspace Hosting


_____________________________

Bus duct installation complete: Dallas-Fort Worth data center status  * July 9, 2009, 1:45 am CDT:  We have replaced the bus duct and successfully returned to utility power on UPS cluster A. The transition started at approximately 11:30 p.m. CDT and was completed at 1:45 a.m. CDT. The DFW data center has returned to normal operating condition.

_____________________________

Status of bus duct installation * July 8, 2009, 6:00 pm CDT:    The new bus duct is en route to our Dallas-Fort Worth data center with an expected arrival time between 7:00 pm and 8:00 pm this evening. As soon as we receive the bus duct, we will begin installation and testing – a process which will take approximately 5 hours to complete. We expect to transition from generator to utility power with UPS support on or about 1:00 AM CDT July 9.  We will provide additional updates should the schedule change dramatically and upon successful transition to normal operations.

Status * July 8, 2009, 11:55 am CDT: Early this morning, we completed the installation and testing of a temporary bridge that carries power from UPS cluster A to the power distribution units and the cabinets and servers. This temporary bridge is part of the two-phase bus duct replacement process as noted in our July 7th 8:00 pm update.

The second phase of the replacement is the installation of a new bus duct. The new bus duct is being manufactured for us and will be flown in for installation. We expect to receive the bus duct tonight and will immediately begin installation and testing.

Servers supported by UPS cluster A continue to run on generators, which are running reliably and predictably. If necessary, we can switch to UPS and utility power using the temporary bridge. We have experts from our vendor onsite and available to assist with generators as needed.

We will notify customers once we successfully complete installation and testing and before we return these servers to utility power.

_____________________________

Overview and status * July 7, 2009, 8:00 pm CDT: Today, in our Dallas-Fort Worth data center, a part failed causing power interruption and network issues to a portion of the data center.  As of 8:00 p.m. CDT, a portion of the data center is running on generator power, and after we have replaced the failed part, we will move that portion of the data center back over to utility power.

Specifically, the part that failed is called a bus duct, which is composed of straps or tubes of metal used to conduct large amounts of electricity.  Because a data center consumes substantial amounts of electricity, bus ducts are commonly used in the power infrastructure.  In our Dallas-Fort Worth data center, the bus duct failure caused downtime for customer servers that are supported by UPS cluster A. There were also intermittent network performance issues for customers in sections supported by UPS clusters B and E.  We are still in the process of determining why the bus duct failed and why customers experienced downtime as a result of this issue.  Customers supported by UPS cluster A are currently being powered by generators, which are running reliably and predictably.

The bus duct replacement is underway and, when complete, will allow us to switch back to utility power.  This replacement comprises two stages, a temporary fix and a permanent fix. The temporary fix will allow us to switch back to utility power if we have any issues with the generators, although we plan to continue to operate on generators for the time being.  The permanent fix will use a production part and allow for a permanent switch back to utility power. Customers supported by UPS cluster A should not experience any disruption during this repair work, and we will notify them in advance of the switch-overs.

We realize that although we were able to restore power within minutes, some of our customers were adversely affected and for this we sincerely apologize.

We appreciate your patience and will continue to provide updates as we have new information available.

Status * July 7, 2009, 1:30 pm CDT: Today at approximately 11:00 AM, an electrical connection failed, causing a brief power interruption to customers on UPS cluster A.  This failure also may have caused intermittent network performance issues for customers supported by UPS clusters B and E for a short time.

For cluster A customers, we bypassed the UPS and restored power to the servers via generator within a few minutes.  Currently systems supported by UPS cluster A are still running on generator power.  Repairs are underway and we plan to return to utility power with UPS support as soon as possible.  We will follow up with additional updates as new information becomes available.

Update * July 7, 2009, 12:04 pm CDT: We’ve received some questions about whether or not this was a network or power interruption.  To clarify, the network issue was related to the power interruption.

Our Dallas data center experienced a network interruption which may have caused a brief loss of network connectivity to some servers.  We apologize for any inconvenience this may have caused you and your business. We appreciate your patience while we work through this issue.

If you are a customer, please continue to use our customer portal (https://my.rackspace.com/) and this blog http://www.rackspace.com/blog/ for information.

Notice * July 7, 2009, 11:44 am CDT: Today a portion of our Dallas data center experienced a brief power interruption. Rackspace is aware of this issue and is currently investigating it. We will be sending out periodic updates as more information becomes available.

_____________________________

Status * July 3, 2009, 6:59 am CDT: We have successfully completed the July 3, 2009 scheduled maintenance on both the A bank generators and the utility breaker.  During this maintenance window, we performed the production load test of generator bank A and confirmed that we have eliminated the excitation failures that caused recent customer disruptions.  We have returned the DFW data center to normal operating conditions.  We will follow up with additional information as necessary.  Thank you for your continued patience throughout this process.

_____________________________

Status / Scheduled Maintenance * July 2, 2009, 2:35 pm CDT: We are continuing to research and troubleshoot the root cause of the power interruption in our DFW facility.

As part of our work to improve the reliability and performance of these areas of the data center, we have scheduled maintenance on generator bank A on Friday, July 3, from 12:01 a.m. to 6:00 a.m. (CDT). Customers who are supported by this generator bank have been notified of this maintenance.

Also, as a cautionary measure, we have asked Oncor, our power supplier, to perform preventative maintenance on their utility breaker during the same maintenance window. Oncor believes this breaker maintenance is low risk and can be accomplished in less than 30 minutes. However, it requires that we place all customers in Phases 1 and 2 of the data center onto generator power.  This means that in addition to placing customers on generator bank A onto generator power as planned, we will also place customers supported by generator bank B onto generator power for a brief period during the July 3rd, 12:01 AM to 6:00 AM CDT maintenance window, after the generator bank A maintenance occurs.

We believe no customers will be impacted but want to provide this update to our customers. If you are a customer and have questions, please contact your support team by visiting http://my.rackspace.com or by calling 888-480-7640 or 0800 587 2306, +44(0)20 8734 2700 (UK).

_____________________________

Status * July 1, 2009, 2:45 pm CDT:  We are continuing the diagnosis activities to determine the root cause of the interruption. We conducted tests last night on the generators in question and believe we are making progress in understanding what caused the interruption. We have our suppliers and external consultants onsite working with us on this process. We will continue to provide status updates as we learn more.

If you are a customer, please continue to use our customer portal (https://my.rackspace.com/) and this blog http://www.rackspace.com/blog/ for information.

_____________________________
Message from Rackspace CEO Lanham Napier
June 30, 2009

Rackspace community,

Yesterday afternoon at 3:15 p.m. CDT our data center in Dallas experienced an interruption in power to portions of the facility.  The interruption caused customer servers to lose power and go down.  We sincerely apologize for this disruption and know that it impacted our customers’ businesses as well as the experience of many who use the web.  Although we have had some issues with this data center before, please know that we will do what it takes to improve its reliability and performance.  We owe you an action plan to prevent this type of thing in the future, and we’ll get that to you as soon as it is ready.

Specific to this situation, here’s what we are doing right now:

  • The data center is currently running on utility power.
  • We are continuing our root cause analysis of yesterday’s generator failures.  We have flown in our senior-level engineers from our global operations, and they are working with our external suppliers to determine the cause and how we can prevent this from happening again.  We have the best outside experts from companies like Cummins, GE and Eaton.
  • We have re-serviced and re-checked our UPS units.
  • Tonight at 9:00 p.m. CDT we will continue our testing of the generator bank in question as we narrow down the variables to determine and remediate the root cause.
  • Our Support teams will continue to work with all affected customers to ensure they’re up and running.
  • We will continue to provide status updates on our customer portal (https://my.rackspace.com/) and on http://www.rackspace.com/blog/.  A copy of the incident report that we sent to affected customers can be found at the following link. Though we typically treat our incident reports as proprietary information between us and our customers, we are publicly posting the report for this incident due to the high level of public interest that this incident has received.

I want to assure you that we are doing everything we can to bring this to resolution as quickly as possible.  We appreciate your support and understanding.  Our promise is Fanatical Support, we believe in it, and we will work with each of our customers to honor that promise.

Lanham Napier
CEO, Rackspace Hosting

_____________________________

Overview and status * June 29, 2009, 11:26 pm CDT

This afternoon our Dallas data center experienced power interruptions that caused downtime for a portion of our customers. These power interruptions were the result of a range of power infrastructure issues. Right now, the Dallas data center is stable and running on utility power. Our UPS units have been re-serviced and re-checked as of this evening, and we are in the process of doing the same with our generators.

We don’t have a lot of details on exactly what happened yet. When we have an outage, our first focus is on fixing it and getting customers online as soon as possible. Now that we have the near-term situation stabilized in Dallas, we have some work to do to improve our reliability. We will follow up with more information as we work through our root-cause analysis.

Although this outage only affected a portion of our customers in one of our nine global data centers, we consider any outage to be unacceptable. We sincerely apologize to our customers and those who were affected by this downtime. We didn’t serve you as well as we should have today. We are dedicated to Fanatical Support and providing world-class hosting to our customers. Rest assured that the entire Racker family is dedicated to determining exactly what our failures were, and how we can correct them. Thank you for your support on the phone, blogs, Twitter and other forums.

Status * June 29, 2009, 8:58 pm CDT: Section A of the Dallas data center is now back on utility power, and maintenance work on the UPS for that section is complete. No customers were impacted in this transition. We’ll provide further updates as information is available.

Update * June 29, 2009, 7:28 pm CDT: A prior update indicated that utility power was serving the entire data center. However, that update was incorrect in describing the status of one section of the data center (Section A), which is currently still running on generator power while we finish some work with the UPS for that section. After that work is complete, we will transition Section A back to utility power. Customers in Section A are stable while running on generator power, and we are taking every precaution in transitioning this section of the data center back to utility power. We will provide further updates as they become available.

Status * June 29, 2009, 5:55 pm CDT: The Dallas data center is now fully back on utility power. We’ll continue to provide updates as information is available.

Status * June 29, 2009, 5:30 pm CDT:  Power has been restored to affected devices.  However, some of the devices need to be manually brought back online, and this process is underway.  The data center is currently running on a combination of generator and utility power.  We apologize for the inconvenience this may have caused you or your customers, and more information will be presented as soon as it is available.

About the Author

This is a post written and contributed by Lanham Napier.

Lanham Napier was the CEO of Rackspace from 2006 to February 2014, and now serves as a consultant to the company's leaders. As CEO, he avidly promoted the workplace culture that drives the company’s famed Fanatical Support®. This passion for empowering customers has made Rackspace the acknowledged leader of the open cloud.

Napier joined Rackspace in April 2000, a couple of years after its founding. He recognized early on that the company’s employees—dubbed Rackers—and its unique culture of customer service, would be the keys to its success. Napier strove to give Rackers the same exceptional support that they provide to the company’s customers. That approach has won Rackspace recognition by FORTUNE® magazine as one of the 100 Best Companies to Work For, and by Bloomberg Businessweek as a Top 100 Performing Technology Company.

78 Comments

C’mon Rackspace. You can do it! I’m hoping this is more than a small oversight on the data center power redundancy plan :)

Scale My Site | Cloud Hosting on June 29, 2009

Since this is the third power incident in DFW in the past month, what assurances can we have that the team understands the problems and can deal with them?

One time, sure… Two times, maybe. Three times? At the minimum it points to some cut corners and bad practices.

Been a customer for a long time but this is a serious bummer. I’ll be sweating through my entire vacation wondering if RS is gonna go down again.

Hunter on June 29, 2009

Dear Rackspace,

Your lack of redundancy for your own internal operations is highly concerning. Last time I checked Fanatical support required a phone system that works!

This may be the straw that breaks the camel’s back when it comes to hosting with you. You are already significantly higher priced than comparable competitors. The only reason we have stayed is because your data center was supposedly more reliable due to your size…now I’m not so sure.

jpmuofu on June 29, 2009

Please update your blog more regularly particularly if this is the recommended way for receiving updates. We are sweating over here and need current info, even if it is only updating your steps in the process.

Anonymous on June 29, 2009

Ever heard of a generator?

Ann on June 29, 2009

The best place to get regular updates will be within the customer portal (www.myrackspace.com). In addition, we’ll continue to provide updates on this blog.

Bryan

Bryan Urioste [Racker] on June 29, 2009

Uh oh! What happened to the generators? This makes me nervous.

MedChoice Financial Admin on June 29, 2009

Hey Bryan, thanks, but nothing but clients yapping there. I can only find content at Community Discussions. No staff comments. No different or better info than here.

Anonymous on June 29, 2009

You thought Rackspace would have learned how to deal with keeping customers updated from the outages in November 2007. Guess not. Trying to convince us you’re different, you’re better and you’re worth the premium price. Sorry, you dropped the ball again. ThePlanet had an explosion at their DC in May 2008 providing the same lame updates but they were not nearly as expensive. I think Mr. Napier was quoted as saying he was snake bit by DFW last time this happened a year ago. Now where do I go? Getting bids tomorrow – migrations hurt but being off-line hurts even worse.

doubter on June 29, 2009

The latest DC update has been made on the blog and within the customer portal. More specific updates about your configuration will come through as a message in the customer portal under the “Accounts” menu.

Bryan

Bryan Urioste [Racker] on June 29, 2009

Ditto

Anonymous on June 29, 2009

I would like to provide our customers with a more concrete update about when we will be back online. Can you provide a worst-case answer?

Geoff on June 29, 2009

THERE ARE NO UPDATES ON MYRACKSPACE.com

My box has been down for almost 3 hours and no updates!

Matthew C on June 29, 2009

I’ve been back up for a while I would just like to know what happened.

Justin on June 29, 2009

Our web server is up, but our database server is still down. Any updates???

Jesse Myer on June 29, 2009

still down 2.5 hours and counting here

Anonymous on June 29, 2009

I need an ETA asap. Best Case/Worst Case scenarios…

Mike Kujawski on June 29, 2009

Our site is STILL down and we are several hours into this. Any update as to when the “manual” handling will be complete?

David on June 29, 2009

Finally we are back up! Not sure I understand where the “back up” was when the power went down?

jeanne on June 29, 2009

Same as David – site is STILL down and we are now several hours into this with NO ETA on when we will have a site again. Customer Support just keeps saying we are “somewhere in the queue” but can’t tell us where, who’s managing the queue, etc.

sylvia on June 29, 2009

So, 3 power related outages in 30 days. 100% -guaranteed- up in smoke. Should we be expecting a refund on the fees we’ve paid this month? Deal on next months? Or is this just going to be swept under the rug?

thebouv on June 29, 2009

rdbeach Send your account number to twitter@rackspace.com and we will investigate ASAP.
—————
This is really insulting considering how many times I’ve called with no updates.

Matthew C on June 29, 2009

[…] Update @ 4:37 pm: Rackspace has just updated their blog with some more information but they haven’t provided any more details on the root cause yet: . […]

Online accounting software news from Xero » Blog Archive » Unexpected outage on June 29, 2009

As a customer, I appreciate the updates but this sounds too corporate for me.

We know you had an update. We know you thought it sucked (BTW, we did too).

We hope you’re going to look into it.

I see zero acknowledgement in the above post that the DFW data center has had power problems for weeks.

Come on guys! Just lay it out for us without the corporate PR speak.

Hunter on June 29, 2009

By ‘update’, I meant ‘outage’.

Hunter on June 29, 2009

Unfortunately it’s the same portion of customers that have had this happen 3 times in the last few weeks. For the 2nd time in about 8 days, the power issues resulted in a hard shutdown (read: pulling the plug) of our server and corruption of our site’s database (mySQL *hates* being turned off uncleanly even with InnoDB tables). We’re at nearly 10 hours of downtime this time (and counting). I think we were down 8 hours last weekend.

We want a real explanation of what’s going on and an absolute assurance that this will not happen again.

John T. Haller on June 29, 2009

@ Matthew C. – Sorry that offended you. Not every customer on Twitter has been to the portal/tickets/phones first.

Didn’t mean it to sound like a short cut – it is just another way we are trying to serve customers.

Hard to tie Twitter names to account #’s.

Rob La Gesse
Director of Customer Development
Rackspace
210-845-4440

Rob La Gesse on June 29, 2009

You failed. Bigtime.

whatever on June 30, 2009

We switched from Dreamhost who had constant outages and horrid customer communications.

We came to Mosso for the stability offered.

What happened to your generators and UPS? If you fail to maintain and test these devices regularly, why bother having them?

Roy Kamen on June 30, 2009

One more question – was this outage due to lack of manpower? How many of your staff is out sick?

Roy Kamen on June 30, 2009

Thanks for the update. Looking forward to learning more about the root cause.

Michael Lehmkuhl on June 30, 2009

At least y’all kept us updated, which is more than I can say for quite a number of other hosting providers that I’ve dealt with. I look forward to the RCA … we’ll all learn how to fail over a little more gracefully the next time that this happens (failure is inevitable, but total failure doesn’t have to be).

Nicholas Piasecki on June 30, 2009

This is a total PR response but these things do happen. It’s not about WHAT happens but what measures are taken to make sure it doesn’t ever happen again. We lost power (we’re located in Dallas as well) and when our generators tried to kick on there was air in the fuel tube from the recent change in temperature and it didn’t get gas! Luckily we had the right engineer at the DC and he fixed it 5 mins before our batteries were going to be out. We got lucky to be 100% honest. But now we added a pressure device on the fuel line – our generator company didn’t recommend when buying. Lesson learned and we’re better because of it.

Just wanted to throw out those two oo’s. It’s not what happens but what the company does afterwards.

Chris Drake on June 30, 2009

[…] further investigation, the Rackspace blog starts off with this post about the developing problem on Monday. We have experienced an interruption in power to a portion […]

Rackspace DFW Downtime Datacenter Outages | Dedicated Server School on June 30, 2009

Why did an excitation failure affect multiple gensets, or were the UPS clusters fed from a single generator?
Excitation, for the benefit of those who don’t know, is the process of establishing a magnetic field within the alternator of a large genset. Hence, excitation failure shouldn’t affect multiple sets, and I would have expected a company such as rackspace to have a n+1 synchronised setup.

Alex Threlfall on June 30, 2009

This is really terrible. I go from a $7/month host for 4 years with no problems to $100/month with several outages, too much downtime and a file manager that doesn’t work for several weeks. This is really terrible.

TH on June 30, 2009

Can someone recommend a reliable host?

TH on June 30, 2009

Of course you would moderate the comments. How cowardly. Why don’t you take the feedback as deserved and respond accordingly? That is what companies with integrity do.

TH on June 30, 2009

What I think is missing here in this post is why Rackspace wasn’t ready to deal with such a failure by simply shifting traffic for hosted sites to your backup facility. People can understand momentary outages but when sites are down for more than an hour it is beyond unacceptable.

avatar Mike Langford on June 30, 2009 | Reply

You suck Rackspace. You were down in a major way back in Nov. 2007. Here we go again. Time to look for another provider.

avatar J C on June 30, 2009 | Reply

In response to Alex Threlfall:

We absolutely have a set of 4 generators that are paralleled to support a number of UPS’. These are at N+1 (we need 3 of the four to be online to hold the load.)

What we saw yesterday was a situation where the generators started fighting with one another on the bus. They were unable to get properly synchronized. Eventually, they failed in a cascading manner and we lost all of the generators. Technically, each gen showed an excitation fault. But, it was really the inability to get synchronized that created that fault.

Troy

avatar Troy Toman, Rackspace VP of Engineering on June 30, 2009 | Reply

Troy,

That sounds pretty complicated and I don’t envy the situation you guys were in. Do you mind sharing the generator vendor(s)? I’d like to put them on my no buy list. Thanks.

avatar Chris Drake on July 1, 2009 | Reply

Cheers Troy, Perhaps it would be an idea to look into a manual system where your techs can bring a set online for a series of UPS units, where the load is enough to supply those units from a single set (or pair with AMF).

avatar Alex Threlfall on July 1, 2009 | Reply

Chris Drake, I would imagine the size of set we’re talking about here would restrict who you would be able to buy from anyway! The likes of Cat Finning (who own FG Wilson here in the UK) and the “big boys” in the diesel world all tend to manufacture their own gensets, with a lot of the electronics being produced either in house, or by the likes of Deep Sea Electronics. ~5 years in the generator business helps ;)

avatar Alex Threlfall on July 1, 2009 | Reply

[…] that it impacted our customers’ businesses as well as the experience of many who use the web,” stated Rackspace CEO Lanham Napier in a blog post. “Although we have had some issues with this data center before, please know that we will do what […]

avatar Rackspace Expects To Return $2.5-$3.5 Million For Monday’s Downtime on July 3, 2009 | Reply

“Zero-Downtime Network” methinks it’s time to change the slogan to: “Almost-Zero-Downtime Network”

“Should a total utility power outage ever occur, all of our data centers’ power systems are designed to run uninterrupted,…”

Maybe it’s time to find new designers?

“…with every server receiving conditioned UPS (Uninterruptible Power Supply) power.”

Hint: http://www.pentadyne.com/site/our-products/technology.html

“Our UPS power subsystem is N+1 redundant,…”

Maybe N+2 for the future?

“…with instantaneous failover if the primary UPS fails…”

might want to change that to: “…with failover kicking in when we get around to it…”

“If an extended utility power outage occurs, our routinely tested, on-site diesel generators can run indefinitely.”

About those generators, http://www.cat.com/power-generation/integrated-systems
and

“See How We Make 100% Uptime a Reality” You might want to bury that one for a while.

“And it works so well that we guarantee it.”

Too bad that guarantee doesn’t include consequential damages. Especially to the small businesses out there that relied on your marketing slogans, especially in this current economy.

“100% Network Uptime Isn’t Wishful Thinking, It’s A Guarantee”

How about, “if it’s down, it ain’t our fault” and “Wait till you entrust us with cloud computing!”

avatar Bigbee on July 3, 2009 | Reply

How ironic that as I completed a survey this morning about the reliability of Rackspace, our site (and http://www.rackspace.com as well) failed again today. Albeit brief, from what I can tell, this is another failure in a short period of time. You should have informed your clients about this not just through a blog but also through the ticket system, even if it may not affect our site specifically, as it impacts overall performance and customer perception.

avatar Jan on July 7, 2009 | Reply

Really?! Again?

avatar mb on July 7, 2009 | Reply

I thought one of the “claims to fame” was that you guys had MULTIPLE fail over sites.

All I have seen is FAIL without the over.

How in the hello can you guys not meet something so simple as decent backup power?

Annoyance #2, Giving limited admins access to the “Submit trouble ticket”. Seems like a no brainer to me.

Annoyance #3, LOGS LOGS LOGS LOGS LOGS LOGS

Have an extra special day!

avatar Jim on July 7, 2009 | Reply

What kind of amateur power scheme are you guys running over there? 4 outages in 30 days! I can do better power redundancy running the servers out of my closet.

I was already looking for a new hosting provider since the outage last week. Apparently I wasn’t fast enough.

avatar thebouv on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace | Viningmedia Nieuws on July 7, 2009 | Reply

I enjoyed the ‘random’ survey this morning. It was entertaining to fill out since it arrived a week after we started talking about new email hosting (because the downtime kind of killed us…for days).

I’m surprised we haven’t seen the level of communication that RS used with the last outage in Dallas. That was a professional job and the mass email by the CEO showed the spine behind the claim to fanatical support.

I’m thinking that going public now prevents him from addressing it in the same way. Bummer.

avatar Colin on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter all through the […]

avatar Someone Needs To Stop Tripping by The capability Cord At Rackspace | Cellphone Ultra on July 7, 2009 | Reply

[…] Recently our friends at Rackspace have had some….outages/downtime/out-of-server-experiences. On a happy note the repeated downtime did not affect out primary servers. It did however whack our email service for days at a time. That’s pretty good in comparison to what other customers are saying on the Rackspace site here. […]

avatar What? | Bazamm! on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace | on July 7, 2009 | Reply

[…] it has got some serious issues that it needs to address and keep its users happy. Rackspace went down once again and took a number of sites with it. Rackspace’s data center in Dallas once again […]

avatar Rackspace suffers another outage. Where’s reliability? | Startup Meme - Technology Startup and Latest Tech News on July 7, 2009 | Reply

[…] Rackspace (RAX), which several weeks ago suffered power problems that caused a temporary loss of service for customer Web sites hosted at its Dallas, Texas facility, this morning suffered another outage. […]

avatar Tech Trader Daily - Barron’s Online : Rackspace Suffers Another Power-Related Outage on July 7, 2009 | Reply

[…] (RAX), the Internet hosting company, was down for a few minutes earlier today because of “a brief power issue which cause a network connectivity […]

avatar Finance Geek » Rackspace Goes Down: Customers Scream, Investors Yawn on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace | The Good NET Guide on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace | Techdare on July 7, 2009 | Reply

Glad to see that you addressed this problem quickly and kept your users informed. Do have to say though that seems to be happening more frequently than it should.

avatar JimC4Stocks on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Tech Fused » Someone Needs To Stop Tripping Over The Power Cord At Rackspace on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar The Far Edge » Blog Archive » Someone Needs To Stop Tripping Over The Power Cord At Rackspace on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Someone Needs To Stop Tripping Over The Power Cord At Rackspace | TopBlogs on July 7, 2009 | Reply

[…] able to get things up and running fairly quickly, and more importantly, communicated well through its blog and Twitter throughout the […]

avatar Get your News » Someone Needs To Stop Tripping Over The Power Cord At Rackspace on July 8, 2009 | Reply

[…] temporary loss of service for customer Web sites hosted at its Dallas, Texas facility, this morning suffered another outage.In a blog post, RAX said it suffered another power problem:Today at approximately 11:00 AM, an […]

avatar Rackspace Suffers Another Power-Related Outage on July 8, 2009 | Reply

The difference in reliability between Rackspace and another decent provider (with a much less expensive monthly cost) is not as great as one perceives.

This further proves that 100% (or 99.999%) uptime is impossible, no matter what the provider says, advertises or promises you.

avatar Son Nguyen on July 8, 2009 | Reply

Best emergency customer service and transparency I have ever seen, period. I took screen shots of the whole event including all the key tweets while my site was down. I will definitely include this in my social media workshop materials. Thank you!

avatar Mike Kujawski on July 9, 2009 | Reply
avatar Mosso : The Rackspace Cloud » Blog Archive » Dallas-Fort Worth Data Center Update on July 10, 2009 | Reply

[…] update on @Rackspace Dallas data center from CEO @lnapier – http://www.rackspace.com/blog/?p=334 […]

avatar Twigest for 2009-07-01 on July 22, 2009 | Reply

[…] – they tell the world. They Tweet the news. They communicate with their customers. In this amazing blog and video (scroll down and do watch) they outline a problem they had in one of their data centers. They […]

avatar Amazing Communication | Pronto Marketing on July 28, 2009 | Reply

After I initially commented I clicked the -Notify me when new feedback are added- checkbox and now every time a remark is added I get 4 emails with the identical comment. Is there any method you’ll be able to take away me from that service? Thanks!

avatar handyman brighton, mi on July 8, 2011 | Reply

Oh my goodness! an amazing article dude. Thank you However I am experiencing challenge with ur rss . Don’t know why Unable to subscribe to it. Is there anybody getting similar rss problem? Anyone who knows kindly respond. Thnkx

avatar Remodeling Howell, mi on July 8, 2011 | Reply

Any news about the DFW facility? Are they back in business and safe now?

Thanks,
Anne

avatar Anne Jolki on September 9, 2011 | Reply

Wishing you all the very best in this courageous effort. The big boys like Citrix have flooded the enterprise market but I’m sure there are opportunities for SMEs. Good luck!

avatar Maureen Westray on April 2, 2012 | Reply

Best Colocatie Server service ever?

avatar colocation pricing los angeles on April 14, 2012 | Reply
