Filed in by Pat Matthews | June 20, 2006 6:31 pm
To all of our customers who have experienced issues with our email hosting system in the last several days—I sincerely apologize.
June has been a bumpy month for us from an infrastructure perspective. I don’t want to make excuses; I just want to give you the facts and let you know what we’re doing to remedy the issues we’ve been facing.
Here is what has gone wrong:
1. We installed the latest version of the Red Hat operating system on our newest round of servers. That operating system installation has not performed up to par in our configuration. This has caused several mail queues to back up, resulting in email delivery delays for anywhere between 1% and 8% of our customers during peak hours of the business day. It has also caused several dropped IMAP connections, which in turn may log affected users out of webmail. Not good, I know.
2. The solution would naturally be to move users off of the servers with the bad OS installation, reinstall the older OS, and then move users back onto the better performing servers. But unfortunately we mistimed the ordering of those new servers, and just when we could least afford delays in hardware delivery, we’ve experienced pretty significant ones. When it rains, it pours.
Here is what we’re doing to fix the problems AND get ahead of the curve:
1. We’re putting a new round of servers online tonight and another round later this week. We’ll be moving users off of the servers with the bad OS installations so that we can reinstall the OS and get them back to performing the way they should. We should see improvements beginning tomorrow, but we may not be all the way out of the woods quite yet.
2. We’re putting together more efficient ordering processes with our infrastructure providers. We will not mistime our orders in the future, and any hardware delivery delays shouldn’t affect our customers because we’re not going to try to cut it so close anymore.
3. We’re putting together a more efficient server provisioning process. Right now it takes our engineering team hours to get a server online. In a couple of weeks it will take a matter of minutes—tops.
4. After we add the servers we need and get the servers with the bad OS reconfigured, we’re going to add an ADDITIONAL 50% capacity to our system—on top of what we actually need. And we’re not going to stop there. In fact, we’ve committed to adding at least 10 new servers to our server farm every month for the next 24 months. To put things in perspective, we have approximately 100 servers in our farm right now. Within the next month we’ll have 160 online and we’ll add at least 10 per month from there, or more, if our customer base grows faster than it is growing right now.
We’re pouring significant capital and human resources into the infrastructure hosting side of our business. We’re adding servers, investing in the best possible managed services around those servers, and internally, we’re innovating like crazy, doing our best to build the most intelligent, best performing infrastructure we can.
If you’re an affected customer, please do not take these past few days as an example of the quality of service we deliver. We aim to be the best. Sometimes we fail. And for that, I’m sorry.
Better days will be here soon.
Thank you for your patience,
Source URL: http://www.rackspace.com/blog/my-apologies-for-a-rough-few-days/
Copyright ©2014 The Official Rackspace Blog unless otherwise noted.