Our next generation Cloud Servers platform has a new control system based on OpenStack, but there are changes to the next generation instances themselves as well. Specifically, vCPU allocations for some of our next generation Cloud Servers flavors have changed from first generation. This article discusses the changes and the reasons behind them.
The cloud is a pooled resource that many users leverage. This has potential pros and cons. One pro that doesn’t exist in a non-pooled world is the ability to “burst.” That is, as idle resources are available, they can be given to customers who need them, above and beyond their “fair share”. Pooled resources present an interesting question for cloud providers who need to decide how to allocate them. There are two basic models:
No oversubscription — Resources are divided in such a way that, when full, the amount of resources allocated to customers is the SAME as the amount of physical resources available. This option ensures consistency, but does not give customers access to idle capacity when available. In essence, it strands resources that could otherwise be used.
Oversubscription — Resources are divided in such a way that, when full, the amount of resources allocated to customers is MORE than the physical resources available. Customers are given more than their fair share, and as long as resources are idle, customers can “burst.” The downside to this approach is that it can provide an inconsistent experience. At the point where the resources used start to exceed the physical resources, those “burst” resources become unavailable and are throttled back. How often this happens and to what degree depends on how much the system is oversubscribed and how heavily it is used.
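The two models above boil down to a single ratio of allocated vCPUs to physical cores. Here is a minimal sketch in Python; the host size and vCPU counts are hypothetical numbers for illustration, not Rackspace's actual configuration:

```python
# Hypothetical host with 16 physical cores (an assumed number).
PHYSICAL_CORES = 16

def oversubscription_ratio(allocated_vcpus: int, physical_cores: int) -> float:
    """Ratio of vCPUs handed out to the physical cores underneath them."""
    return allocated_vcpus / physical_cores

# No oversubscription: allocated vCPUs == cores, ratio 1.0.
# Consistent, but idle cycles are stranded.
print(oversubscription_ratio(16, PHYSICAL_CORES))  # 1.0

# Oversubscription: e.g. 32 vCPUs on 16 cores, ratio 2.0.
# VMs can burst into idle cycles, but may be throttled when the host is busy.
print(oversubscription_ratio(32, PHYSICAL_CORES))  # 2.0
```

A ratio of 1.0 corresponds to the "no oversubscription" model; anything above 1.0 trades some consistency for burst capacity.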
Consider seats on an airplane. If airlines didn’t oversubscribe flights, you would always be guaranteed a seat when you purchase one (consistency), but many flights would have empty seats (stranded capacity). Airlines generally do oversubscribe flights and sell more seats than are physically available on the airplane. This works as long as enough people miss their flights, but when enough people show up, people get bumped. Seat utilization is higher but the experience is inconsistent.
There is another very important point here. Airlines oversubscribe because THEY benefit. If seats are not filled, they lose revenue. As air travelers, we’d generally be fine if they didn’t oversubscribe seats since it only creates an inconvenience for us. But, what if CUSTOMERS benefited from the unused capacity instead of the airlines? For example, if the seat in front of you was empty, it could dynamically be reconfigured into business class. Or if the seats around you were free, they could instantly be reconfigured to first class with a bed. If oversubscription enabled that possibility over being stuck in a seat with empty capacity around you (assume the arm rests don’t go up to make the analogy more accurate), then you would probably want it. The downside is the potential of an inconsistent experience since travelers may eventually come to expect business or first class and be disappointed when they get less.
Let’s come back to CPU. For years, many low cost compute providers have oversubscribed their services. They do this to drive density and increase revenue, which (like the airlines) benefits them.
With Cloud Servers, we primarily carve up host hardware based on RAM and disk space which we do NOT oversubscribe (this is why our flavor names are based on RAM). These are what we’ll call “hard” resources as they are generally fixed and static. There are other “soft” resources like CPU cycles (as well as disk i/o and network i/o which won’t be discussed here) which are more variable and dynamic. The question is – how should soft resources be allocated? By not oversubscribing hard resources, we have already determined the density of a physical host server, so how soft resources are allocated doesn’t affect density or revenue. In other words, oversubscribing or not oversubscribing soft resources isn’t a decision that benefits Rackspace (as with the airlines), it’s one that benefits customers (or at least, has the potential to).
Average server CPU utilization has been shown to be low. If CPU is not oversubscribed, spare CPU cycles will go unutilized (just like empty seats on an airplane that can’t be dynamically reconfigured). As such, we choose to oversubscribe CPU in Cloud Servers. This allows cloud server instances to take advantage of idle CPU cycles on the host server. We believe this configuration maximizes the value and performance of pooled cloud resources.
The next question is, what is the right amount of CPU oversubscription? Too much, and the experience is inconsistent. Too little, and CPU cycles are stranded. Our goal is to find the sweet spot that maximizes price performance while still providing a consistent experience. More specifically, we want to give you as many vCPUs as possible while still letting you consistently rely on full access to the corresponding physical cores underneath. Note that with even the smallest amount of oversubscription, there is always a theoretical risk of resources not being available, e.g. every VM on a full host concurrently runs a CPU intensive workload that contends for all CPU cycles of every allocated vCPU. While this is theoretically possible (albeit very unlikely), the goal is to strike the right balance so that in practice and at scale, this rarely, if ever, occurs. To use another analogy, while it’s theoretically possible for everyone to withdraw all their money from the same bank at the same time (in which case the bank couldn’t pay out, since banks oversubscribe money), this rarely happens in practice, and we can reliably withdraw money without fear of it not being there when we want it. In the same way, we want you to know that all allocated vCPU cycles are there for your cloud servers when you want them.
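To make the "theoretically possible but very unlikely" point concrete, here is a rough back-of-the-envelope sketch. All the numbers and the independence assumption are ours, purely for illustration: a hypothetical host with 16 cores running 16 VMs of 2 vCPUs each (a 2:1 ratio), where each VM independently pegs both of its vCPUs at any given moment with probability 0.2:

```python
from math import comb

# Assumed, illustrative parameters -- not Rackspace's actual figures.
CORES = 16          # physical cores on the host
VMS = 16            # VMs packed onto the host
VCPUS_PER_VM = 2    # 2:1 oversubscription ratio
P_BUSY = 0.2        # chance a given VM is fully busy at one moment

# Contention requires total demand to exceed physical cores, i.e. more
# than CORES / VCPUS_PER_VM = 8 VMs fully busy at the same instant.
threshold = CORES // VCPUS_PER_VM

# Binomial tail: probability that 9 or more of the 16 VMs burst at once.
p_contention = sum(
    comb(VMS, k) * P_BUSY**k * (1 - P_BUSY) ** (VMS - k)
    for k in range(threshold + 1, VMS + 1)
)
print(f"P(contention) = {p_contention:.5f}")
```

Under these assumed numbers the tail probability comes out well under one percent, which is the spirit of the bank analogy: simultaneous full demand is possible, but rare enough to bank on.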
What we learned from running first generation Cloud Servers is that our CPU oversubscription ratio was too high. We were optimizing too much for burst at the expense of consistency: while we might allocate 4 vCPUs to a cloud server, it wasn’t consistently getting access to all cycles of the 4 underlying physical cores. In next generation Cloud Servers, we have reduced the CPU oversubscription to a level we believe better hits the sweet spot. While the number of allocated vCPUs may be lower, you can reliably count on access to the corresponding physical CPU resources. Rather than getting first class inconsistently, you will get business class consistently, when your fair share is coach.
What this translates to in product terms is that certain next generation flavors have different vCPU allocations than their first gen counterparts. Specifically, vCPU allocations have changed as follows:
| Flavor | vCPUs |
| --- | --- |
| 512MB Standard Instance | 1 |
| 1GB Standard Instance | 1 |
| 2GB Standard Instance | 2 |
| 4GB Standard Instance | 2 |
| 8GB Standard Instance | 4 |
| 15GB Standard Instance | 6 |
| 30GB Standard Instance | 8 |
Note: Compared to first generation, some flavors have fewer vCPUs, some have the same number, and some have more.
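In code terms, the next generation allocation is just a lookup from flavor RAM size to vCPU count. A sketch in Python, with values taken directly from the table above (the dictionary name is our own):

```python
# Next generation Standard flavor vCPU allocations, keyed by RAM size.
# Values come from the published flavor table.
NEXT_GEN_VCPUS = {
    "512MB": 1,
    "1GB": 1,
    "2GB": 2,
    "4GB": 2,
    "8GB": 4,
    "15GB": 6,
    "30GB": 8,
}

def vcpus_for_flavor(ram: str) -> int:
    """Return the vCPU count for a next generation Standard flavor."""
    return NEXT_GEN_VCPUS[ram]

print(vcpus_for_flavor("8GB"))  # 4
```

This also shows why flavor names are based on RAM: RAM is the hard, non-oversubscribed resource, and the (soft) vCPU count follows from it.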
The currently available next generation flavors are “Standard” instances, meaning they are intended for general purpose computing. We believe that a general purpose compute platform should be optimized for consistency, but we recognize that other, more specialized workloads (e.g. CPU intensive tasks like video transcoding) may not perform as well if the corresponding vCPU count has decreased. To better address these workloads, additional next generation Cloud Servers flavors are forthcoming that are designed and optimized for specific workloads. Until then, first generation Cloud Servers are still available and may be a better option for applications that benefit from more CPU burst.
© 2011-2013 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License