
Writing Code that Scales


By Adrian Otto, System Architect

The web is huge, and it’s getting bigger every single day. If you’re writing a web-scale application that will reach millions of end users, you may need to think carefully about how you write it so that it works properly under the demanding workloads the web can produce. Our computing hardware is getting progressively faster and cheaper, and this gradual evolution has changed so much that traditional software development concepts are no longer appropriate for today’s web-scale application environments. Memory is plentiful, CPUs are 64-bit and cheap, and bandwidth is more affordable than ever. Storage is cheaper than ever too, but I/O capacity and high-speed network interconnects are not. Considering all of this, it’s no wonder that we are generating more data than ever before, and thinking of new and exciting ways to build amazing interactive software solutions and data analysis systems. To think web-scale today, you need to change gears from what used to make sense years ago.

In this article, I want to highlight five concepts to think about when coding, along with some tips to help you put them into practice.

Concepts


Keep Resource Requirements Low

In today’s world of fast CPUs, plentiful memory, and fast networks, this frequently gets overlooked. Sometimes the simplest way to reduce scalability problems is to make the software procedures more efficient. The faster you can get on and off the CPU, the less data you transmit, and the less memory you occupy, the faster your application is likely to run. The lower the processing time per unit of work, the better. There are four key resources to consider:

•    I/O – Most applications run out of this first, especially when they deal with large volumes of data. Disk I/O is the slowest, but bus I/O can also be exhausted.
•    Network – Although bandwidth is more affordable than ever, it’s still easy to run out of it if you rely heavily on the network for remote data access.
•    Memory – It’s cheaper and faster than ever. CPUs now have lots of cache on them, and shared L2 caches between multiple cores!
•    CPU – You can afford lots of this; more cores and more servers help a ton.

Mitigate Bottlenecks by Running in Parallel

Divide up your workloads into small pieces, and arrange for them to be processed in parallel on separate CPU cores, storage devices, networks, or even separate servers. Keep coordinated synchronization of the processing to a minimum. Locks kill concurrency.
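
As a minimal sketch of this idea in Python, the standard multiprocessing module can fan independent chunks out to one worker per core. The chunk size and the squaring workload here are arbitrary stand-ins for real work:

```python
from multiprocessing import Pool

def process_chunk(chunk):
    # Independent CPU-bound work on one piece; no shared state, no locks.
    return sum(x * x for x in chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    # Split the workload into small, independent pieces.
    chunks = [data[i:i + 10_000] for i in range(0, len(data), 10_000)]
    with Pool() as pool:  # defaults to one worker per CPU core
        results = pool.map(process_chunk, chunks)
    print(sum(results))
```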

Decentralize Data

Traditionally, software used a simple central data storage model. Scaling this requires vertical scalability of the database, which becomes increasingly expensive and difficult. Central databases quickly become I/O bottlenecks. Frequently updating status changes in a central data design can be particularly problematic. If you distribute your data over a number of different servers, and put part of your data into each server, you can spread the write load over numerous systems. Using a distributed data store system like Cassandra can help tremendously for this, taking care of all the partitioning of your data for you and making it very easy to add servers in order to increase your capacity.
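
Here is a toy illustration of the partitioning idea in Python; the server names are placeholders, and this naive modulo scheme remaps most keys when a server is added or removed, which is exactly the problem consistent hashing in systems like Cassandra avoids:

```python
import hashlib

# Hypothetical pool of storage servers; the names are placeholders.
SERVERS = ["data-01", "data-02", "data-03", "data-04"]

def server_for(key):
    """Hash each key to one server so write load spreads across the pool."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return SERVERS[int(digest, 16) % len(SERVERS)]

print(server_for("user:1001"))  # deterministic: the same key always maps to the same server
```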

Eventual Consistency

If your application can tolerate slightly stale data, it can use an eventually consistent data storage system. These systems use asynchronous processes to update remote replicas of the stored data, so some users may see slightly stale versions of the data immediately following an update. Decide carefully which data needs ACID properties (Atomicity, Consistency, Isolation, Durability). If BASE (Basically Available, Soft State, Eventually Consistent) is sufficient, you can enjoy superior scalability and availability from a distributed data storage system that replicates data asynchronously.
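
A toy model in Python makes the staleness window visible; the in-process dictionaries and the sleep are stand-ins for remote replicas and network delay, not a real replication protocol:

```python
import queue
import threading
import time

primary = {}   # the copy a write hits first
replica = {}   # stand-in for a remote copy of the data
replication_queue = queue.Queue()

def write(key, value):
    primary[key] = value                 # acknowledged immediately
    replication_queue.put((key, value))  # replica is updated asynchronously

def replicator():
    while True:
        key, value = replication_queue.get()
        time.sleep(0.1)                  # simulated network delay: the staleness window
        replica[key] = value
        replication_queue.task_done()

threading.Thread(target=replicator, daemon=True).start()
write("profile:42", "new bio")
print(replica.get("profile:42"))  # likely None: the replica is briefly stale
replication_queue.join()
print(replica.get("profile:42"))  # 'new bio' once replication catches up
```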

Prefer Horizontal Scalability over Vertical Scalability

Adding resources to a given computer system (more memory, more or faster CPUs, faster network interfaces) is known as vertical scaling. Although vertical scaling is relatively easy and affordable, you can quickly reach its limits. What happens when you are already running the biggest, fastest server you can afford? Then you need to scale horizontally: add more servers and divide the work among them. With a system that scales well horizontally, you may be perfectly happy with a multitude of slower systems rather than a single fast server. Writing your software without a plan for horizontal scalability can trap you so that your only option is to throw more hardware at the problem and scale up vertically. If you have a problem that lends itself to horizontal division, you can scale out much further with significant cost savings.

•    Concurrency != Locks. Working horizontally allows you to get lots of work done simultaneously. Try to minimize how much parallel work is synchronized. Too much serialization for data synchronization leads to low concurrency and inefficient use of resources.
•    Thread per Connection = Bad. You usually don’t want many more threads than you have CPU cores. If you’re CPU bound and have more threads than cores, you will get much less work done. If your threads are I/O bound, it may be acceptable to have far more threads than cores, but don’t spread your I/O system too thin by trying to do too much at once either.
•    Thread Pool with Fixed Number of Workers = Good. It’s much smarter to have a thread pool. You can then have an optimal number of workers pulling work off a queue, and substantially increase throughput (see the sketch below).
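
A minimal sketch of that worker-pool pattern, using Python’s standard queue and threading modules; the worker count and the handle function are placeholders you would tune and replace:

```python
import queue
import threading

NUM_WORKERS = 8          # tune to core count and I/O profile
work_queue = queue.Queue()

def handle(job):
    print("processed", job)   # placeholder for real per-job work

def worker():
    while True:
        job = work_queue.get()
        if job is None:       # sentinel tells this worker to exit
            break
        handle(job)
        work_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(NUM_WORKERS)]
for t in threads:
    t.start()
for job in range(100):        # connections enqueue work instead of
    work_queue.put(job)       # each getting a dedicated thread
work_queue.join()
for _ in threads:
    work_queue.put(None)      # shut the pool down cleanly
for t in threads:
    t.join()
```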

Helpful Tips for Writing Code that Scales


•    Write a “stress” test plan first. Lay out your worst-case scenario. Write it down and post it on your wall to remind you. For example: “Support 10,000 concurrent connections with < 1 second response time.” Be sure to quantify exactly what an acceptable result would be under your worst-case usage scenario. Keep that in mind at every stage of your software design and implementation, and remind everyone regularly. It’s very easy to get sidetracked by your feature list and forget your architectural performance and scalability goals. Put them right there in front of you in black and white. Write the test plan before you write a single line of code. It works!

•    Cache, baby, cache. Figure out what data you access frequently, and cache it in memory for repeated high-speed access. Distributed memory caches like memcached clusters are superior to host-based caches for most use cases.
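
As one hedged example, assuming a memcached node on 127.0.0.1:11211 and the python-memcached client library, a read-through cache looks roughly like this (load_user_from_db is a hypothetical database loader):

```python
import memcache  # the python-memcached client library

mc = memcache.Client(["127.0.0.1:11211"])  # one or more memcached nodes

def get_user(user_id):
    key = "user:%s" % user_id
    user = mc.get(key)                     # fast path: served from memory
    if user is None:
        user = load_user_from_db(user_id)  # hypothetical slow path to the database
        mc.set(key, user, time=300)        # cache the result for 5 minutes
    return user
```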

•    Compress the data you send over the network. Compression and decompression of data between clients and servers is frequently overlooked as a way to make interactive applications more responsive. It decreases data transfer times, and increases your connection handling capacity per unit of time. The CPU time cost of the compression and decompression is usually trivial compared to the speed benefit you get from it. The overall efficiency of a system using compressed network transmissions is almost always higher than sending data uncompressed.
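
A quick sketch with Python’s standard zlib module shows the trade: a little CPU for far fewer bytes on the wire. The payload here is a made-up repetitive string, which compresses unusually well:

```python
import zlib

payload = ("some large response body " * 500).encode("utf-8")
compressed = zlib.compress(payload, 6)       # level 6: a moderate CPU/size trade-off
print(len(payload), "->", len(compressed))   # far fewer bytes cross the wire

# The receiving side reverses it:
assert zlib.decompress(compressed) == payload
```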

•    Compress the data you store on disk. Try it. Really. Yes, storage is cheap, but I/O is not. By compressing your stored data, you can easily and effectively increase your I/O throughput.
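
The same idea applies on disk with Python’s gzip module, which reads and writes transparently through the compressor; the file name and contents are illustrative:

```python
import gzip

# Writing through gzip trades a little CPU time for far fewer bytes hitting disk.
with gzip.open("data.log.gz", "wt") as f:
    for i in range(100000):
        f.write("row %d of a large dataset\n" % i)

# Reads decompress transparently.
with gzip.open("data.log.gz", "rt") as f:
    print(f.readline())
```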

•    Consider a sensor-driven admission control system when using work queues. A mistake I see over and over again is a system that puts no limits on its concurrent usage over the network. Say you have a system with a maximum capacity of doing 100 things at a time that can produce 100 units of output within an acceptable response time T. If you give it 101 things to do, your output within time T might decrease to 50. If you give it 102 things to do, your output within time T might decrease to 40, and so on. Pushing a system beyond its limits can cause it to grind on itself without getting practically any work done. I recommend queuing work and refusing new work when your queue gets too long to process. People are reluctant to have visible limits, for rather obvious reasons, but in reality it’s far better for performance to control the rate at which you accept new work when you are operating near your practical limits. If you reject work, perhaps your client will receive a visible error message or busy signal. Think about this carefully: is it better to get a busy signal, or have your phone call mysteriously hang up mid-sentence or sound horrible? That’s right, a busy signal is better. Somehow software developers seem to make no effort to put busy-signal capability into their networked systems. Using a sensible admission control system with an appropriate rejection procedure when queues get too long solves this. If you have a system that scales well horizontally, you could use an API call to provision more Cloud Servers to help service the queue when it gets too long. This way you can potentially scale your resources to track your demand and avoid the need to reject work at all. This elastic resource provisioning approach is no substitute for admission control, because demand might still exceed available capacity at some point, especially during error conditions. Consider what happens if your system had an infinite loop where the server acted as a client to itself by mistake; admission control could break that loop before it took your whole system down.
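
A minimal sketch of the admission-control idea: a bounded queue that rejects new work outright once the backlog passes a limit you choose. The limit and the 503 response are illustrative:

```python
import queue

MAX_BACKLOG = 100                        # beyond this, reject rather than degrade
work_queue = queue.Queue(maxsize=MAX_BACKLOG)

def admit(job):
    """Accept work only while the backlog stays manageable."""
    try:
        work_queue.put_nowait(job)
        return True
    except queue.Full:
        return False                     # the caller gets an immediate 'busy signal'

if not admit("render report"):
    print("503 Service Unavailable - please retry later")
```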

•    Design around seek. Most I/O bottlenecks are caused by seeking. When reading or writing data to a disk storage system, avoid operations that will cause the disk(s) to seek. Replace random I/O patterns with sequential ones where possible.
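
To make the contrast concrete, here is a small Python sketch of the two access patterns; the file names, offsets, and record contents are arbitrary:

```python
# Create a sparse file to stand in for a large on-disk table.
with open("table.dat", "wb") as f:
    f.truncate(16 * 1024 * 1024)

# Random-access pattern: each write lands at a scattered offset, forcing a seek.
with open("table.dat", "r+b") as f:
    for offset in (8_388_608, 512, 4_194_304):
        f.seek(offset)
        f.write(b"x" * 64)

# Sequential pattern: append every update to a log and compact it later.
with open("table.log", "ab") as log:
    for record in (b"update user:1\n", b"update user:9\n", b"update user:1\n"):
        log.write(record)
```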

•    Keep per-connection overhead low. If you need a ton of memory per connection, you won’t get much work done concurrently. If you have, say, 8 GB of memory and a 100 MB memory requirement per connection, you can run about 80 connections at a time. If you switch to a 10-thread worker pool of 100 MB each, and reduce your memory per connection down to 1 MB, you can probably allow 7,000+ connections at a time, and get work done at the same rate or faster than when you could only support 80 concurrent connections.

•    Don’t save more state than you really need. It’s common for web application developers to save data in server-side sessions between HTTP requests. This practice does not scale well; it can dramatically increase the per-connection memory overhead. Saving the state in the client browser rather than in the web application (use cookies, not sessions) can help solve this problem. You can encrypt it if you have trust concerns. Read into the server-side application only the values you _really_ need.
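
One hedged sketch of keeping tamper-evident state in the client: signing the cookie value with an HMAC. This detects tampering but does not hide the contents; add encryption if the values are sensitive, and note the secret here is a placeholder:

```python
import base64
import hashlib
import hmac

SECRET = b"server-side secret key"   # placeholder: keep real keys out of source code

def sign(value):
    """Pack a value with an HMAC so tampering is detectable on the way back in."""
    mac = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return base64.urlsafe_b64encode(("%s|%s" % (value, mac)).encode("utf-8")).decode("ascii")

def verify(cookie):
    value, mac = base64.urlsafe_b64decode(cookie).decode("utf-8").rsplit("|", 1)
    expected = hmac.new(SECRET, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return value if hmac.compare_digest(mac, expected) else None

cookie = sign("user_id=42;theme=dark")
print(verify(cookie))  # the original value, or None if the cookie was altered
```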

•    Chatty applications should avoid parsed text protocols such as XML. If components of your application communicate with each other over the network, or you do lots of communication between the client and server, keep the use of parsed text protocols to a minimum. It’s very tempting to use XML or JSON text formats in your network communications because they work across different architectures, etc. It turns out that parsing text-formatted data on the server side is frequently very CPU intensive, and slows down connection processing times substantially. Keep it lightweight if possible. Consider using a simple binary protocol so that text parsing is not required.
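
As an illustration, here is a hypothetical ten-byte binary message built with Python’s standard struct module, next to the XML that would carry the same three numbers:

```python
import struct

# A hypothetical fixed-layout message: user id (u32), item id (u32), quantity (u16).
ORDER = struct.Struct("!IIH")       # '!' = network byte order

packed = ORDER.pack(1001, 52, 3)    # 10 bytes; no parser needed on receipt
user_id, item_id, qty = ORDER.unpack(packed)

# The XML equivalent carries the same three numbers in ~60 bytes and needs
# a full parse on every message:
# <order><user>1001</user><item>52</item><qty>3</qty></order>
```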

I like to keep system designs as simple as possible. Servicing millions of users makes this hard. From my experience, designing horizontally scalable systems comes at the cost of complexity. Consider carefully what the cost of scalability and efficiency really is before you commit to it. Sometimes just running something on a bigger, faster server does the trick, and that’s easy. Adding compression, encryption, admission control, thread pools, eventual consistency, decentralized data, sequential I/O, low per-connection memory overhead, binary network protocols, and distributed caching is not for amateurs, and doing all of it is probably overkill for most applications. Simply keep these concepts in mind when you develop your applications so you can implement the ones that make sense for your situation. Depending on your requirements, you may be able to make an amazingly scalable system with only a few of these things.

We are working every day to build software and services that make deploying scalable applications easier and easier.

This concludes my article on “Writing Code that Scales.” It’s time to think differently about how you design and implement your software so that you end up with an efficient scalable system when it’s all done.

About the Author

This is a post written and contributed by Adrian Otto.

Adrian serves as a Principal Architect for Rackspace, focusing on cloud services. He cares deeply about the future of cloud technology and important projects like OpenStack. He is also a key contributor to OASIS CAMP, a draft standard for application lifecycle management, and serves on its editing team. He comes from an engineering and software development background, and has made a successful career as a serial entrepreneur.


Comments

Great article! Lots of stuff to chew on.

I would add one thing that is maybe outside the scope of this article, though. I think many software designers focus on these scalability issues too soon. While it’s great to have some of these features early on, many websites never grow to the point where they need them, and waste far too much time thinking about “what ifs”.

It’s great to implement these techniques once you start to see traffic growing to the point where you need them, and not at the onset.

Derek on November 18, 2009

Derek,

Well said. Implementing everything I mentioned above could extend your development time considerably. It’s best to keep them all in mind, and use them where appropriate. Some of them you should probably do out of the gate (like avoiding gobs of memory used by server-side Sessions). But other things could definitely wait until they are justified.

Deciding to go with an XML-RPC design in a highly chatty application could be the kiss of death for your long term scalability, so think about that carefully in the early stages. Some things are hard to change later after you build complete systems around them. Try to use some modularity so you have a simplified upgrade path to the ideal solution when you intentionally take shortcuts.

Cheers,

Adrian

Adrian Otto [Racker] on November 18, 2009

Yes, you make a good point that I didn’t expand on… knowing what the scalability issues might be while designing the software will most definitely save time down the line, even if you don’t implement the fixes right away. I also agree the session issue is often overlooked by junior developers and has huge implications down the road. Thanks again for your article, it was very informative. Your Twitter account looks to be very useful as well; consider me a follower!

Derek on November 18, 2009

Thanks Derek. For those interested here are my Blog and Twitter links.

http://adrianotto.com
http://twitter.com/adrian_otto

Adrian Otto [Racker] on November 18, 2009

Great post. With regard to queues, you might want to link to “The Goal,” which has been the best explanation of queues for me in a long time. It’s based on the auto industry, but just thinking through the book has made my work life better, along with how I think through job processing in programs.

It’s interesting to see the recommendation of going back to storing things in a cookie over sessions, should be interesting to see how many people can follow that. People spent a lot of time getting developers not to rely on the browser, and now we’re saying rely on it.

Vid Luther on November 19, 2009

The Cookie subject can be controversial if you care about browsers that are 10+ years old and have no cookie support. So much of the web relies on cookies now that everyone has them turned on. It’s important not to trust the data that comes from a browser cookie, as that can be tampered with. A simple workaround for that is to store the value encrypted with an embedded checksum so you can verify it when you receive it from the browser again. This can protect you from tampering. You’ll also need to sanity check the data after you decrypt it. Sounds like this is a subject for another blog post!

Adrian Otto [Racker] on November 19, 2009


Use Lua instead of Python (not to mention Ruby).

Phoenix Sol on November 19, 2009

Good post! Now is there any chance of Rackspace Cloud Sites ever getting memcached? ;)

Shan on November 24, 2009

Shan,

Thanks for your comment. At the moment, Cloud Sites has client library support for memcached, but does not offer a place to run the memcached server. That’s what the article was about. You can run the server in Cloud Servers and connect to it from Cloud Sites. There are two key reasons why you can’t simply use memcached directly on Cloud Sites today:

1) The current stable release of memcached does not have any user authentication features. There are development versions that have this feature, so this problem will go away soon.

2) Cloud Sites is intended for interpreted code, not long running processes. In order to run memcached, you need to be allowed to run processes that don’t quit after your HTTP request finishes.

So for those of you looking for a super easy way to use memcached from Cloud Sites (no CLI tricks), you will have one in the future when memcached is added as an included feature of the Cloud Sites base platform. That solution is still a way down the road because a few other products will launch before it, but keep an eye on The Rackspace Cloud Blog for announcements when that service is ready to launch.

Cheers,

Adrian

Adrian Otto [Racker] on November 24, 2009

