How to Write Code that Scales
The Internet continues to grow rapidly. This growing demand makes it necessary to think carefully about how web-scale applications are written so that they work properly under challenging workloads. Fortunately, computing hardware keeps getting faster at a fraction of its former price. Memory is abundant, CPUs are 64-bit, and both CPU cycles and bandwidth are inexpensive. Storage is more economical than ever, but I/O capacity and high-speed network interconnects are still expensive.
As a direct result of these changes, many assumptions of traditional software development no longer apply. Database development in particular must shift its approach to match the new resource costs and availability. These changes have also brought large volumes of data, along with new methods for building interactive software and data analysis systems. The playing field has changed tremendously, and thinking at web scale requires a shift in gears.
Five Major Concepts
There are five major concepts to consider when writing code that scales. An overview of these concepts comes first, followed by practical suggestions for applying them.
Concept #1: Maintain a Low Requirement of Resources
The need to keep resource requirements low is frequently overlooked today, largely because of fast CPUs, plentiful memory and speedy networks. Making software more efficient is sometimes the simplest way to reduce scalability issues. The more quickly each request gets on and off a CPU, the less data it transmits and the less memory it occupies, the faster the application runs. To decrease the processing time per unit of work, consider the following four resources:
- I/O – This is often the first resource applications exhaust, particularly when massive amounts of data are being processed. Disk I/O is the slowest, but bus I/O can also be exhausted.
- Network – Bandwidth is more reasonably priced than it’s ever been, but running out of bandwidth can happen quickly if the network is used extensively for accessing remote data.
- Memory – CPUs now have an abundance of cache on them as well as shared L2 caches between multiple cores. Additionally, memory is the fastest and most affordable it’s ever been.
- CPU – You can afford plenty of CPU, and adding cores and servers translates directly into lower overall processing time.
Concept #2: Run in Parallel to Lessen Bottlenecks
The fastest way to diminish bottlenecks is to break workloads into small pieces and then process those pieces in parallel on separate storage devices, networks, CPU cores or servers. Keep synchronization between the parallel pieces to a minimum, because locks will kill concurrency.
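The idea of splitting work into small pieces and merging the results once at the end can be sketched roughly as follows. The function names, the chunk size and the sum-of-squares workload are all hypothetical. The sketch uses a thread pool for simplicity; for CPU-bound work on standard CPython, a process pool would likely be the better fit.

```python
from concurrent.futures import ThreadPoolExecutor


def chunked(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_chunk(chunk):
    """Hypothetical unit of work: sum the squares in one chunk."""
    return sum(x * x for x in chunk)


def parallel_sum_of_squares(items, chunk_size=1000, workers=4):
    # Each task returns its own partial result, so no shared state
    # (and therefore no lock) is needed while the chunks are processed.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(process_chunk, chunked(items, chunk_size)))
    # The only coordination point is this single merge at the end.
    return sum(partials)


if __name__ == "__main__":
    print(parallel_sum_of_squares(list(range(10_000))))
```

Because every chunk is independent, the same structure could in principle be spread across cores or servers without adding locks.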
Concept #3: Make Data Decentralized
Data has traditionally been stored centrally. This model requires the database to be scaled vertically, which becomes increasingly difficult and costly to maintain and expand. Central databases can also become I/O bottlenecks very rapidly, and updating status changes within a central data design can be challenging. Distributing data over numerous servers, by contrast, also distributes the write load. A distributed data store, such as Cassandra, can assist with this task, handling the partitioning of data while making it easy to add servers as capacity needs increase.
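To illustrate how partitioning spreads writes, here is a minimal sketch that assigns each key to one of several shards by hashing it. The ShardedStore class and shard_for function are hypothetical, and each dictionary merely stands in for a separate server. This is plain modulo placement, a simplification rather than the consistent hashing that stores such as Cassandra use, so changing the shard count would move most keys.

```python
import hashlib


def shard_for(key, num_shards):
    """Map a key to a shard index using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy in-memory store; each dict stands in for a separate server."""

    def __init__(self, num_shards):
        self.shards = [{} for _ in range(num_shards)]

    def put(self, key, value):
        # Only the shard that owns the key sees this write.
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)


if __name__ == "__main__":
    store = ShardedStore(4)
    for i in range(1000):
        store.put(f"user:{i}", i)
    print([len(s) for s in store.shards])  # keys per shard
```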
Concept #4: Eventual Consistency
If your application can tolerate slightly stale data, then an eventually consistent data storage system can be employed. These systems run asynchronous processes to update remote replicas of the stored data, so immediately after an update, users might still see the stale version. Carefully determine what data needs ACID (Atomicity, Consistency, Isolation, Durability) properties. If BASE (Basically Available, Soft state, Eventually consistent) is adequate, then superior scalability and availability can be achieved from a distributed data storage system that employs asynchronous data replication.
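The stale-read window can be demonstrated with a toy model. The class below is purely illustrative and does not describe any real product: the write is acknowledged by the primary, and the replica only catches up when replicate() is called, which stands in for the asynchronous background process.

```python
from collections import deque


class EventuallyConsistentStore:
    """Toy primary/replica pair with deferred replication."""

    def __init__(self):
        self.primary = {}
        self.replica = {}
        self._pending = deque()  # updates not yet applied to the replica

    def write(self, key, value):
        # The write is acknowledged as soon as the primary has it.
        self.primary[key] = value
        self._pending.append((key, value))

    def read_replica(self, key):
        # May return a stale value until replication catches up.
        return self.replica.get(key)

    def replicate(self):
        # Stands in for the background process that ships updates.
        while self._pending:
            key, value = self._pending.popleft()
            self.replica[key] = value


if __name__ == "__main__":
    store = EventuallyConsistentStore()
    store.write("status", "shipped")
    print(store.read_replica("status"))  # replica has not caught up yet
    store.replicate()
    print(store.read_replica("status"))  # replica now has the update
```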
Concept #5: Scale Horizontally, Rather than Vertically
Simply put, vertical scaling is the addition of resources (more memory, faster network interfaces) to a given computer system. Although this process can be simple and reasonably priced, the vertical limits can be reached very rapidly. So how do you grow if you’re already running the largest, fastest system you can afford? The solution is horizontal scaling: add more servers and partition the work among them. If your system scales well horizontally, your needs may be perfectly met with multiple, slower systems. Software written without horizontal scaling capability, however, can leave you at an impasse. Rather than getting trapped into scaling vertically with no other options, build in horizontal scaling so that you can keep growing, often with significant cost savings.
- Concurrency != Locks. Scaling horizontally lets you process more work concurrently. However, try to minimize the amount of parallel work that must be synchronized. Excessive serialization for data synchronization can quickly result in low processing throughput and a wasteful use of resources.
- A Thread per Connection is a Bad Thing. Keeping the thread-to-CPU ratio low is a good rule of thumb. Having many more threads than cores tends to reduce the amount of work completed. I/O-bound threads, however, sometimes present a situation where more threads than cores is acceptable. Just be sure that you aren’t trying to process too much data at once; this may stretch your I/O system too thin.
- Fixed Number of Workers in Thread Pool is a Good Thing. A thread pool is a wiser choice than the thread-per-connection approach in the bullet above. A thread pool lets an optimal number of workers pull work off a queue, which can substantially increase throughput.
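A fixed pool of workers pulling jobs from a shared queue might look like the sketch below. The run_worker_pool function and its parameters are hypothetical, results come back in completion order rather than submission order, and None is reserved as a shutdown signal, so it cannot be used as a job.

```python
import queue
import threading


def run_worker_pool(jobs, handler, num_workers=4):
    """Process `jobs` with a fixed number of threads sharing one queue."""
    work = queue.Queue()
    results = queue.Queue()

    def worker():
        while True:
            job = work.get()
            if job is None:  # sentinel: no more work for this worker
                break
            results.put(handler(job))

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for job in jobs:
        work.put(job)
    for _ in threads:
        work.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    # All workers have exited, so the result queue is no longer changing.
    return [results.get() for _ in range(results.qsize())]


if __name__ == "__main__":
    print(sorted(run_worker_pool(range(10), lambda n: n * 2, num_workers=3)))
```

The number of threads stays constant no matter how many jobs arrive, which is the property the bullet above recommends.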
Nine Suggestions: How to Write Code that Scales
Suggestion #1: Define Worst Case Scenario Plan
What’s your worst-case scenario? Define and quantify what an acceptable result would be in a worst-case usage scenario. It may look something like, “Support 20,000 simultaneous connections with < 2 second response time.” Keep this target in mind at every stage of software design and implementation. You may even need to remind everyone on a regular basis, because it’s easy to get sidetracked by the feature list and forget the architectural performance and scalability goals. If you keep the target in front of you at all times and write a test plan before writing a single line of code, you are far more likely to meet it.
Suggestion #2: Ensure Caching Mechanisms
Determine which data is most often accessed, and cache it in memory for repeated, high-speed access. In most instances, distributed memcached clusters are better than host-based caches.
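The basic caching idea can be shown in-process with Python's standard functools.lru_cache. This is a host-based cache, so it only illustrates the principle; a distributed memcached cluster would require a client library and is not shown here. The slow_lookup function is a hypothetical stand-in for an expensive database or network call.

```python
from functools import lru_cache

calls = {"count": 0}  # counts how often the expensive path actually runs


def slow_lookup(user_id):
    """Hypothetical stand-in for an expensive database or network call."""
    calls["count"] += 1
    return {"id": user_id, "name": f"user-{user_id}"}


@lru_cache(maxsize=1024)
def cached_lookup(user_id):
    # Repeated calls with the same user_id are served from memory.
    return slow_lookup(user_id)


if __name__ == "__main__":
    for _ in range(5):
        cached_lookup(7)
    print("expensive calls made:", calls["count"])
```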
Suggestion #3: Network Data Compression
Compressing the data exchanged between clients and servers is often overlooked, yet it is a great way to improve the responsiveness of applications. Reducing data transfer times in turn increases the capacity that can be handled per unit of time. The time spent compressing and decompressing is typically trivial when weighed against the speed gained, so compressed transmissions are usually more efficient overall than uncompressed ones.
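As a rough illustration, the sketch below compresses a JSON payload with the standard zlib module before it would be sent and restores it on the receiving side. The function names and the sample records are made up for the example, and the actual savings depend heavily on how repetitive the data is.

```python
import json
import zlib


def encode_payload(obj):
    """Serialize to JSON and compress before sending over the network."""
    return zlib.compress(json.dumps(obj).encode("utf-8"))


def decode_payload(data):
    """Reverse of encode_payload, run by the receiver."""
    return json.loads(zlib.decompress(data).decode("utf-8"))


if __name__ == "__main__":
    rows = [{"id": i, "status": "active", "region": "us-east"} for i in range(500)]
    raw = json.dumps(rows).encode("utf-8")
    packed = encode_payload(rows)
    print(len(raw), "bytes uncompressed,", len(packed), "bytes compressed")
```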
Suggestion #4: Disk Data Compression
Storage may be inexpensive, but I/O is not. You can readily increase effective I/O throughput by compressing the stored data, because fewer bytes have to be read and written.
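One simple way to try this is to write files through the standard gzip module, as in the sketch below. The helper names and the sample log line are hypothetical, and whether the CPU cost pays off depends on the data and the storage system.

```python
import gzip
import os
import tempfile


def write_compressed(path, lines):
    """Write text lines through gzip so fewer bytes reach the disk."""
    with gzip.open(path, "wt", encoding="utf-8") as f:
        for line in lines:
            f.write(line + "\n")


def read_compressed(path):
    """Read the lines back, decompressing on the fly."""
    with gzip.open(path, "rt", encoding="utf-8") as f:
        return [line.rstrip("\n") for line in f]


if __name__ == "__main__":
    lines = ["2014-01-01 GET /index.html 200"] * 10_000
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "access.log.gz")
        write_compressed(path, lines)
        raw_size = sum(len(line) + 1 for line in lines)
        print(raw_size, "bytes of text stored in", os.path.getsize(path), "bytes")
```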
Suggestion #5: Sensory-Driven Admission Control System
When employing work queues, it’s worthwhile to think about a sensory driven admission control system. A common error is not placing limits on simultaneous usage over the network. For example, let’s consider a system with a satisfactory response time X. Within that response time is a maximum capacity of doing 50 things concurrently while producing 50 units of output. If you increase tasks to 51 things, output might decrease to 30. Taking it a step further, if you give the system 52 things to do, your output might sharply decrease to 20.
As our example illustrates, pushing a system beyond its limits can cause it to spin its wheels without getting additional work completed. A recommendation is to queue all work, refusing work when the queue gets too long. Many of us are reluctant to have visible limits on system usage, but controlling the rate at which you accept work when operating near reasonable limits is far preferable. Your client will receive a busy signal if you reject work; this is an improvement over getting “hung up” on! By using a sensible admission control system with an appropriate procedure, you can refuse traffic when work queues get too long. If your system scales well horizontally, though, it could be useful to use an API call to create more Cloud Servers to assist with servicing a growing work queue. This would potentially allow you to scale resources to track demand, avoiding entirely the need to refuse work. Keep in mind that this flexible provisioning is not a replacement for admission control.
At some point, demand may still exceed available capacity, and you need a backup plan to handle the excess work. Still not convinced? Contemplate what might happen if your system had an infinite loop and the server mistakenly acted as a client to itself. Admission control could interrupt the loop before the whole system crashed.
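The "queue all work, refuse when the queue gets too long" rule can be sketched with a bounded queue from the standard library. The AdmissionController class is hypothetical and uses a fixed limit; a sensory-driven system would presumably adjust that limit from measured response times, which is not shown here.

```python
import queue


class AdmissionController:
    """Accept work only while the queue is below a fixed limit."""

    def __init__(self, max_queued):
        self._queue = queue.Queue(maxsize=max_queued)

    def submit(self, job):
        """Return True if the job was queued, False if it was refused."""
        try:
            self._queue.put_nowait(job)
            return True
        except queue.Full:
            return False  # caller should answer with a "busy" response

    def next_job(self):
        """Called by a worker; returns None when the queue is empty."""
        try:
            return self._queue.get_nowait()
        except queue.Empty:
            return None


if __name__ == "__main__":
    controller = AdmissionController(max_queued=2)
    for name in ["a", "b", "c"]:
        print(name, "accepted" if controller.submit(name) else "refused")
```

Refusing at the front door keeps the work already accepted inside the range where the system still makes progress.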
Suggestion #6: Avoid Seeking
Seeking is the cause of the majority of I/O bottlenecks. Steer clear of operations that will cause your disk(s) to seek when reading or writing data to a disk storage system. Whenever feasible, replace random I/O patterns with sequential ones.
Suggestion #7: Low Overhead per Connection
If you require significant amounts of memory per connection, not much work can be performed at the same time. For instance, with 10GB of memory and a 200MB requirement per connection, you might only process about 50 connections at a time. However, suppose you switch to a 12-thread worker pool of 200MB each and decrease memory per connection to 2MB. The pool takes about 2.4GB, and the remaining 7.6GB might then permit roughly 3,800 concurrent connections. Essentially, you are having work completed at a comparable (and often faster) rate as when only about 50 concurrent connections were supported. Using asynchronous I/O solutions like select() and epoll() for connection handling can drive even more efficiencies.
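The capacity arithmetic behind this kind of estimate is easy to check in a few lines. The sketch takes its inputs from the paragraph above (10GB total, 200MB per heavyweight connection, 12 pool workers at 200MB each, 2MB per lightweight connection) and assumes 1GB = 1,000MB for round numbers. The exact counts depend on how memory is accounted for, so treat the results as rough.

```python
def max_connections(total_mb, per_connection_mb, reserved_mb=0):
    """How many connections fit in memory after any fixed reservation."""
    return (total_mb - reserved_mb) // per_connection_mb


TOTAL_MB = 10_000   # 10GB, taking 1GB = 1,000MB (assumption)
POOL_MB = 12 * 200  # 12 pool workers at 200MB each

# Every connection carries the full 200MB working set.
heavy = max_connections(TOTAL_MB, 200)

# Only the pool workers carry 200MB; each connection keeps 2MB of state.
light = max_connections(TOTAL_MB, 2, reserved_mb=POOL_MB)

if __name__ == "__main__":
    print("heavyweight connections:", heavy)
    print("lightweight connections alongside the pool:", light)
```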
Suggestion #8: Only Save the Essential State
Suggestion #9: Avoid Use of Parsed Text
Try to limit the use of parsed text. This applies if components of your application communicate over the network, or if there is a large amount of communication between client and server. It is tempting to use XML or JSON data formats because they work across the different architectures on your network. Unfortunately, parsing text data on the server is CPU intensive and can significantly slow down processing times. Lightweight is key, and a simple binary protocol can eliminate the need for text parsing.
Servicing tens of millions of users makes even simple system decisions difficult, and designing systems that are horizontally scalable may result in a much more complex design. Before committing to scalability, consider the true cost of that efficiency. There are times when running a larger, faster system is the real solution. Compression, encryption, thread pools and a host of other solutions are not tasks for a novice, and may be excessive for the demands of most applications.
Know your needs and keep the five concepts in mind when developing applications from the outset; this will help you implement what makes sense in your application. Depending on your requirements, you may only need to implement a few of the concepts to create an extremely scalable system.
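For comparison, the sketch below encodes the same record once as JSON and once with a fixed binary layout using the standard struct module. The three-field record and its layout are invented for the example. The binary form here is 14 bytes and decodes without any text parsing, though it gives up the self-describing flexibility of JSON.

```python
import json
import struct

# Hypothetical fixed layout: unsigned 32-bit id, 64-bit float value,
# unsigned 16-bit status code, all in network (big-endian) byte order.
RECORD = struct.Struct("!IdH")


def encode_binary(record_id, value, status):
    """Pack the record into a fixed-size byte string."""
    return RECORD.pack(record_id, value, status)


def decode_binary(data):
    """Unpack the bytes back into an (id, value, status) tuple."""
    return RECORD.unpack(data)


def encode_json(record_id, value, status):
    """Text encoding of the same record, for size comparison."""
    record = {"id": record_id, "value": value, "status": status}
    return json.dumps(record).encode("utf-8")


if __name__ == "__main__":
    print(len(encode_binary(42, 3.5, 200)), "bytes as binary")
    print(len(encode_json(42, 3.5, 200)), "bytes as JSON")
```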
© 2014 Rackspace US, Inc.
Except where otherwise noted, content on this site is licensed under a Creative Commons Attribution-NonCommercial-NoDerivs 3.0 Unported License