Meet Cassandra

by Gary Dusbabek

Apache Cassandra is a fully distributed, highly scalable, sparse-table database.  It combines Dynamo’s fully distributed design and BigTable’s schema-free ColumnFamily-based data model. Client-tunable eventual consistency allows users to achieve a high degree of consistency while not sacrificing cluster availability or data redundancy.
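One common way to reason about client-tunable consistency in Dynamo-style systems is the overlap rule: with N replicas, a read that waits for R replies is guaranteed to touch at least one replica that acknowledged a write if R + W > N. The sketch below only illustrates that arithmetic. The function and parameter names are made up for this example and are not part of any Cassandra API.

```python
def quorum_overlaps(replicas: int, write_acks: int, read_acks: int) -> bool:
    """Return True when every read set must intersect every write set.

    replicas   -- N, the number of copies kept of each piece of data
    write_acks -- W, replicas that must acknowledge a write
    read_acks  -- R, replicas that must answer a read
    (All three names are hypothetical, chosen for this illustration.)
    """
    if not (1 <= write_acks <= replicas and 1 <= read_acks <= replicas):
        raise ValueError("W and R must each be between 1 and N")
    return read_acks + write_acks > replicas


# With three replicas, waiting for two on both sides forces an overlap,
# while waiting for one on each side trades that guarantee for speed.
strong = quorum_overlaps(replicas=3, write_acks=2, read_acks=2)
fast = quorum_overlaps(replicas=3, write_acks=1, read_acks=1)
```

Lower values of R and W keep the cluster available when nodes are down, at the cost of possibly reading stale data for a while.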

Many users arrive at Cassandra after reaching the limits of what they can affordably accomplish with a traditional relational database (RDBMS). However, Cassandra is not a drop-in replacement for MySQL or Oracle. It has some features that relational systems lack and is missing some features found in relational systems. If you understand your application well and are willing to think about problems differently, Cassandra might be a tool worth exploring.

Distributed and Scalable

Cassandra’s decentralized approach means every node in a Cassandra cluster is the same. Adding nodes to an existing cluster is relatively easy. Just make sure your storage settings are correct and then start up the new node. Cassandra takes care of deciding which ranges of data the node is responsible for and replicating the data to it. If you require more control, you are free to perform every step of this process manually.
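To make the idea of nodes owning ranges of data a little more concrete, here is a minimal sketch of a hash ring. It is an illustration under stated assumptions, not Cassandra's implementation: the Ring class, the node names and the use of MD5 for tokens are all choices made for this example, and replication is left out entirely.

```python
import bisect
import hashlib


def token(value: str) -> int:
    """Place a string on the ring. MD5 is an illustrative choice here."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: each node owns the range ending at its token."""

    def __init__(self):
        self._tokens = []  # node tokens, kept sorted
        self._nodes = {}   # token -> node name

    def add_node(self, name: str) -> None:
        t = token(name)
        bisect.insort(self._tokens, t)
        self._nodes[t] = name

    def owner(self, key: str) -> str:
        """Return the first node at or after the key's token, wrapping around."""
        if not self._tokens:
            raise LookupError("ring has no nodes")
        i = bisect.bisect_left(self._tokens, token(key))
        return self._nodes[self._tokens[i % len(self._tokens)]]


ring = Ring()
for node in ("node-a", "node-b", "node-c"):
    ring.add_node(node)

keys = ["user:%d" % n for n in range(200)]
before = {k: ring.owner(k) for k in keys}

ring.add_node("node-d")  # a new node joins the cluster
after = {k: ring.owner(k) for k in keys}

# Only keys that now belong to the new node changed owner.
moved = [k for k in keys if before[k] != after[k]]
```

The point of the design is visible in `moved`: when a node joins, the only keys that change hands are the ones in the range the new node takes over, so the rest of the cluster is left alone.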

Schema-free Sparse Table

If you are used to SQL tables, Cassandra’s data model is probably the biggest mental hurdle to overcome. One of the easiest ways to conceptualize the Cassandra data model is to imagine many rows, each row containing a list. You are free to add and remove items from these lists, or to ask Cassandra for the values from sections of these lists (we call them slices).
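The "row containing a list" picture can be sketched in a few lines. This is only a mental model, assuming the list is kept sorted by column name; the Row class and its method names are invented for the example and do not mirror any Cassandra client library.

```python
import bisect


class Row:
    """Hypothetical row: column names kept in sorted order, each with a value."""

    def __init__(self):
        self._names = []   # sorted column names
        self._values = {}  # column name -> value

    def insert(self, name, value):
        if name not in self._values:
            bisect.insort(self._names, name)
        self._values[name] = value

    def remove(self, name):
        if name in self._values:
            del self._values[name]
            self._names.pop(bisect.bisect_left(self._names, name))

    def slice(self, start, finish):
        """Return (name, value) pairs with start <= name <= finish."""
        lo = bisect.bisect_left(self._names, start)
        hi = bisect.bisect_right(self._names, finish)
        return [(n, self._values[n]) for n in self._names[lo:hi]]


# Each row can hold a different set of columns; nothing is declared up front.
timeline = Row()
timeline.insert("2010-07-03", "third post")
timeline.insert("2010-07-01", "first post")
timeline.insert("2010-07-02", "second post")
first_two_days = timeline.slice("2010-07-01", "2010-07-02")
```

Because the names are sorted, a slice is a contiguous section of the list, which is why choosing column names that sort usefully (dates in this example) matters so much in this kind of model.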

One of the ramifications of being “sparse” is that Cassandra has no notion of NULL—a key-value pair is simply present in a row or it is not. You are free, however, to store a column name with no value associated with it to indicate NULL (the absence of data) to your application.
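A small sketch of the difference between "not there" and "there with nothing in it", using plain Python dictionaries as stand-ins for rows. The column names and the convention of using an empty string as the marker are assumptions for this example; your application decides what an empty value means.

```python
def column_state(row: dict, name: str) -> str:
    """Classify a column as absent, present but empty, or present with data."""
    if name not in row:
        return "absent"
    if row[name] == "":
        return "present, empty"
    return "present"


# Two rows in the same table with different columns: that is what "sparse" means.
ann = {"email": "ann@example.com", "middle_name": ""}  # empty value as a marker
bob = {"email": "bob@example.com"}                     # column simply not stored

ann_state = column_state(ann, "middle_name")
bob_state = column_state(bob, "middle_name")
```

Nothing is stored for the column that `bob` lacks, whereas `ann` carries an explicit marker that the application can choose to read as NULL.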

Shedding Features

Cassandra does away with several RDBMS features in the name of performance and scalability. Notably, you will have to do without robust transactions, ad-hoc queries, joins or flexible indexes. For many applications these are acceptable trade-offs rather than limitations. Just ask some of the high-profile companies using Cassandra to build their applications, including Facebook, Digg, Twitter and Reddit, to name a few.

Why You Would Use It

If your application has a very large dataset, needs high write throughput and requires redundant copies of your data distributed across servers, racks or datacenters, you should consider using Cassandra. Writes are fast because Cassandra’s write path has been optimized to avoid random disk accesses. Server-side caching can make reads fast as well, if you need it and have the RAM to spare. Cassandra lets you specify where your data is replicated and how many nodes it is replicated to, which makes it very fault-tolerant.
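A heavily simplified sketch of a write path that avoids random disk access: every write is appended to a log and applied to an in-memory table, and the table is later written out in one sequential pass as a sorted, immutable segment. Python lists stand in for files here, and the class name, method names and flush threshold are all assumptions made for this illustration, not Cassandra's actual code.

```python
class LogStructuredStore:
    """Hypothetical store that appends and flushes instead of updating in place."""

    def __init__(self, flush_threshold: int = 3):
        self.commit_log = []  # stands in for an append-only file on disk
        self.memtable = {}    # recent writes, held in memory
        self.segments = []    # sorted, immutable snapshots; newest last
        self.flush_threshold = flush_threshold

    def write(self, key, value):
        self.commit_log.append((key, value))  # sequential append, no seek
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self):
        """Write the memtable out in sorted order, then start afresh."""
        self.segments.append(tuple(sorted(self.memtable.items())))
        self.memtable = {}
        self.commit_log.clear()  # those writes now live in the new segment

    def read(self, key):
        """Check memory first, then segments from newest to oldest."""
        if key in self.memtable:
            return self.memtable[key]
        for segment in reversed(self.segments):
            found = dict(segment)
            if key in found:
                return found[key]
        return None


store = LogStructuredStore(flush_threshold=2)
store.write("a", 1)
store.write("b", 2)  # second distinct key triggers a flush
store.write("a", 3)  # newer value shadows the one in the segment
```

Note the trade-off this design makes: writes never modify existing data on disk, so reads may have to consult several places, which is one reason caching helps reads.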

Rackspace Supports Cassandra Development

Besides having our own internal uses for Cassandra, we at Rackspace believe it is important that you can develop your application for Cassandra and deploy it anywhere, not just with a specific cloud provider. Rackspace is committed to an Open Cloud. We currently employ two programmers who work on Cassandra full-time, along with several part-time contributors.

There are three ways you can get more information on Cassandra:

1.  IRC. #cassandra for general questions, or #cassandra-dev if you are a programmer looking for answers relating to the codebase.

2.  Apache mailing lists. Those interested may subscribe to the user list, the developer list, or both. Subscribe by sending mail to user-subscribe@cassandra.apache.org or dev-subscribe@cassandra.apache.org.

3.  My email: gary.dusbabek@rackspace.com

About the Author

This is a post written and contributed by Cara Nichols.

Cara serves as Community Affairs Director for Rackspace Hosting and President of the Rackspace Foundation. She oversees Rackspace’s charitable giving and corps of volunteers, as well as manages the company’s non-profit and city relationships.


3 Comments

When will the cloud servers running Windows be promoted from BETA to production?

avatar Mark Germanos on July 14, 2010 | Reply

Hi Mark~
We are currently working on pushing Rackspace Cloud Servers for Windows beta into production and it will be here very soon. Check rackspacecloud.com/blog this Wednesday for an update with more details.
Thanks!

avatar Cara Nichols [Racker] on July 19, 2010 | Reply
avatar Cara Nichols [Racker] on July 27, 2010 | Reply
