Should You Switch To NoSQL Too?

With Twitter going public with their plans to switch to Cassandra, a lot of people are asking if they should switch too.

Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously consider using something explicitly designed for that instead.
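To make that rule of thumb concrete, here is a minimal sketch of the cache-in-front-of-a-database pattern. It is an illustration only: a plain dict stands in for memcache, an in-memory SQLite database stands in for the relational store, and the table and function names are hypothetical, not taken from Digg or anyone else's code.

```python
import sqlite3

# Assumed stand-ins: SQLite in memory for the relational database,
# a plain dict for memcache. All names here are hypothetical.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
db.execute("INSERT INTO users VALUES (1, 'alice')")
cache = {}

def get_user_name(user_id):
    """Read path: try the cache first, fall back to the database."""
    key = f"user:{user_id}"
    if key in cache:
        return cache[key]
    row = db.execute(
        "SELECT name FROM users WHERE id = ?", (user_id,)
    ).fetchone()
    name = row[0] if row else None
    cache[key] = name  # the application, not the database, fills the cache
    return name

def set_user_name(user_id, name):
    """Write path: update the database, then invalidate the cached copy."""
    db.execute("UPDATE users SET name = ? WHERE id = ?", (name, user_id))
    db.commit()
    # Skip this step on any write path and readers see stale data.
    cache.pop(f"user:{user_id}", None)
```

Every read goes through a key lookup and every write path has to remember the invalidation step by hand. That bookkeeping is the "ad-hoc, difficult to maintain" part.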

While there are other reasons to use Cassandra, such as the less-rigid schema or the best-in-class support for replication across multiple datacenters, most people are using it because a single machine can’t handle their query volume. The same is true of other distributed NoSQL databases, although there are other kinds of NoSQL database that are useful for some applications too.

The price of scaling is that Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead. For analytics, the upcoming 0.6 release (in beta now) offers Hadoop map/reduce integration, but for high-volume, low-latency queries you will still need to design your app around denormalization.
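As a rough sketch of what designing around denormalization means, consider a timeline feature. Instead of joining a follows table against a posts table at read time, the application copies each post into a precomputed row per reader at write time. Plain dicts stand in for rows looked up by key; this is not Cassandra's API, and every name below is hypothetical.

```python
from collections import defaultdict

# Assumption: dicts stand in for rows fetched by key in a distributed store.
followers = defaultdict(set)   # author -> users who follow that author
timelines = defaultdict(list)  # reader -> posts precomputed for that reader

def follow(reader, author):
    followers[author].add(reader)

def post(author, text):
    """Write path does the 'join': copy the post into each follower's row."""
    for reader in followers[author]:
        timelines[reader].append((author, text))

def read_timeline(reader):
    """Read path is a single lookup by key, with no join and no scan."""
    return list(timelines[reader])
```

The read is cheap because the query was decided in advance. A question nobody planned for, such as "all posts mentioning a given word", has no precomputed row to read, so it needs a full scan or a batch job.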

So, NoSQL systems are not drop-in replacements for a relational database. It looks like Twitter has been working on moving to Cassandra at least since Evan’s post in July last year; your application may be easier or harder to port, but it’s a useful data point to keep in mind.

For more on why the difficulty of scaling relational databases is driving developers to Cassandra, see the video or slides from my talk last week at PyCon, and for an introduction to Cassandra itself, see Eric Evans’ talk at FOSDEM, also available with video or slides.

About the Author

This is a post written and contributed by Jonathan Ellis.


12 Comments

I agree with Ian that people mis-characterize their existing MySQL+Memcache architectures, but calling them “Eventually consistent” is wrong. They have no guarantees of converging on consistency.

At Twitter we call such a system “Potentially Consistent”. :)

Ryan King on February 25, 2010

Using memcached to cache the results of SQL queries is a largely solved problem. Throwing out your SQL database based on the “well we are using memcached” “rule of thumb” means you lose the ability to populate your cache with SQL-based results, and also means your entire datamodel has to throw out ACID. You might need ACID, and you might need caching of query results. Throwing memcached on top of that by no means means you’re reinventing Cassandra.

mike bayer on February 25, 2010

NoSQL does not mean that you automatically lose ACID.

Jan Lehnardt on February 25, 2010

Hi Jonathan!

As you probably know ( ;) ), the Drizzle folks actually see NoSQL solutions as a great partner technology. SQL and NoSQL needn’t compete. They solve different problems, and the truly innovative projects will recognize this fact and work closely together so as to integrate as seamlessly as possible and get the best of both worlds.

Here’s to working together to solve the world’s problems! :)

-jay

Jay Pipes on February 26, 2010

[…] Should you switch to NoSQL too ? […]

Scalability links for Feb 28th 2010 | Scalable web architectures on February 28, 2010

@Jay, yes, we’ll have to come up with a new category name. And NoSQL was so catchy, too. :)

Jonathan Ellis on March 1, 2010

What keeps me from using Cassandra right now is the lack of atomic updates. There’s no way to do a reliable counter, unless you just save all your increments as separate items and sum them up when you read. The guys at Digg are helping to work toward fixing that.

Jeo on March 7, 2010

Another good NoSQL approach is http://www.mongodb.org/

Randy on March 17, 2010

“…Cassandra provides poor support for ad-hoc queries, emphasizing denormalization instead.”

This suggests denormalized data is the cause of poor support for ad-hoc queries. Denormalization is what is done in a typical data warehouse implementation to facilitate easier ad-hoc queries.

By removing the need to join tons of tables, a subject area can be represented in a few dimensions and one or more fact tables. This makes it easier to write queries, since you don’t need to understand the normalized structure to get what you want.

Maybe there is a better term to use?

Steve on March 17, 2010

See this, esp. the video, to judge the advantages of NoSQL for yourself… and to get a good laugh :-)

http://buytaert.net/nosql-and-sql

Vacilando on March 17, 2010

[…] mysql + memcached era. If not quite yet the end, then the beginning of it. As Ian Eure from Digg said, "If you’re deploying memcache on top of your database, you’re inventing your own […]

Cassandra ライブ情報がテンコ盛り – Jonathan Ellis @ Rackspace [ #cassandra #nosql ] « Agile Cat — Azure & Hadoop — Talking Book on March 25, 2010

Should You Switch To NoSQL Too?…

Ian Eure from Digg (also switching to Cassandra) gave a great rule of thumb last week at PyCon: “if you’re deploying memcache on top of your database, you’re inventing your own ad-hoc, difficult to maintain NoSQL database,” and you should seriously con…

ehcache.net on March 12, 2011

©2014 Rackspace, US Inc.