Rackspace and Drizzle: It's Time To Rethink Everything

Filed in by Adrian Otto | March 13, 2010 10:02 am

Rackspace has been a sponsor of the Drizzle[1] project for the past year. Recently we hired a team of five developers (Lee Bieber, Eric Day[2], Monty Taylor[3], Jay Pipes[4], and Stewart Smith[5]) from Sun Microsystems who have been working on this project. We think this project has a very bright future, and we have put our money where our mouth is. SQL[9] databases still have a place in the cloud, and Drizzle is just the thing to fit that need. We believe in open source, and are willing to invest in it. Here’s a quick intro to the new members of our team:

Before I wow you with all the great things inside Drizzle, I’d like to remind you that we are in a changing world. We are generating more data than ever. In fact, the world is generating an order of magnitude more data than just 5 years ago. We are now in the world of 64-bit computing. Our CPUs have 4 and 8 cores on them, and it’s not uncommon to see servers with 24 or more total CPU cores today. The way CPU technology is going, we should expect the core count to continue to increase. We fit tons of RAM into servers today. It would be really nice to be able to put all this great new hardware to effective use. The database software widely used for storing, managing, and querying our data is based on assumptions that are simply no longer true. An adapted approach is required in order to keep up. For some, the solution is a distributed database like Cassandra[10]. For most, the solution will be something that can still be used with SQL, but is better suited for today’s data needs and for the server systems that will be here in the future, not just today.

The Drizzle project was started by Brian Aker from MySQL[11]. Brian served as Director of Architecture at MySQL, and after MySQL was acquired by Sun[12] he became a Distinguished Engineer. It’s hard to argue with the fact that Brian is one of the smartest people alive today who write open source database software. He keeps good company too. As I have grown to know the team from Sun, I’m thrilled to have the honor of working with such a talented group. But it does not end there. Drizzle has a vibrant open source developer community, made up of over a hundred contributors worldwide.

In the very beginning, before Drizzle even had its name, Brian’s idea was to start with MySQL 6.0 and fork it, taking out all the bloat like stored procedures, triggers, MyISAM, and numerous internal evils that made MySQL 5.1 too heavy for the needs of Web and Cloud use cases. In short, take MySQL and strip out all the fat to make a lean, mean, lightweight database that would really make sense for web applications and cloud computing. These plans evolved quickly, informed by his unique perspective within MySQL, not only on what works and does not work within the software, but also on how licensing and community development matter.

The Drizzle Vision

When it’s ready, Drizzle will be a modular system that’s aware of the infrastructure around it. It does, and will, run well in hardware-rich multi-core environments, with a design focused on maximum concurrency and performance. No attempt will be made to support 32-bit systems, obscure data types, or a multitude of language encodings and collations. The full power of C++ will be leveraged, and the system internals will be simple and easy to maintain. The system and its protocol are designed to be both scalable and high performance.

Keep the Good Stuff, Dump the Bad

When people first discover Drizzle, one of the first things they want to know is how it’s different from MySQL. Some love MySQL and some hate it, and both sides have good reasons. MySQL is a mixed bag. Drizzle will re-think everything, keeping the best, and replacing or eliminating the worst[13]. For example, MySQL has a data type that is a 3-byte integer. Think about that for a moment. On today’s server hardware that does not make a whole lot of sense. In Drizzle, it’s gone. Poof. Take another example: MySQL allows you to send two queries together, separated by a semicolon. This is widely exploited as a security weakness in a multitude of application software. In Drizzle, an error is returned when you attempt this. Poof. This improvement eliminates an entire category of simple security attack vectors. If I listed every single difference[14], this blog post would be way too long, so I’m going to highlight what I see as the best of the best:

1 – Micro-Kernel Design, with modular interfaces to plug in features and functions without touching the core code. APIs for Replication, Storage Engines, Logging, Authentication, Client Protocols, etc.
2 – No triggers or stored procedures. That stuff is bloat as done in MySQL, and Drizzle has other ways to deal with these needs. These capabilities can be added in later as needed such that they are done right.
3 – Only UTF8 support, not a multitude of language encodings and collations. Keep it simple. This is the web.
4 – Way Less Source Code. Where MySQL has well over a million lines of code, Drizzle is just under 300K lines[15].
5 – Drizzle Client Protocol, which is pluggable and asynchronous-capable, with built-in sharding support, built-in checksum support, and a BSD license so you can package it in commercial software with no license drama.
6 – Default Storage Engine is InnoDB, for ACID[16] compliance but others can still be used. MyISAM is gone. Long live the Queen!
7 – Pluggable AAA, so integrating with your LDAP user database through PAM is simple as pie, and if you don’t want any auth (think memcached) just don’t load the plugin, and get a nice performance boost.
8 – Replication will be everything you ever wanted in a replication system. You hate MySQL replication? You now love Drizzle.
9 – Logging is pluggable so you can log to Syslog, a query analyzer, Gearman, or whatever you want to plug in.
10 – Query Rewriting is supported. If you have a misbehaving application, you can fix it at the database if desired.
11 – The Data Dictionary is redone so that no internal tables materialize, and there is only a single code execution path. This means all your base run faster.
12 – FRM Files are Gone. Yep, ever been snagged by the contents of an FRM file not matching what’s in the database? Cry no more.
13 – The MySQL Client Protocol is supported so you can use Drizzle with most applications without modification.
14 – It’s Easy to Build from Source. Ever tried to compile MySQL from source? Hah! Yeah, Drizzle builds like butter.
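
To make items 1, 7, and 9 a little more concrete, here is a minimal sketch of the micro-kernel idea in Python. It is not Drizzle's actual plugin API (Drizzle is written in C++); the names MiniKernel, load_auth, and load_logger are made up for illustration. The point is only that the core stays tiny, and that skipping the auth plugin means the auth check never runs at all.

```python
from typing import Callable, List, Optional

# Hypothetical plugin signatures, invented for this sketch.
AuthPlugin = Callable[[str, str], bool]   # (user, password) -> allowed?
LogPlugin = Callable[[str], None]         # receives each query text


class MiniKernel:
    """A toy core that knows nothing about how auth or logging work."""

    def __init__(self) -> None:
        self._auth: Optional[AuthPlugin] = None
        self._loggers: List[LogPlugin] = []

    def load_auth(self, plugin: AuthPlugin) -> None:
        self._auth = plugin

    def load_logger(self, plugin: LogPlugin) -> None:
        self._loggers.append(plugin)

    def execute(self, user: str, password: str, query: str) -> str:
        # With no auth plugin loaded, this check is skipped entirely.
        if self._auth is not None and not self._auth(user, password):
            raise PermissionError(f"access denied for {user!r}")
        for log in self._loggers:
            log(query)
        return f"ok: {query}"


def demo() -> List[str]:
    seen: List[str] = []
    kernel = MiniKernel()
    kernel.load_logger(seen.append)            # log to an in-memory list
    kernel.execute("anyone", "", "SELECT 1")   # no auth plugin loaded yet
    kernel.load_auth(lambda u, p: (u, p) == ("app", "secret"))
    kernel.execute("app", "secret", "SELECT 2")
    return seen
```

Swapping the in-memory logger for one that writes to Syslog, or the lambda for a PAM-backed check, would not require touching MiniKernel itself, which is the property the list above is describing.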

So you should believe me by this point that it’s not just a little fork of MySQL with all the same bugs. It’s a new animal completely that’s re-using what MySQL did well, and eliminating or fixing where it was weak.
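
Going back to the semicolon example from earlier in this section, here is a rough Python sketch of what "one statement per request" means in practice. This is not how Drizzle implements the check (that happens inside the server), and the function names are invented for this post. It also deliberately ignores SQL comments, so treat it as an illustration of the idea rather than a real parser.

```python
from typing import List


def split_statements(sql: str) -> List[str]:
    """Split on semicolons that sit outside quoted strings (sketch only)."""
    statements: List[str] = []
    current: List[str] = []
    quote = None        # the quote character we are inside, if any
    escaped = False     # True right after a backslash inside a quoted string
    for ch in sql:
        if quote is not None:
            current.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == quote:
                quote = None
        elif ch in ("'", '"', "`"):
            quote = ch
            current.append(ch)
        elif ch == ";":
            statements.append("".join(current).strip())
            current = []
        else:
            current.append(ch)
    statements.append("".join(current).strip())
    return [s for s in statements if s]


def check_single_statement(sql: str) -> str:
    """Return the lone statement, or raise if the request holds any other number."""
    statements = split_statements(sql)
    if len(statements) != 1:
        raise ValueError(f"expected exactly one statement, got {len(statements)}")
    return statements[0]
```

A request such as "SELECT 1; DROP TABLE users" raises an error here instead of quietly running the second statement, while a semicolon inside a quoted string is left alone.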

The Results

First of all, you’re wondering if Drizzle is production ready. It is very close. It’s time now to start working with it in preparation for full production status. If you trust your data to MySQL with InnoDB today, you can trust your data in Drizzle too. Try it in a test application and see how it’s different. You may notice that it can handle thousands of queries running concurrently. You may notice that it scales much better than a MySQL server over multiple CPU cores.

There’s much more to come as well. I’ll continue to let you know when new and exciting milestones are crossed. If you’re a developer, and want to participate, start here[17]. You can also find the team on the Drizzle IRC channel #drizzle on Freenode.





Endnotes:
  1. Drizzle: http://drizzle.org
  2. Eric Day: http://oddments.org/?p=282
  3. Monty Taylor: http://inaugust.com/post/77
  4. Jay Pipes: http://www.joinfu.com/2010/03/happiness-is-a-warm-cloud/
  5. Stewart Smith: http://www.flamingspork.com/blog/2010/03/11/continuing-the-journey/
  6. m: http://www.flamingspork.com/blog/2010/03/11/continuing-the-journey/
  7. [Image]: http://drizzle.org
  8. ith: http://www.flamingspork.com/blog/2010/03/11/continuing-the-journey/
  9. SQL: http://en.wikipedia.org/wiki/SQL
  10. Cassandra: http://www.rackspacecloud.com/blog/2009/09/23/the-cassandra-project/
  11. MySQL: http://www.mysql.com
  12. Sun: http://www.sun.com
  13. eliminating the worst: http://drizzle.org/wiki/MySQL_Differences#Objects_Removed
  14. every single difference: http://drizzle.org/wiki/MySQL_Differences
  15. just under 300K lines: http://drizzle.org/sloc/
  16. ACID: http://en.wikipedia.org/wiki/ACID
  17. start here: http://launchpad.net/drizzle

Source URL: http://www.rackspace.com/blog/rackspace-and-drizzle-its-time-to-rethink-everything/