Hadoop Has Arrived: Hadoop Summit 2009

Filed by Angela Bartels | June 24, 2009 4:01 pm

Jonathan Ellis, System Architect for The Rackspace Cloud, here. Last week I flew to California to attend the Hadoop Summit[1] and the NoSQL conference[2].

Hadoop[3] is the leading open-source project for MapReduce computation and its supporting infrastructure (such as HDFS, the Hadoop Distributed File System, which is based on the design of Google's GFS). The 2008 Hadoop Summit drew about 150 attendees; the 2009 summit drew five times that number. I am not a Hadoop expert, but as a Cassandra developer[4] I’m interested in meeting people who work with large datasets, and there was no better place for that than the Hadoop Summit.

Hadoop Summit videos are not out yet, but should be soon. My favorite talks were the ones on Amazon Elastic MapReduce, Pig[5], and Hive[6]. (At The Rackspace Cloud we compete with Amazon, but I have to give them credit for their talk!) Pig and Hive both offer a higher-level language for writing MapReduce jobs, with slightly different approaches: Pig uses a procedural dataflow language, Pig Latin, while Hive provides a SQL-like query language. We use Pig internally.
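For readers new to MapReduce, the model that Hadoop runs (and that Pig and Hive compile down to) can be sketched in a few lines of plain Python using the classic word-count example. This is a single-process illustration only, not Hadoop's API: the function names `map_phase`, `shuffle`, `reduce_phase`, and `word_count` are hypothetical, and a real cluster distributes each phase across many machines and reads its input from HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every whitespace-separated word in the line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group every emitted value by its key, the step the framework
    # performs between the map and reduce phases.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    # Run map over every input line, group the pairs, then reduce each group.
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "Pig and Hive compile to MapReduce"]
    print(word_count(docs))
```

In Pig or Hive, the same job is a handful of lines of Pig Latin or a single `GROUP BY` query; the appeal of both projects is that they generate the map and reduce steps for you.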

I should also mention that the first 500 people to register for the Hadoop Summit were given a free copy of Hadoop: The Definitive Guide[7]. I would recommend it to anyone looking for an introduction to both using and administering Hadoop.

The NoSQL conference the next day featured overviews of half a dozen of the most interesting open-source distributed databases, plus CouchDB, which aims to scale down to mobile devices rather than out to hundreds of servers in your datacenter. NoSQL videos are up[8], and of course I have to point out the comment calling the Cassandra presentation (by Avinash Lakshman of Facebook) “hands-down the most interesting.” Besides ours, I would recommend Todd’s overview as well as the Voldemort and HBase talks. Yes, there are cases where I would use one of those instead of Cassandra, but that’s a subject for another post! (In the meantime, Toby Negrin from Yahoo posted some notes on each[9].)

Want to know more about Hadoop at Rackspace? Be sure to check out this video interview from building43.com[10].

If you want to try out Hadoop or a distributed database but don’t have a cluster of your own, visit our Cloud Servers[11] page for more information.

Endnotes:
  1. Hadoop Summit: http://developer.yahoo.com/events/hadoopsummit09/
  2. NoSQL conference: http://nosql.eventbrite.com/
  3. Hadoop: http://hadoop.apache.org/
  4. Cassandra developer: http://www.rackspacecloud.com/blog/2009/05/a-key-to-cloud-standards-the-cloud-database/comment-page-1/
  5. Pig: http://hadoop.apache.org/pig/
  6. Hive: http://hadoop.apache.org/hive/
  7. Hadoop: The Definitive Guide: http://oreilly.com/catalog/9780596521974/
  8. NoSQL videos are up: http://blog.oskarsson.nu/2009/06/nosql-debrief.html
  9. Some notes on each: http://developer.yahoo.net/blog/archives/2009/06/nosql_meetup.html
  10. building43.com: http://www.building43.com/videos/2009/06/18/think-your-dataset-is-large/
  11. Cloud Servers: http://www.rackspacecloud.com/cloud_hosting_products/servers

Source URL: http://www.rackspace.com/blog/hadoop-has-arrived-hadoop-summit-2009/