At this year’s O’Reilly Strata event we will showcase our support for the newest generation of the Hortonworks Data Platform (2.0), a release that we believe represents a paradigm shift in what you can use Hadoop to accomplish.
With the introduction of new tools and components, Hadoop is moving beyond its batch-oriented roots and becoming flexible enough to handle more interactive, online and streaming data flows. With a new data operating system, Hadoop is shifting from a single-use system to a multi-purpose platform. Combined with query performance enhancements and updates to the Hadoop Distributed File System (HDFS), Hortonworks Data Platform version 2.0 ups the ante on what you can do with your data.
Here are some of the changes in HDP 2.0 our customers are excited about:
YARN is the new data operating system for Hadoop. The name is an acronym for “Yet Another Resource Negotiator” (cheeky, we know!). YARN extends Hadoop beyond MapReduce to support more streaming and interactive workloads. It lets multiple purpose-built processing models execute in parallel on the same cluster, increasing the processing power of Hadoop using the exact same hardware. This means you get more predictable performance and resource utilization. In addition, new features in shared operational services allow for execution across multiple running workloads. The promise of YARN is that the focus will move away from the sometimes less-than-optimal MapReduce model so future versions of Hadoop can be even better equipped to meet the demands of the application. Early results indicate a performance improvement of between 60 percent and 150 percent with the use of YARN. To learn more about YARN visit http://hortonworks.com/labs/yarn/.
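To make the "resource negotiator" idea a little more concrete, here is a minimal sketch in Python. It is a toy model, not the YARN API: the `ToyResourceManager` class, the application names and the memory sizes are all hypothetical, and real YARN scheduling involves queues, CPU and per-node placement that are left out here. The point is only that several kinds of workload ask one shared pool for containers instead of each owning the cluster.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Toy stand-in for a cluster-wide resource negotiator (not the YARN API)."""

    total_memory_mb: int
    allocations: dict = field(default_factory=dict)  # app name -> MB granted

    def free_memory_mb(self) -> int:
        """Memory not yet handed out to any application."""
        return self.total_memory_mb - sum(self.allocations.values())

    def request(self, app: str, containers: int, container_mb: int) -> int:
        """Grant as many whole containers as currently fit; return the count."""
        fit = min(containers, self.free_memory_mb() // container_mb)
        if fit > 0:
            self.allocations[app] = self.allocations.get(app, 0) + fit * container_mb
        return fit

    def release(self, app: str) -> None:
        """Return all of an application's memory to the shared pool."""
        self.allocations.pop(app, None)


# Three hypothetical workloads sharing one 16 GB pool.
rm = ToyResourceManager(total_memory_mb=16384)
granted = {
    "mapreduce-batch": rm.request("mapreduce-batch", containers=4, container_mb=2048),
    "interactive-query": rm.request("interactive-query", containers=2, container_mb=1024),
    "stream-processing": rm.request("stream-processing", containers=8, container_mb=1024),
}
print(granted)              # the streaming job gets 6 of the 8 containers it asked for
print(rm.free_memory_mb())  # 0: the pool is fully used
```

When the batch job finishes and calls `rm.release("mapreduce-batch")`, its memory goes back into the pool and the streaming job could ask again for the containers it was missing.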
HDFS 2.2 is the updated file system for Hadoop, and it comes with some notable advancements. There has been a focus on making the high-availability (HA) file system more efficient through the ZooKeeper service. This service helps monitor the health of the HA state throughout the entire environment. The snapshotting capability has also been improved to create point-in-time snapshots, which were previously problematic in Hadoop 1.0. This allows users to restore the state of the file system directly from a snapshot, creating a more seamless recovery experience. To learn more about HDFS 2.2 visit http://hortonworks.com/hadoop/hdfs/.
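For readers who have not worked with snapshots before, the idea of a point-in-time snapshot can be sketched in a few lines of Python. This is an illustration only: `ToyFileSystem` is a hypothetical in-memory class, not an HDFS client, and it copies the whole file table on every snapshot, which a real file system would avoid.

```python
class ToyFileSystem:
    """In-memory toy file system with named point-in-time snapshots."""

    def __init__(self):
        self.files = {}      # path -> contents
        self.snapshots = {}  # snapshot name -> frozen copy of self.files

    def write(self, path, data):
        self.files[path] = data

    def delete(self, path):
        self.files.pop(path, None)

    def create_snapshot(self, name):
        # Copy the path table so later writes do not alter the snapshot.
        self.snapshots[name] = dict(self.files)

    def restore(self, name):
        # Roll the live file table back to the state captured in the snapshot.
        self.files = dict(self.snapshots[name])


fs = ToyFileSystem()
fs.write("/data/orders.csv", "v1")
fs.create_snapshot("before-cleanup")

# An accidental overwrite and delete after the snapshot was taken.
fs.write("/data/orders.csv", "corrupted")
fs.write("/data/tmp.csv", "scratch")
fs.delete("/data/orders.csv")

fs.restore("before-cleanup")
print(fs.files)  # {'/data/orders.csv': 'v1'}
```

The snapshot itself is untouched by the restore, so the same point in time can be recovered again later.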
Redundant NameNode capability in HDP 2.0 extends the fault tolerance of Hadoop to cover the NameNode as well. Previously, a failure in the NameNode, while rare, would cause certain operations to fail. The updates in HDP 2.0 allow for seamless replication between NameNode instances. We are excited to manage this key element of continuous operation for you, so you can rest assured that all components and operations remain available.
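The active/standby pattern behind a redundant NameNode can be sketched roughly as follows. Everything here is a simplifying assumption for illustration: `ToyNameNode` and `ToyHAPair` are hypothetical classes, the health flag is flipped by hand, and replication is a plain in-process copy rather than anything resembling the real HDFS mechanism.

```python
class ToyNameNode:
    """Toy metadata server: maps file paths to lists of block IDs."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.namespace = {}


class ToyHAPair:
    """One active and one standby node that receive every metadata update."""

    def __init__(self, active, standby):
        self.active = active
        self.standby = standby

    def record(self, path, block_ids):
        # Apply each update to both nodes so the standby stays current.
        for node in (self.active, self.standby):
            node.namespace[path] = list(block_ids)

    def check_and_failover(self):
        """Promote the standby if the active node is down; report whether we did."""
        if not self.active.healthy and self.standby.healthy:
            self.active, self.standby = self.standby, self.active
            return True
        return False

    def lookup(self, path):
        self.check_and_failover()
        return self.active.namespace[path]


pair = ToyHAPair(ToyNameNode("nn1"), ToyNameNode("nn2"))
pair.record("/data/orders.csv", ["blk_1", "blk_2"])

pair.active.healthy = False             # simulate losing the active node
blocks = pair.lookup("/data/orders.csv")
print(pair.active.name, blocks)         # nn2 ['blk_1', 'blk_2']
```

Because the standby already holds the same metadata, the lookup succeeds after the switch without any data being rebuilt.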
Improved Hive performance with Stinger in HDP 2.0 marks the second stage of the Stinger initiative, which is aimed at accelerating the performance of Hive. Traditionally a batch processing application, Hive is the focus of a lot of development effort aimed at turning it into more of a real-time engine. To do this successfully, the query performance of Hive needed to improve greatly, and the early results are very promising. Hive, the de facto SQL interface for Hadoop, created by the team at Facebook, provides SQL access to data stored in HDFS. Hive was originally designed to deal with data at petabyte scale, but fell short when dealing with smaller data sets that needed super-fast query response. The latest upgrades to Hive have bridged the gap, improving query performance by over 60 percent with the same reliability at large scale. The new Hive promises to open the door to new workloads with more real-time requirements while reducing the overall resource requirements of your cluster.
All of these additions represent a new direction for the Hadoop application framework and the open community involved in designing its future. What was once widely considered a data platform built for web-scale batch applications is now transforming into a multi-workload staple of the modern data architecture.
Rackspace has partnered with Hortonworks to provide the Hortonworks Data Platform through two offerings: a cloud-based Hadoop-as-a-service offering called Rackspace Cloud Big Data, which is currently in limited availability, and a dedicated offering called Managed Big Data Platform, which now supports the HDP 2.0 release.
Want to hear more? Rackspace will be on hand at Strata in Santa Clara, Calif. this week (February 11 through February 13). Stop by and see us at booth No. 625.