Not all data is created equal. Each company's data strategy needs to be laid out with careful thought about the future demands on the system. Although Hadoop™ has its roots firmly planted in the JBOD and bare metal camps, users are increasingly finding ways to split up data processing based on workload requirements and the nature of the queries they run.
Bare metal covers the majority of use cases and is effective if you can leverage commodity hardware and have deep in-house expertise to run the various bits and pieces. The cloud brings new advantages: a fully operational environment, available in minutes, that lets you explore and configure Hadoop with very little risk.
However, there will always be some data that is best suited for processing in-house due to security, compliance or data integrity concerns. These data sets can be housed in Rackspace Private Cloud, our distribution of OpenStack configured to address very specific workloads and even trade workload resources during off-peak hours. This increases your flexibility in how you consume IT resources for applications like Apache Hadoop.
Last year, we partnered with Enterprise Hadoop Distribution vendor Hortonworks to develop an on-demand Apache Hadoop environment available in minutes on the Rackspace Cloud. Alongside that, we built two additional offerings that extend our combined Apache Hadoop expertise across our entire portfolio of hosting options.
Now, we've built out a portfolio of data services aimed at adding value and expertise to the world of Apache Hadoop and giving you the choice of which platform (or mix of platforms) is the best fit for your data. We now offer a Big Data platform on dedicated servers, external storage and public and private clouds. Here's what's new:
Rackspace Managed Big Data Platform on Dedicated Servers
Available today, you can provision fully operational dedicated servers running the Hortonworks Data Platform (HDP) distribution. Complete with master services and data nodes fully configured, this platform is built to meet the requirements of your specific jobs. You can now extend the same level of Fanatical Support you experience on your app and web servers to the data layer, with deep expertise, architecture guidance and cluster management. You get the full, fully supported set of HDP features, including Pig, Hive, Flume, Sqoop, HBase and HCatalog. These tools let you interface with Hadoop in the languages and workflows you are most comfortable with. See our small and large Hortonworks Data Platform reference architectures below.
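Tools like Pig and Hive compile down to the same MapReduce model that plain scripts can reach through Hadoop Streaming. As a minimal sketch of that model (the classic word-count example; the input lines below are illustrative, not from any real data set), a mapper and reducer in Python might look like this:

```python
from itertools import groupby

def mapper(lines):
    """Map step: emit a (word, 1) pair for every word.
    Under Hadoop Streaming this would run as the -mapper script."""
    for line in lines:
        for word in line.strip().split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce step: sum the counts for each word. Hadoop's shuffle
    phase delivers pairs grouped by key; we sort here so the
    sketch also runs standalone."""
    for word, group in groupby(sorted(pairs), key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)

# Standalone demo on two "lines" of input:
counts = dict(reducer(mapper(["big data big", "data lake"])))
# counts == {"big": 2, "data": 2, "lake": 1}
```

In a real cluster the mapper and reducer run as separate processes across many nodes; the value of higher-level tools like Pig and Hive is that they generate this plumbing for you from a query language.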
Rackspace Managed Big Data Platform on External Storage
New use cases for Hadoop workloads arise frequently; they focus on tasks like user sentiment analysis, machine learning and log analysis, to name a few. Some of these workloads require relatively few compute resources but must store a large volume of data. With traditional Hadoop, the server configuration dictates a fixed ratio of compute to storage. The developers at EMC have teamed up with Hortonworks to offer the HDP distribution on the EMC Isilon device in a way that allows us to separate compute and storage resources and provide a level of redundancy and snapshotting not currently available in traditional bare metal hosting. This supported storage framework lets your data repositories or data warehouses sit close to your HDFS data, with the ability to cross-pollinate and move data between these areas. We have diagrammed the EMC Isilon solution below.
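The coupling of compute and storage is easy to see with a back-of-the-envelope sizing calculation. HDFS replicates each block three times by default, so storage needs alone can dictate node count; the function and figures below are an illustrative sketch, not a Rackspace sizing tool, and the overhead fraction is an assumption rather than an HDFS constant.

```python
import math

def nodes_for_storage(raw_data_tb, disk_per_node_tb,
                      replication=3, overhead=0.25):
    """Minimum node count needed to hold raw_data_tb in HDFS.

    replication: HDFS default block replication factor (3).
    overhead: fraction of disk reserved for temp/shuffle space
              (an illustrative assumption).
    """
    usable_per_node = disk_per_node_tb * (1 - overhead)
    return math.ceil(raw_data_tb * replication / usable_per_node)

# 100 TB of raw data on nodes with 24 TB of disk each:
# 100 * 3 / (24 * 0.75) -> 17 nodes, so you buy 17 nodes'
# worth of CPU even if the job only needs a handful of cores.
```

Decoupling storage (Isilon) from compute breaks exactly this constraint: the storage tier scales on the first axis while the compute tier scales independently on the second.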
Rackspace Cloud Big Data Platform (Early Access)
Previously in customer preview, early access to our Cloud Big Data Platform is now available to Rackspace Cloud customers. This cloud-based Hadoop service lets you spin up a fully configured and optimized Apache Hadoop cluster in minutes and use popular Hadoop tools like Pig and Hive, with the added flexibility and ease of use the cloud offers. You can also process data that lives in Rackspace Cloud Files. And because it is all built on top of OpenStack, you can deploy, process and grow your Apache Hadoop cluster without fear of being locked into a provider or proprietary platform. Your dataset may someday become a workload that shifts between on-premises and cloud environments, all bridged by the world's leading open source cloud platform. Established workloads can also move between our offerings while continuing to leverage the HDP platform.
Hortonworks Data Platform on Rackspace Private Cloud
Rackspace Private Cloud is a leading OpenStack distribution aimed at delivering Fanatical Support anywhere. Since every data set has unique requirements and sensitivities, Rackspace Private Cloud lets you virtualize your HDP environment and deploy multiple iterations and environments, giving each user a distinct set of access controls. You may also find that you can allocate compute resources in a way that fits your specific business. HDP on Rackspace Private Cloud brings the next level of customization and portability to your data environment. In addition, new users can take advantage of training delivered directly by the RPC team starting in October 2013.