Cloud Big Data Platform Ready For More Hadoop Apps

A few weeks ago we told you about our two Data Services for Hadoop-based applications, the Managed Big Data Platform service (in Unlimited Availability) and the Cloud Big Data Platform (in Early Access). Working hand in hand with Hortonworks, we are giving you a choice of architectures for your Hadoop applications, whether you need a custom-built Hadoop architecture based on specific dedicated hardware, or a dynamic, API-driven programmable Hadoop cluster in our public cloud.

Today, we are pushing the ball further as we move our Cloud Big Data Platform from Early Access into Limited Availability.

WHAT HAVE YOU TOLD US?

The users of our Cloud Big Data Platform in Early Access come from a variety of industries and backgrounds, from online marketing and ecommerce working on recommendation engines and user sentiment analysis, to hospitality and retail looking at product performance, and to education and science working to improve people’s lives. We have learned a lot from you. These three statements summarize what we are hearing:

  • “Big data” technologies are better understood: while many of you were just “testing the water” in the past two years or so, today you are taking a really good look at how to utilize Hadoop in your applications. There is more clarity and understanding of the technology, its uses and limitations, as the tooling around Hadoop continues to evolve and mature.
  • “Big data” projects are more actionable: we see more pragmatic approaches in implementations, with a focus on visible value. We see fewer abstract, ambiguous “big data discussions” with unclear goals and hyped expectations, and more practical uses of the technology for projects that drive real business value, as it should be.
  • Doing “big data” is still hard: From a technical perspective, we hear that these projects are still hard for you. There is still a lot of work that we as an industry need to do to make sure the technology is manageable, performant and scalable to make these initiatives less difficult to carry out. We are glad to be doing our part to help you with these projects.

FROM EARLY ACCESS TO LIMITED AVAILABILITY

Today, our Cloud Big Data Platform is moving from Early Access to Limited Availability. Limited Availability is the last phase we use here at Rackspace before a service is delivered in Unlimited Availability, which we expect in a few months. We hope to get as many customers on the service as we can, but we will unfortunately not be able to accept everybody yet. However, let us know what you are working on and we will definitely consider your application. The most important aspect for us is making sure we deliver on our promise of being Fanatical across the lifecycle of your initiative, which is why we have our Early Access, and now our Limited Availability, programs.

What does this mean to you beyond the new capabilities we have built in?

It means three things:

  • We are broadening the pool of customers we are accepting into our service.
  • We now offer the full Fanatical Support and Rackspace Cloud SLA for your workload to enable you to bring production applications to the service.
  • We are going to start billing you for the resources you consume. This is important for YOUR accounts payable team or personal credit card!

To sign up for Limited Availability, visit the Cloud Big Data Platform page at http://www.rackspace.com/cloud/big-data. Click on the “CONTACT US” button and fill out the simple contact form. A Racker will contact you to get you on board. Once we grant you access to the service, you can provision a cluster right from the Cloud Control Panel or through the API.

CHOICES OF NODES

You will have two options for your deployment, depending on your data and compute requirements: 1.3TB instances and 11TB instances. It is worth emphasizing here that the 1.3TB instances share the compute node, while the 11TB instances are all single-tenant.


Remember that Hadoop keeps three copies of your data by default. To estimate your total storage footprint, multiply your estimated source data volume by three, and then use that figure to decide how many nodes you will need to provision.
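The sizing rule above can be sketched in a few lines of Python. This is a rough estimate only: the function name and parameters are illustrative, and it assumes that a node's full advertised storage is usable for HDFS and that no extra headroom is needed for intermediate job output, both of which you should check against your own workload.

```python
import math

def nodes_needed(source_tb: float, node_capacity_tb: float, replication: int = 3) -> int:
    """Smallest node count whose combined storage holds every replica.

    Assumption: the whole advertised node capacity is available to HDFS.
    """
    if source_tb < 0 or node_capacity_tb <= 0 or replication < 1:
        raise ValueError("sizes must be positive and replication at least 1")
    raw_tb = source_tb * replication  # total footprint across all copies
    return max(1, math.ceil(raw_tb / node_capacity_tb))

# 10 TB of source data becomes 30 TB once replicated three times.
print(nodes_needed(10, 11))   # 3 nodes of 11TB
print(nodes_needed(10, 1.3))  # 24 nodes of 1.3TB
```

In practice you would likely add a margin on top of this figure for temporary data and growth.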

And yes, to repeat, the free period is over and you will start getting billed for the resources you consume.

WE ARE AVAILABLE IN LONDON!

Cloud Big Data Platform is now available in London!

The hourly and equivalent monthly prices will be as follows:

  • £0.27 per hour (£197.10 per month) for the 1.3TB instance
  • £2.16 per hour (£1,576.80 per month) for the 11TB instance
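The monthly figures correspond to 730 hours at the hourly rate. As a minimal sketch, assuming every node in a cluster is billed at the listed rate for its instance type and nothing else is charged (an assumption, not a statement of the actual billing rules), a monthly estimate for a whole cluster could be computed like this. The names below are illustrative.

```python
from decimal import Decimal

# 730 hours per month is implied by the listed prices (0.27 * 730 = 197.10).
HOURS_PER_MONTH = 730

# Hourly London prices in GBP, taken from the list above.
HOURLY_GBP = {"1.3TB": Decimal("0.27"), "11TB": Decimal("2.16")}

def monthly_cost_gbp(instance: str, nodes: int, hours: int = HOURS_PER_MONTH) -> Decimal:
    """Estimated cost in GBP of running `nodes` instances for `hours` hours."""
    if nodes < 0 or hours < 0:
        raise ValueError("nodes and hours must be non-negative")
    return (HOURLY_GBP[instance] * hours * nodes).quantize(Decimal("0.01"))

print(monthly_cost_gbp("1.3TB", 1))  # 197.10
print(monthly_cost_gbp("11TB", 3))   # 4730.40
```

Decimal is used rather than float so that the pence do not drift through rounding.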

NEW CAPABILITIES FOR LIMITED AVAILABILITY

We have added a number of new capabilities to the Cloud Big Data Platform over the past few weeks to prepare it for Limited Availability. Here are some of them:

  • A Rackspace Cloud SLA for your production workload: Limited Availability environments are production-ready. The Rackspace Cloud SLA applies, which includes 99.9% instance availability per month (excluding scheduled maintenance) and 100% network availability (also excluding scheduled or emergency maintenance).
  • Fanatical Support: We have the Hadoop teams ready to help you in your design and deployment. As we said above, “doing big data” can be difficult, and our Hadoop Support teams are ready to help you be successful in your application.
  • Network Performance: This is significant. For example, in a simple network throughput test, our Early Access environment in DFW saw 480Mbps. We repeated that test in our Limited Availability environment and now see about 5.2Gbps in DFW, ORD and LON. That is an improvement of more than 10 times.
  • More storage: We raised the limit of node storage to 11TB per node.
  • Single tenancy available: Our 11TB instances are single-tenant. Your application will have the whole node all to itself.
  • Cloud Files Connector: We have improved the integration between the service and our object storage in Cloud Files.
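To put two of these numbers in perspective, here is a back-of-the-envelope sketch in Python. It assumes a 30-day month, decimal units (1 TB = 10^12 bytes), and a transfer that sustains the full measured throughput, which real workloads rarely do. The function names are illustrative, and the downtime figure ignores the maintenance exclusions in the SLA.

```python
def allowed_downtime_minutes(availability_pct: float, days: int = 30) -> float:
    """Minutes of downtime per period that still meet the availability target."""
    return days * 24 * 60 * (1 - availability_pct / 100)

def transfer_hours(data_tb: float, throughput_gbps: float) -> float:
    """Hours needed to move `data_tb` terabytes at a sustained throughput."""
    bits = data_tb * 1e12 * 8               # decimal terabytes to bits
    seconds = bits / (throughput_gbps * 1e9)
    return seconds / 3600

print(round(allowed_downtime_minutes(99.9), 1))  # 43.2 minutes in a 30-day month
print(round(transfer_hours(1, 0.48), 2))         # 4.63 hours for 1 TB at 480Mbps
print(round(transfer_hours(1, 5.2), 2))          # 0.43 hours for 1 TB at 5.2Gbps
```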

WHAT DID WE NOT GET TO FOR LIMITED AVAILABILITY?

One thing we have not made available in Limited Availability is Hortonworks Data Platform 2.0. In Limited Availability, we support Hortonworks Data Platform 1.3. Some of you are interested in the new goodies in the newest codebase. In particular, we hear that you want:

  • YARN to move beyond batch into online and interactive queries: You want to do more than MapReduce (batch) queries. We heard that you are exploring Giraph, Storm, HBase and Tez for your applications requiring interactive, online and streaming patterns.
  • Better SQL semantics: We heard that you are looking for improved SQL semantics in your queries, and are interested in exploring how Hive is evolving through the Stinger initiative.
  • Higher Hadoop availability: You want the improvements in the latest bits of HDFS, including better handling of name node failures, snapshots and NFS, among others.

Rest assured that our engineers are hard at work to make this available in our Managed and Cloud Big Data Platform services. Expect more news from us as we learn more from you in this Limited Availability phase and work towards Unlimited Availability. We are making sure that there is an ample supply of storage and compute nodes available for your needs in our datacenters, and we want to enhance the API, UI and overall programmability.

LET ME SEE MORE!

If you have one hour to spare, watch the video below entitled “Apache Hadoop on the Open Cloud” with Nirmal Ranganathan from Rackspace and Steve Loughran from Hortonworks. In it, they cover:

  • An overview of the service, and its OpenStack architecture
  • Using the Control Panel and API to provision and manage a cluster of Hadoop nodes
  • Processing location data from Wikipedia, off of Cloud Files object storage
  • Rendering the data on a simple Google Map



To find out more, visit the developer documentation and the Getting Started Guide, or go directly to our Cloud Big Data Platform website.

As usual, let me know what kinds of cool apps you are working on @jrarredondo.

About the Author

This is a post written and contributed by J.R. Arredondo.

J.R. Arredondo joined Rackspace in February 2012 in the areas of Cloud storage and Cloud databases. J.R. came to Rackspace from Microsoft in the SharePoint and Office product management and engineering groups from 2006 to 2012. Before Microsoft, he was with BMC Software in its Corporate Development and Strategy group driving M&A initiatives in the areas of BSM and Application and Database Performance Management. Prior to BMC he led software engineering and architecture groups for eCommerce at Compaq.


Connect with J.R. on Twitter @jrarredondo or Google+.


©2014 Rackspace, US Inc.