In the last six months there has been a dramatic increase in interest in NoSQL and Big Data. You have probably heard that “NoSQL is the future of databases,” or that “Big Data is a key technology that will allow businesses to get much smarter.”
Analysts are bold in their predictions. Gartner, for example, predicts that “Big Data will deliver transformational benefits to enterprises within two to five years, and by 2015 will enable enterprises adopting this technology to outperform competitors by 20% in every available financial metric.”
In the same report, Gartner places Big Data near the “Peak of Inflated Expectations” in the hype cycle, which can be defined as a phase that generates high amounts of enthusiasm and unrealistic expectations (i.e. what most people would call a buzzword). Given the current hype, it is useful to take a step back and understand where these technologies can be useful and try to distinguish hype from reality.
One aspect of the vision for Big Data is related to business intelligence applications, which seek to empower businesses and organizations to derive intelligence and insights that will enable them to act smarter, resulting in a significant competitive advantage. New forms of processing are needed to deal with the three core characteristics of Big Data (from Gartner’s own definition): high volume, high velocity and/or high variety of data.
Solutions such as Hadoop and NoSQL technologies facilitate storage and analysis of very large, unstructured data sets that have been challenging to manage with traditional SQL databases. While these technologies solve a significant part of the burden associated with business intelligence efforts, there are two key problems that still need to be addressed.
The first problem is that it is still highly complex to source and integrate enterprise data. Extracting, de-duplicating and correlating data about, say, customers and profitability, continue to be monumental tasks, particularly because they tend to involve a large number of source databases and information systems with potentially different definitions for the same piece of information.
The second problem is probably the harder one to solve because it goes beyond technology and into the skills available to the organization. Access to an incredible amount of data and the ability to run complex queries solve only part of the problem. To produce business value, one must derive insights from the data and be able to act on them. For example, marketers seldom act on the data available to them. In my experience, most marketers (with the possible exception of those at companies that extract direct revenue from website visitors via online retail or advertising) rarely look at web analytics data and therefore fail to act on any insights that these tools may offer.
Regardless of the challenges, Big Data can be incredibly powerful when properly applied, but it will require expertise and skills that may not exist today in many enterprises (which is expected with any new technology). In addition, the tools used to visualize, query and summarize data will need to mature. Given the interest in this technology, I expect both of the challenges discussed above to be solved quickly by the industry.
What seems to be lacking is a deep understanding of the type of problems Big Data is designed to solve. Big Data or NoSQL technologies will not replace traditional databases that are designed to maintain relationships between structured data sets and to perform operations such as transaction processing that require the ACIDity provided by SQL (Atomicity, Consistency, Isolation and Durability of transactions). SQL databases will continue to be fundamental technology tools for many, many years.
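To make the ACID point concrete, here is a minimal sketch of atomicity using Python's built-in sqlite3 module. The accounts table, the names and the transfer helper are hypothetical and chosen only for illustration; they do not come from any system discussed in this article. The sketch credits one account and then debits another inside a single transaction, so that a failed debit also undoes the credit.

```python
import sqlite3


def open_demo_db():
    """Create an in-memory database with a hypothetical accounts table."""
    conn = sqlite3.connect(":memory:")
    # The CHECK constraint makes an overdraft fail at the database level.
    conn.execute(
        "CREATE TABLE accounts ("
        "name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)",
            [("alice", 100), ("bob", 50)],
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one all-or-nothing transaction."""
    try:
        # `with conn` commits if the block succeeds and rolls back on error.
        with conn:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            # If this debit violates the CHECK constraint, the credit
            # above is rolled back together with it.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    """Return the current balances as a {name: balance} dictionary."""
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    db = open_demo_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

The second transfer is rejected and leaves both balances untouched. Many NoSQL systems trade some of these guarantees for scale and flexibility, which is why transaction processing remains a natural fit for relational databases.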
From a market perspective, Microsoft SQL Server’s revenue is roughly $2.5 billion and grew by 20 percent in the last year. Meanwhile, total revenue for NoSQL databases, according to The451, reached just $20 million in 2011. The451 expects the total NoSQL market to grow to $215 million by 2015, which is still less than half the growth in license revenue that SQL Server saw in 2011. I use this comparison only to highlight the sheer volume of problems that are still the sweet spot for enterprise mission-critical applications backed by relational databases.
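The comparison above can be checked with a few lines of arithmetic. This is a rough sketch that uses only the rounded figures quoted in this article; applying the 20 percent to the $2.5 billion figure is an assumption about how the growth was measured, and the variable names are purely illustrative.

```python
# Rounded figures quoted in the article, in US dollars.
SQL_SERVER_REVENUE = 2_500_000_000
SQL_SERVER_GROWTH_PERCENT = 20
NOSQL_REVENUE_2011 = 20_000_000
NOSQL_FORECAST_2015 = 215_000_000

# Assumption: the 20 percent growth is applied to the $2.5 billion figure.
sql_server_growth = SQL_SERVER_REVENUE * SQL_SERVER_GROWTH_PERCENT // 100
half_of_growth = sql_server_growth // 2

print(f"SQL Server one-year growth: ${sql_server_growth:,}")
print(f"Half of that growth:        ${half_of_growth:,}")
print(f"NoSQL forecast for 2015:    ${NOSQL_FORECAST_2015:,}")
print("Forecast below half the growth:", NOSQL_FORECAST_2015 < half_of_growth)
```

Under that assumption, SQL Server's one-year growth is about $500 million, so the $215 million forecast for the entire NoSQL market in 2015 sits below the $250 million halfway mark.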
The main point is to set the right expectations. As is usually the case, it is about selecting the right tool for the job, as GigaOM points out in the article “MongoDB or MySQL why not both?” Because NoSQL databases give organizations the advantages of scale and flexibility of data structures, they are a good tool for managing large amounts of data where the relationship between the data elements is less important.
As the Wikipedia article states: “NoSQL database management systems are useful when working with a huge quantity of data and the data’s nature does not require a relational model for the data structure.” To choose the right tool for your data problem, you should try to understand the business requirements across three dimensions: size, variety of the type of data (unstructured versus highly structured data) and velocity of ingestion and removal of data. In addition, NoSQL databases can often be deployed on commodity hardware, making them an affordable technology to deploy from a hardware requirements perspective.
I propose that there are three key aspects of NoSQL and Big Data technologies that we should remember:
Rackspace has been involved in NoSQL technologies for quite some time (interestingly, a fellow Racker coined the term NoSQL). Our own IT department has deployed a NoSQL cluster on its own OpenStack private cloud to provide the business intelligence our management team needs (stay tuned for details).
At Rackspace we operate under a fundamental principle of openness, which means that we should support the technology choices of our customers. Whether you need MySQL as a service, SQL Server on dedicated or cloud infrastructure or a NoSQL cluster using technologies from our partners (such as Mongo or Infochimps), we aspire to offer you the right tool for your job.