I was really intrigued by this article, mostly because I found myself disagreeing with many of its suggestions. I don’t disagree with having a source-agnostic approach to data, but I somewhat disagree with the premise that self-service tooling can drive instant business value.
Let me explain.
Collecting and sorting data from multiple sources more efficiently is an absolute necessity, and I firmly believe that companies should also invest heavily in data scientists and data engineers, as they can provide significant value and unlock deep insights into your business. The key is to use data scientists and data engineers in the right manner. The article suggests a new cross-functional team comprising DevOps engineers, data scientists, data engineers, and product developers. I believe this is too narrow.
Instead, I recommend following the Simon Wardley model on Pioneers, Settlers and Town Planners. Wardley makes the case that a different type of talent is needed as your product/service evolves from inception, through its development, and all the way to becoming a commodity. The key point is that the approach focuses on repeated long-term innovation, which is critical for any data project. Following this framework would support a structure that has clear focus and an agenda designed to enlighten, excite and drive data value within an organization. Seek out your pioneers, settlers and town planners, and you stand a chance of data being structured, unified and highly reliable.
Advancements in cloud technology and the proliferation of cloud migrations enable the growth of a “self-service” culture around advanced analytics: Business Intelligence (BI), Artificial Intelligence (AI), and Machine Learning (ML). Here is how intelligent data addresses the chasm in the cloud.
Self-service advanced analytics allows business users in an organization to directly access and analyze data with BI tools themselves. No longer will we have to rely on trained data engineers to source the data for BI tools, or on data scientists to model and predict outcomes.
The rise of self-service analytics
Sourcing and retaining data engineers and data scientists or outsourcing data analysis is not only expensive but time-consuming as well. In an ever-accelerating information age, the companies most likely to succeed are the ones that not only glean the most profitable insights from their data but do it faster and more nimbly than their competitors.
Waiting for data engineers to prepare and collect data for a data scientist can be a slow process. It will require collaboration with the IT department, slowing things down even further. And the results may not be optimal if the data scientist doesn’t quite understand the business user’s needs, or what insights and correlations will best help them succeed.
For these reasons, vendors are increasingly developing self-service analytics products.
These solutions allow business users to become what’s colloquially known as “citizen data scientists.” The business user can directly access and utilize data that’s aggregated from a range of sources, without requiring a background in statistics or technology.
This growing self-service approach to data analysis enables less technical folks to do data science, without having to rely on a cumbersome end-to-end process that is stifled by reliance on data engineering.
Self-service analytics and citizen data scientists have fundamentally changed the landscape of data usage within the enterprise.
According to a 2018 Aberdeen survey examining B2B purchase behavior for BI analytics solutions, the top criteria ultimately driving enterprise BI analytics purchase decisions are slanted toward the business user above all other considerations.
Those criteria include ease of use, ease of integration with the current IT infrastructure, efficiency of deployment, connectivity to multiple types of data sources, and speed of data retrieval.
Aberdeen’s Senior Vice President and Principal Analyst, Mike Lock, predicts that:
“The companies poised for success are those that are focused on putting the [analytical] powers in the hands of citizen data scientists. This is where the real opportunity lies – the linkage between this sophisticated technology and the citizen data scientists walking the halls of our organizations.”
There is, however, one major hurdle standing in the way of most organizations adopting a self-service analytics culture: unified data access.
Unified data and the limits of cloud data transformation
Imagine, hypothetically, that each department in your organization spoke a different language. Sales spoke English, Marketing spoke Spanish, Accounting spoke Mandarin, etc.
Retrieving and combining information that is vital to management would be difficult, to say the least. Separate translators would be required to interpret the information from each department so that everyone could read it. Some information would likely get lost in translation.
Nuances and idiosyncrasies inherent in one language may not be understood by another, confusing the meaning of certain things and inadvertently creating misinformation. Needless to say, this type of situation would negatively affect your ability to operate smoothly and plan effectively, while undoubtedly hurting your bottom line.
While this Tower-of-Babel company is only hypothetical and doesn’t actually exist, many companies today are in a similar situation with their data.
These companies have many different types of data in many different formats, located across multiple systems and servers. Some of it is in the cloud, some of it is in on-premises servers, and it is often in different formats and governed by different policies and security practices.
Unified data is the term for the aggregate of this data: all the disparate data from all sources across the entire organization, collected together in a single place for a single view. In order to achieve unified data access, organizations typically undertake a process known as cloud data transformation.
Cloud data transformation makes all data in all formats and from all sources—both cloud-based and on-premises—readable and accessible. This process can make or break the success of an organization’s cloud migration. There are, however, a number of key challenges that make cloud data transformation a long, complicated, and often expensive endeavor.
The cloud is a uniquely different operating environment, with new integrations, pricing models, security controls and optimization tactics. Cloud platforms may require restructuring the enterprise’s data, requiring extensive ETL and data translation projects.
Interoperability with outside systems and BI tools may be limited. Vendor lock-in could be problematic; many vendors store data in proprietary formats, effectively chaining customers to one solution. Some siloed or on-premises data could be attached to older legacy systems that can’t be moved without re-engineering those systems.
Security and entitlements may be difficult to maintain when merging data from many silos that have different users and configurations and follow different compliance protocols.
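To make the data-restructuring challenge above concrete, here is a minimal extract-transform-load sketch in Python. All data, field names, and the target schema are illustrative, and an in-memory SQLite database stands in for a cloud store; real ETL projects repeat this kind of reshaping across many tables and systems.

```python
import csv
import io
import sqlite3

# A legacy on-premises export (illustrative data).
SOURCE_CSV = "region,amount_cents\neast,10000\nwest,25000\n"

# Extract: read rows from the legacy export.
rows = list(csv.DictReader(io.StringIO(SOURCE_CSV)))

# Transform: convert cents to dollars and rename fields for the new schema.
records = [(r["region"], int(r["amount_cents"]) / 100) for r in rows]

# Load: write into the target store (an in-memory DB stands in for the cloud).
target = sqlite3.connect(":memory:")
target.execute("CREATE TABLE orders (region TEXT, amount_usd REAL)")
target.executemany("INSERT INTO orders VALUES (?, ?)", records)

print(target.execute("SELECT region, amount_usd FROM orders").fetchall())
# [('east', 100.0), ('west', 250.0)]
```

Even in this toy form, every schema change, unit conversion, and rename is a decision someone has to make and maintain, which is why these projects grow long and expensive at enterprise scale.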
Intelligent data virtualization as a bridge to cloud data transformation
If unified data access is the goal, and cloud data transformation is the means to achieve that goal, how do we alleviate the considerable challenges inherent in cloud data transformation? The most common solution is data virtualization, but it has some limits.
Data virtualization, as its name suggests, virtualizes all of your data, wherever it resides. This makes all of your data available for collection and analysis, without having to lift and shift it into the cloud. Data virtualization is designed to give unified data access no matter where your organization is in its cloud migration journey.
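The idea can be sketched minimally in Python, with two in-memory SQLite databases standing in for separate systems (all names and figures are illustrative): each source is queried in place, and only the small result sets are combined in the virtualization layer.

```python
import sqlite3

# Two independent "sources" standing in for separate systems (illustrative).
sales = sqlite3.connect(":memory:")
sales.execute("CREATE TABLE orders (region TEXT, amount REAL)")
sales.executemany("INSERT INTO orders VALUES (?, ?)",
                  [("east", 100.0), ("west", 250.0)])

crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE accounts (region TEXT, customers INTEGER)")
crm.executemany("INSERT INTO accounts VALUES (?, ?)",
                [("east", 4), ("west", 9)])

def federated_query(region):
    """Answer one business question by querying each source where it
    lives, then joining the small results in the virtualization layer."""
    revenue = sales.execute(
        "SELECT SUM(amount) FROM orders WHERE region = ?", (region,)
    ).fetchone()[0]
    customers = crm.execute(
        "SELECT customers FROM accounts WHERE region = ?", (region,)
    ).fetchone()[0]
    return {"region": region, "revenue": revenue, "customers": customers}

print(federated_query("west"))
# {'region': 'west', 'revenue': 250.0, 'customers': 9}
```

The key property is that no data was copied or moved: each source keeps its own storage, format, and governance, and only the answer crosses the boundary.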
As you undergo cloud migration at your own pace and on your own terms, you can still gain the benefits of a shared data intellect that enables all the different branches of your company to act cohesively, making insight-driven decisions for a shared purpose, and cultivating a self-service analytics culture.
However, even when you employ data virtualization, you can still end up having a partial view of your data and fall short of unified data access. There are three main reasons for this.
- Many traditional data virtualization providers force customers to translate all the data they virtualize into a proprietary format before it can be read and understood. What often happens is that all of the data gets reduced to a lowest-common-denominator state so it can be integrated into a single place for a single view. But this transformation process can result in data getting skewed or lost in translation. Let’s say, for example, you have a dataset that is strictly dedicated to relational databases. You want the data to stay the way it is, but you also need it to be accessible to various other departments in your organization, so they can combine it with their own data to gain insights. However, translating that dataset into the vendor’s format may sacrifice some of the specialized functionality provided by the database in which it resides. Without the context and functionality of the original database, the dataset may become unreliable, and your organization could be making decisions based on faulty data.
- Furthermore, many vendors’ proprietary data formats are not interoperable with other technologies. So, you end up with new silo problems and with continuous integration problems due to vendor lock-in.
- As your data evolves and grows, you may be bogged down by the increasing amount of data engineering required to keep queries over disparate data sources fast. To solve this challenge, companies are using autonomous data engineering capabilities powered by machine learning to build acceleration structures for queries and to ensure speedy response times.
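A toy sketch of the acceleration idea from the last bullet: watch which query shapes recur, and materialize an aggregate once a shape becomes hot. Real products use ML models to decide which aggregates to build; a simple frequency threshold stands in for that here, and the data and class names are hypothetical.

```python
from collections import Counter

# Raw fact rows standing in for a large table (illustrative data).
FACTS = [("east", 100.0), ("east", 40.0), ("west", 250.0), ("west", 10.0)]

class AcceleratingStore:
    """Counts repeated query shapes and materializes an aggregate once a
    shape becomes hot. A frequency threshold stands in for the ML-driven
    prediction that real systems use."""

    def __init__(self, threshold=2):
        self.pattern_counts = Counter()
        self.aggregates = {}          # pattern -> precomputed answers
        self.threshold = threshold

    def sum_by_region(self, region):
        pattern = ("sum_amount_by_region",)
        self.pattern_counts[pattern] += 1
        if pattern in self.aggregates:                # fast path: reuse
            return self.aggregates[pattern][region]
        if self.pattern_counts[pattern] >= self.threshold:
            # Pattern is hot: build the acceleration structure once.
            agg = {}
            for r, amount in FACTS:
                agg[r] = agg.get(r, 0.0) + amount
            self.aggregates[pattern] = agg
            return agg[region]
        # Cold path: full scan of the raw facts.
        return sum(a for r, a in FACTS if r == region)

store = AcceleratingStore()
store.sum_by_region("east")   # cold: full scan of the facts
store.sum_by_region("east")   # hot: aggregate built and reused from now on
```

The point is that the engineering of acceleration structures becomes a by-product of observed usage rather than a manual project that grows with every new data source.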
Intelligent data virtualization: The Rosetta Stone of data
To overcome the inherent faults in most data virtualization solutions, and to empower your organization with a true self-service analytics culture, you need a new, source-agnostic approach to data virtualization that can read and communicate with all data in all formats.
In essence, a Rosetta Stone of data. This novel approach is a higher-evolved form of data virtualization, known as intelligent data virtualization. It is source-agnostic, allows you to access and analyze your data with any BI tools you want, and creates zero additional security risks.
Intelligent data virtualization is completely agnostic about the format of the data source. That means your data doesn’t have to be replicated or transformed in any way. Rather than having to rely on complex and time-consuming data transformation or data movement methodologies, it stays where it is, and it gets virtually translated into a common business language that is presented to your business users.
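The “common business language” above is essentially a semantic layer: one business term mapped to whatever source-specific expression implements it. A minimal sketch, with entirely hypothetical source names, column names, and SQL fragments:

```python
# Hypothetical semantic model: one business term mapped to the
# source-specific expression that implements it in each system.
SEMANTIC_MODEL = {
    "revenue": {
        "warehouse_db": "SUM(order_total)",
        "legacy_erp": "SUM(amt_gross - amt_tax)",
    },
    "customer_count": {
        "warehouse_db": "COUNT(DISTINCT customer_id)",
        "legacy_erp": "COUNT(DISTINCT cust_no)",
    },
}

def to_source_query(term, source, table):
    """Translate a business term into the query fragment a given source
    understands; the business user only ever sees the term itself."""
    expr = SEMANTIC_MODEL[term][source]
    return f"SELECT {expr} FROM {table}"

print(to_source_query("revenue", "legacy_erp", "orders"))
# SELECT SUM(amt_gross - amt_tax) FROM orders
```

Because every tool and every user resolves “revenue” through the same model, the definition lives in one place instead of being re-implemented, slightly differently, in each department.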
Now, you have a shared data intellect that everyone can read. All the different branches of your company can not only access and analyze data for their own unique purposes, but also act cohesively, making insight-driven decisions for a shared purpose. It’s that simple.
Choose your own BI tools
Many companies have already invested a considerable amount of money in BI tools, and most enterprise-level companies use a number of different tools. One department might use Tableau, for example, while another prefers Microsoft Power BI or Excel.
The challenge is that each of these BI tools has its own query language, which can produce different results for the same question across tools, bringing accuracy and reliability into doubt. With intelligent data virtualization, you can use any BI tool you want.
You don’t have to bend all users to a single standard for BI software. All of your data will be accessible and queries will return consistent answers, no matter which BI tool you choose to use. It’s up to you.
No additional security risks
Unlike most solutions designed to provide unified data access, intelligent data virtualization enables companies to leave data in place. That means all of the existing security solutions and policies governing your data remain in place as well.
While your data may be readable to all of your users and a multitude of different BI tools, your permissions and policies are not changed. Security and privacy information is preserved all the way to the individual user by tracking the data’s lineage and the user’s identity.
The user’s identity is also preserved and tracked, even when using shared data connections from a connection pool. When users are working with multiple databases that may have different security policies, policies are seamlessly merged, and global security and compliance policies are applied across all data.
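One way to picture the policy-merging step: when a query touches multiple sources, take the intersection of what each source allows that user to see, so the merged view is never more permissive than any single source. A minimal sketch with hypothetical sources, users, and row-level grants:

```python
# Hypothetical per-source row-level policies: each maps a user to the
# set of regions that source allows them to read.
POLICIES = {
    "sales_db": {"alice": {"east", "west"}, "bob": {"east"}},
    "finance_db": {"alice": {"west"}, "bob": {"east", "west"}},
}

def effective_regions(user, sources):
    """Merge policies conservatively: a row is visible only if every
    source involved in the query permits it for this user."""
    allowed = None
    for source in sources:
        grants = POLICIES[source].get(user, set())
        allowed = grants if allowed is None else allowed & grants
    return allowed or set()

# Alice joins sales and finance data: only the overlap is visible to her.
print(effective_regions("alice", ["sales_db", "finance_db"]))  # {'west'}
```

Intersection is the conservative choice sketched here; whatever the merge rule, the essential property is that it is computed from the sources’ own existing policies rather than from a new, separately maintained permission system.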
Your data remains as safe as it is now under your own existing security policies and apparatus, and additional security measures are not needed.
The future won’t make an exception for you
In the not-so-distant past, computers were the domain of certified experts. There were only a limited number of them in a given company, and most employees worked with pens and paper or typewriters and had little to no understanding as to what those brainiac IT guys did with computers.
Today, virtually every employee in an office has their own work PC and broadband internet access. Computers are far more intuitive and user-friendly, and computer literacy is the norm, not the exception. Bringing value to an organization via this type of technology no longer requires dedicated specialists.
Today, analytics and business intelligence are still largely in the domain of data scientists, but just as personal computers evolved and laypersons learned to use them, BI tools are increasingly becoming usable for the average worker, or “citizen data scientist.”
Organizations that embrace this trend and facilitate this transition in how data is leveraged will be better positioned to succeed. Data mastery increasingly determines who survives and who falls in a business landscape that is changing at a previously unimaginable pace.
Companies that make unified data access a major priority will best be able to create and foster this game-changing self-service analytics culture and transform their business users into citizen data scientists.
No matter where you are in your cloud migration or what type of architecture you’re operating with—on-premises, public cloud, private cloud, or hybrid—intelligent data virtualization is the most efficient, worry-free, and cost-effective way to achieve true unified data access.
Credit: Much of this piece was first published on AtScale by David P. Mariani, a co-founder and VP of Technology.