Contributors: Eduardo Coccaro, Jared Jacobson
Data is an engine. It can power application efficiencies, generate customer insights, drive business decisions and accelerate product innovation. And when properly managed, data is the foundation of successful automation, machine learning and artificial intelligence (AI) efforts. Eight out of ten surveyed executives believe that they are in a race with their competitors to extract value from data. As a Chief Data Officer (CDO), I’m one of them.
Though data can hold a wealth of potential, it can also be your arch nemesis. Unstructured, siloed and mismanaged datastores with no unified strategy are not only worthless, but also toxic. Arguably, the worst side effect of dirty data is the trust factor – or a lack of it. If my stakeholders can’t trust the data to make big decisions, what’s the point of collecting it, storing it and securing it?
The impact of dirty data
Dirty data can lead to bad decisions, costly miscalculations and negative impacts on any data-driven project. One-third of IT professionals point to poor data quality as the main reason that many AI and machine learning projects take longer than planned, cost more than expected and fail to provide useful results.
As companies start leveraging internal data, most find that their data isn’t as clean or reliable as they’d hoped. Focusing only on how to exploit existing data skips the prerequisite work required to collect and maintain clean data. In the past, I’ve seen businesses go all-in on data collection, looking for a quick analytic payback. But when “garbage in” delivers “garbage out,” they’re let down by the quality of the insights because quality was never a focus at the outset of data gathering and management.
Gartner reports that dirty data costs organizations an average of around $15 million per year. That waste includes the resources – servers, applications and expertise – dedicated to maintaining data that isn’t even usable. I might have terabytes of data at my fingertips, but that doesn’t necessarily mean it’s the right data to drive the right insights for my business.
Add to all of this the onslaught of new data sources, types and risks. The majority of data points sitting on servers today were created in the last five years from the ever-expanding access we have to data from social media, consumer information, mobile devices and IoT sensors. Not only is there more data, but it’s also coming in faster with the help of 5G and edge computing technologies. Storing and securing this massive amount of data, of which only a percentage is actually valuable, can cost a lot of money while putting you in the crosshairs of compliance regulators for mishandling data or making you a target for savvy hackers.
What is trusted data?
It is for all of the reasons above that my role, Chief Data Officer, has become more prevalent in the industry. Assigning a leader to own and govern all of the data within a company is essential to creating accurate data that can be trusted.
Most companies don’t know where to start when it comes to sifting through the data they’ve amassed. Common challenges center on identifying and cleaning high-impact data, and building stakeholder trust in that data. My goal as a CDO is to ensure that all of my organization’s data meets the following criteria:
- Data must be secure
- Data must be defined
- Data must meet quality standards
The definitions for each criterion will differ from one organization to the next. I use these general definitions to meet those targets:
Secure data is protected by encryption and other security measures. Additionally, it should adhere to privacy, compliance and other industry-specific regulations.
Defined data relies on common, standard company definitions for data points. Though creating those definitions is difficult, it is a key element of establishing trust. Align internally on terminology so that there is a consensus on which metrics to measure and the definition of each metric. You can keep multiple definitions of a metric for different types or sources, but each must be named and defined distinctly to ensure clarity.
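One way to make those shared definitions concrete is a data dictionary that records each named variant of a metric alongside its source and owner. Here is a minimal sketch in Python; the metric names, sources and owners are illustrative assumptions, not a real standard:

```python
# Illustrative data dictionary: two distinct, clearly named definitions of the
# same underlying metric. All field values here are made-up examples.
data_dictionary = {
    "monthly_churn_rate_gross": {
        "definition": "Customers lost in month / customers at start of month",
        "source": "billing_system",
        "owner": "finance",
    },
    "monthly_churn_rate_net": {
        "definition": "(Customers lost - reactivations) / customers at start of month",
        "source": "crm",
        "owner": "customer_success",
    },
}

def lookup(metric: str) -> dict:
    """Return the agreed definition for a metric, or fail loudly if undefined."""
    if metric not in data_dictionary:
        raise KeyError(f"Metric '{metric}' has no agreed definition")
    return data_dictionary[metric]
```

Failing loudly on an undefined metric is deliberate: it forces the conversation about terminology before anyone reports a number nobody has agreed on.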
The most variable element of trusted data is quality. A retailer might be satisfied with 95% accuracy on store traffic, while a medical center may demand 100% accuracy. When I sit down with stakeholders, I make a point of establishing the level of data quality that’s required while setting expectations of what it will take to get there.
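That stakeholder-by-stakeholder quality bar can be encoded as a simple check. The sketch below assumes hypothetical dataset names and thresholds matching the retailer and medical-center examples above:

```python
# Illustrative sketch: compare a dataset's measured accuracy against the
# quality threshold agreed with its stakeholders. Thresholds are examples only.
QUALITY_THRESHOLDS = {
    "store_traffic": 0.95,    # retailer: 95% accuracy is acceptable
    "patient_records": 1.00,  # medical center: nothing less than 100%
}

def meets_quality_bar(dataset: str, measured_accuracy: float) -> bool:
    """True if the dataset meets its stakeholder-agreed accuracy target."""
    required = QUALITY_THRESHOLDS[dataset]
    return measured_accuracy >= required
```

The point is not the arithmetic but the contract: the threshold lives in one agreed place, so "good enough" is never decided ad hoc at report time.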
Addressing your stakeholders
Trust is built through processes, but it’s also driven by the stakeholder audiences who use the data. There are three primary stakeholder audiences: super users, data owners and executive leadership.
One of the newest, but most impactful audiences, is that of super users. These data advocates are analytical by nature and want to be empowered to wrangle data to drive change. Data super users want to be fairly autonomous and able to take a cognitive approach to problem solving. Instead of waiting for a report, they expect easy access to dashboards and data visualization tools.
These new super users represent a small portion of users, but they aren’t just in the C-suite anymore. From the frontline to the executive suite, more and more teams rely on data to drive decisions. These super users will help you drive widespread adoption across the company. Your data strategy needs to keep up with these data consumers — how they're using data and their pain points.
Data owners hold the data that is being analyzed. Make sure that data owners and their systems aren’t passing along bad data. To ensure clean data on the front end, help them develop processes and collection methods that are built on best practices.
Hold periodic meetings to bring data owners to the table to talk about what’s broken or consistently failing. Peer pressure generated through these interactions helps compel owners to be more attentive while giving them a better understanding of how their actions impact the entire data lifecycle. Your data management structure should include dedicated resources to manage collection, quality, reporting and governance as separate but interlinked functions.
Executive leadership must drive any data transformation plan – from prioritization to budget. However, the needs and frustrations of leaders and managers across the business will also drive your to-do list. Assembling a full data-governance body brings everyone’s perspective to the process. You want an eclectic group of people from around the organization that represents the technical aspects of your organization alongside business advocates with a bias toward detail and an understanding of business needs.
Strive to create a feedback loop where you disseminate relevant technical information and your users bring you valuable insights from the business. Inside this group, you can determine change processes, establish basic controls and identify programs that best support the needs of the business. Include super users in this group who are able to drive adoption, encourage trust and create a ripple effect around following best practices.
The evolving data landscape
That leaves us with an evolving data landscape inhabited by new types of users. As data professionals, our mission is to provide reliable data and to inspire trust in that data. Even if we know our data is secure, defined and high-quality, we need to be able to prove it.
I like the idea of an ISO-like seal that visually indicates that data has been reviewed and is trusted. In combination with strict versioning, this seal can help users instantly identify data that is pristine as well as data that may not be current or is unsafe to use. Of course, to get to a trust-seal stage, you need to have solid data dictionaries and metadata catalogs in place so that everyone is speaking the same language.
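A metadata catalog entry could carry that trust seal directly, tied to a version and a review date so the seal visibly expires on stale data. The sketch below is an assumption of how such a record might look; the field names and the 90-day validity window are made up for illustration:

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional

# Sketch of a catalog record carrying a "trust seal": a reviewed dataset gets
# a seal with a review date, and the seal lapses after a validity window so
# stale data is instantly identifiable as untrusted.
@dataclass
class CatalogEntry:
    name: str
    version: str
    reviewed_on: Optional[date] = None  # None means never reviewed
    seal_valid_days: int = 90           # illustrative window, not a standard

    def is_trusted(self, today: date) -> bool:
        """A valid trust seal requires a review within the validity window."""
        if self.reviewed_on is None:
            return False
        return (today - self.reviewed_on) <= timedelta(days=self.seal_valid_days)
```

Pairing the seal with strict versioning means a user can tell at a glance both which cut of the data they hold and whether anyone has vouched for it recently.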
Typically referenced from a compliance and security perspective, audit trails and permissions are another way to inspire trust. Knowing that the data is only being manipulated by authorized personnel boosts confidence in its accuracy. Though some may see it as restricting access, with the right checks and balances in place, I’m better able to democratize information without jeopardizing it.
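The mechanics behind that confidence are simple to sketch: gate every write behind a permission check and append the outcome, allowed or denied, to an audit trail. The roles and in-memory store below are hypothetical stand-ins for illustration:

```python
from datetime import datetime, timezone

# Illustrative sketch: only authorized roles may modify data, and every
# attempt (successful or denied) is recorded in an append-only audit trail.
AUTHORIZED_EDITORS = {"data_steward", "data_owner"}  # assumed role names
audit_trail = []

def update_record(store: dict, key: str, value, user: str, role: str) -> None:
    """Apply a change if the role is authorized; always log the attempt."""
    timestamp = datetime.now(timezone.utc).isoformat()
    if role not in AUTHORIZED_EDITORS:
        audit_trail.append({"user": user, "action": "denied",
                            "key": key, "at": timestamp})
        raise PermissionError(f"{user} ({role}) may not modify data")
    store[key] = value
    audit_trail.append({"user": user, "action": "update",
                        "key": key, "at": timestamp})
```

Denied attempts are logged too: an audit trail that only records successes can't answer the questions regulators, or skeptical stakeholders, actually ask.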
Wrapped around all of these activities is education. Data literacy ties it all together so that users have a rudimentary understanding of why data is so vital and how to properly use it. Understanding that certain spelling and formatting rules for entering customer data are crucial for feeding that data into other systems empowers users to become part of the solution.
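Those entry-time rules can be enforced in code rather than left to memory. Here is a minimal sketch of validating a customer record at the point of entry; the specific rules (two-letter uppercase country codes, five-digit postal codes, lowercase email) are illustrative assumptions, not a real company standard:

```python
import re

# Illustrative sketch: check simple spelling/formatting rules on a customer
# record before it feeds downstream systems. Rules are example assumptions.
def validate_customer_record(record: dict) -> list:
    """Return a list of formatting problems; an empty list means clean."""
    problems = []
    if not re.fullmatch(r"[A-Z]{2}", record.get("country_code", "")):
        problems.append("country_code must be a two-letter uppercase code")
    if not re.fullmatch(r"\d{5}", record.get("postal_code", "")):
        problems.append("postal_code must be exactly five digits")
    email = record.get("email", "")
    if email != email.strip().lower():
        problems.append("email must be lowercase with no surrounding whitespace")
    return problems
```

Surfacing the specific rule that was broken, rather than a bare rejection, is what turns users from a source of dirty data into part of the solution.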
Destination: trusted data
So, how do I know that my data is trusted? I see that people are using it. It’s being leveraged for decision making and stimulating questions and discussions from the business. I’m able to measure dashboard utilization and the outcome of better decisions based on valid data. The upfront costs for the journey – resources, timelines, budget – will be higher for established organizations than born-in-the-cloud companies, but the principles are the same. And just like good data, the effort you put into aligning people, processes and goals on the frontend will yield a strong data foundation that’s able to support your future success regardless of industry, business size or revenue.
Where should you start?
Much like any transformation effort, data hygiene isn’t a one-and-done project. It is an ongoing program, and executive support and a long-term vision are vital elements. It requires a change management framework that’s able to accommodate an ever-changing data landscape.
When needs or policies change, there is a waterfall effect on the logic that’s baked into the code. You need a team that can think through the entire data chain — where it’s sourced and what it feeds — to maintain data quality. Leadership has to clearly communicate the company’s intended path and the role of data in moving down that path, and then make data chain management a priority.
So, where do you begin? Start small. Remove the sense of overwhelm that can accompany a data transformation project by starting with something small and meaningful. Focus on run-the-business data first (sales, churn, fulfillment), then move on to data that enables efficiency (customer experience, business process improvement, supply chain optimization).
I look at data as a product, not a service. As such, we approach design, production and distribution in a product delivery framework with elements of quality control, monitoring and response baked in. Score that first data win, get people to understand it, use it, trust it — and then expand to other areas or bigger platforms. As you roll out new programs, you'll have that trust established.