
Do You Need A Data Scientist?


The question “do you need a data scientist?” came up a lot when I was a management consultant for a global firm that successfully incubated data science within a few enterprise organizations. It’s hard. The discussion is hard, and the culture clash for data scientists is hard. Many approach data science as some dark magic from Hogwarts. It’s not. Investigating a hypothesis takes time, and spontaneously generating data and building a model against that data doesn’t work. Understanding who you need and how they will fit into your organization is challenging. Where do we put them? Who do they interact with? What is the hand-off? How do we structure a team around the project? How do we execute a project? Even better, how do we make MONEY? Before we go there, though, perhaps we should step back a bit and treat this as a strategic question, because maybe you do need a data scientist and maybe you don’t.

If you are weighing whether you need a data scientist, here are some questions and insights to consider.

How accessible is your data?

  • Algorithms are not the problem. Understanding what data goes into those algorithms is the crux of the issue. This requires accessible data.
  • There are many access patterns in data science. These patterns include discovery, development, deployment, and maintenance. Getting to an infrastructure and data lifecycle that supports these patterns takes time.
  • Data scientists ask a lot of questions about data. Asking questions of raw data is hard and time-intensive, and paying a data scientist to do it is expensive when you are running an insights-driven project. It is probably best to enhance your calm and bring them on board when your data is ready for witchcraft and wizardry.
  • Focus on getting accessible, quality data and solid reporting first. Then worry about data science. You’ll save money and gain efficiency.

How vs. Why?

  • If you start with data science and ask how to do it rather than why you need it, you end up solving a problem for the wrong use case. For example, you may focus on scale only to find out that what you needed was effective sampling techniques.
  • If you solve for why, how becomes easy.

Product or Project?

  • Are you building a product or doing a six-month project?
  • Will the project be reused?
  • A product that has a point of failure on a data pipeline is different from a project that needs the output of a data pipeline.
  • A data scientist can certainly take on a project and produce insights, but building an infrastructure that empowers a group of data scientists to drive insights takes a product mindset. Data reusability and accessibility are key.
  • Data scientists are product people. You can sell a product for a long time. It is hard to justify ROI on a data scientist for a short-term project that isn’t likely to be reused.

I firmly believe that everyone in the enterprise needs or will need data science at some point. Yet finding a relevant product that requires data science is the hard part. Statistics and predictive modeling are not new. Throw in an ad hoc, innovative culture, scale, and reusable data pipelines, all feeding some user application, and you might have data science. Maybe the question isn’t “do you need a data scientist?” but rather, “are you doing something right now that warrants data science?”

About the Author

This is a post written and contributed by Nick Kolegraff.

Nick is the Director of Data Science at Rackspace. He suffers from TADHD (Technology Attention Deficit Hyperactivity Disorder)...which led him into data science. In previous dimensions, he started and incubated data science at Fortune 500 companies while working for one of the world's largest global management consulting firms. He got his start designing hardware devices for voice-controlled medical beds and then became more interested in intelligent non-living things. Later, he designed and implemented scalable backend systems for predictive modeling products, as well as a few production recommender systems for large retailers. His TADHD started in college, where he was building HPC clusters in his basement to do math and building voice-controlled potato cannons. Nick holds a BS in Statistics and a BA in Computer Science from the University of Iowa. In his free time, Nick enjoys mountain biking, rock climbing, hiking, and working on some of his open source projects.


4 Comments

Hey Nick — love this post — great overview of how to do baby-step thinking about data science. The term gets thrown around so much, it can be hard to figure out specific implementation steps.

Mary W on July 15, 2013

Thanks Mary, it’s true. The term is thrown around way too much in multiple different contexts, making it very difficult to understand just what the heck is going on here. This was yet-another-attempt to try and solidify things around this space. I tend to converge on the simplicity of building intelligent products at scale.

Nick K on July 15, 2013

Thanks Nick. This is interesting.

IhwanIMS on July 15, 2013

Loved it, nice article :o)

Carla Gentry (@data_nerd) on July 29, 2013



©2014 Rackspace, US Inc.