Posted by Meerah Rajavel

In 2012, Harvard Business Review called “Data Scientist” “The Sexiest Job of the 21st Century”. In 2011, McKinsey & Company projected a global demand of 1.5 million data scientists, which prompted numerous top institutions such as Harvard, Kellogg and Stanford to offer focused, specialized programs in data science. Interestingly, according to Wikipedia, the term “data science” has existed for over fifty years and was first used by Peter Naur in 1960 as a substitute for computer science. The question is, why did it take half a century for it to become sexy?

Before we dive into the reasoning, let us look at the definition of data science. Data science is an interdisciplinary field about processes and systems to extract knowledge or insights from data in various forms, either structured or unstructured[1]. Data science is not merely the use of data: it enables the creation of data applications and products that acquire their value from the data and, in turn, create more data to drive business value.

Data science has been practiced for years under the name of statistical modelling in industries such as mining, insurance and finance. However, in my opinion, there are five factors that accelerated this discipline and moved it to center stage regardless of business size and industry:

  • The speed of data creation: We now create as much data in a day as the human race created from the dawn of civilization to 2003. Does that sound like ‘Big Data’?
  • Moore’s Law: The iPhone I hold today has more compute and storage power than the computer that put Apollo on the moon. Data storage and processing have gotten cheaper. With the cloud, access to these resources has accelerated while the cost has been further optimized.
  • Advancement of Tools: The advancement of big-data technology platforms like Hadoop, Spark and NoSQL databases, and of powerful visualization tools like Qlik, SAS, etc., has significantly lowered the barrier to entry for data science, making it more accessible and affordable.
  • Shift from ‘the power of N’ to ‘the power of All’: As a customer, I am no longer satisfied if I am presented with the experience of a generic ‘middle-aged woman’. I need the experience to be more personal and aligned to my interests and personality. I would like to be identified as a middle-aged woman who is motivated to solve business and social problems, enjoys hiking and running, and loves reading and spending time with family and friends. This changing expectation forces businesses to view their customers through a multi-dimensional lens. The traditional approach of statistical sampling with a subset of the data fails miserably, because the probability of picking the right sample for all scenarios is low. This forces businesses to process ALL the data to derive customer patterns.
  • Speed to Action: The speed of data creation is accelerating, but the value of data drops significantly as time goes by. For example, when a customer walks through a retail store, presenting them with coupons based on their recent search cookies on their mobile phone increases the chance of a purchase more than sending them coupons based on data that is a day or even a few hours old.
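
To make the sampling argument in the ‘power of All’ point concrete, here is a minimal Python sketch. It is my own illustration, not a formula from any tool mentioned above. It assumes a simple random sample drawn independently from a very large customer base, and the function names and segment shares are hypothetical. It estimates how likely a sample is to contain nobody at all from a small customer segment.

```python
from typing import Iterable


def prob_segment_missed(segment_share: float, sample_size: int) -> float:
    """Probability that an independent random sample contains no member
    of a segment that makes up `segment_share` of all customers.

    Assumption: each draw is independent (sampling with replacement, or a
    population large enough that the difference is negligible).
    """
    if not 0.0 <= segment_share <= 1.0:
        raise ValueError("segment_share must be between 0 and 1")
    if sample_size < 0:
        raise ValueError("sample_size must be non-negative")
    return (1.0 - segment_share) ** sample_size


def prob_any_segment_missed_lower_bound(
    segment_shares: Iterable[float], sample_size: int
) -> float:
    """Lower bound on the probability that at least one segment is absent
    from the sample: it is at least as large as the miss probability of
    the rarest segment."""
    return max(prob_segment_missed(s, sample_size) for s in segment_shares)


if __name__ == "__main__":
    # Hypothetical shares: two broad segments and one niche segment
    # (1 customer in 10,000).
    shares = [0.40, 0.05, 0.0001]
    for n in (1_000, 10_000, 100_000):
        bound = prob_any_segment_missed_lower_bound(shares, n)
        print(f"sample size {n:>7}: P(some segment missing) >= {bound:.4f}")
```

Under these assumptions, a sample has to be many times larger than the inverse of the rarest segment's share before that segment reliably shows up at all, and far larger still before it shows up in numbers useful for analysis. That is the intuition behind processing all the data rather than a subset.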

So what is ‘art’ and what is ‘science’ in data science? Though data science as a discipline is not limited to big data, big data is one of its fundamental drivers. To answer this question, let us look at the 4 Vs of big data:

  • Volume – Size of the data
  • Variety – Types of data; structured and unstructured data
  • Velocity – How fast the data changes; streaming speed
  • Veracity – Uncertainty of the data, i.e. data quality and the relevance of the data to the business domain and context

Volume, Variety and Velocity can be addressed in a scientific way using various tools and technologies. Veracity, however, requires judgment and domain knowledge to infer the relevance of the derived insight, which is an art.

I believe following a data-driven discipline is no longer a novelty, but a required mainstream competency for meeting customer expectations and staying nimble, agile and responsive to the competition. For businesses that are investing in data science, I suggest considering the following three steps:

  1. Purpose: Know what you are trying to measure; collect the data relevant to the purpose and know where the data is coming from. You cannot judge the quality of the analytics if you don’t have a very clear idea of where the data came from.
  2. Talent: Data science is more than technology; it is an interdisciplinary field. Because analytics often boils down to making comparisons between groups and identifying patterns, it is important to know how those groups are selected. Domain knowledge plays a crucial role, and most data scientists do not have it. It is critical that the managers who hold the knowledge of the business are paired with data scientists who can think outside the box and are willing to build data products incrementally, exploring and iterating over a solution.
  3. Culture: Foster an organizational culture of data-driven decision making and avoid managerial bias.

In conclusion, I say ‘Data is currency’ in today’s digital world. Businesses that want to stay ahead and capture market share rapidly need to become data-centric sooner rather than later.

[1] Wikipedia