
The Third Wave: Democratization of Data Science/Algorithms

Kan Nishida
learn data science

--

Hardly a day goes by without hearing somebody talking about AI these days.

But AI is actually not new.

In fact, AI has been around ever since Alan Turing laid out its conceptual groundwork in the 1940s, during the Second World War, when his team was working to break the cipher of the Nazis' Enigma machine. (The movie "The Imitation Game" tells this story in a much more entertaining way.)

Since then, though, AI, or more precisely Machine Learning and Statistics, has evolved a lot. The range of things AI can do has expanded, and its quality has improved.

But also, the way these algorithms are used has changed dramatically.

The transformation we are witnessing now is called the third wave, the one that finally puts such algorithms into everyone's hands.

To understand what the third wave is and why it matters, let's take a quick look at the history of Machine Learning and Statistics algorithms.

First Wave: Monetization of Algorithms

When I started my career in the world of data analysis 20 years ago, companies like SAS, SPSS, and IBM were producing the algorithms behind AI.

We didn't call it AI back then, but they were more or less the same mathematical and statistical algorithms that we still use for some of today's so-called "AI applications."

They were mainly used by professionally trained statisticians at organizations that could afford to pay hundreds of thousands, or even millions, of dollars to use them. Nonetheless, this was a big deal: any company could now access such algorithms instead of having to build them in-house, as long as it could afford the price. We call this first wave of transformation in AI and Machine Learning the "Monetization of Algorithms."

But when we talk about AI, Machine Learning, or Data Science today, we no longer hear those old mega-enterprise names. Instead, we hear about a bunch of new (or relatively new) libraries and algorithms built by individuals or by hot Silicon Valley tech companies like Google, Facebook, Airbnb, etc., who open source them to make them better together with the community. When was the last time you heard of a startup or a new service launching a successful AI product using algorithms from the old enterprise companies?

Second Wave: Commoditization of Algorithms

So, many of the AI and Machine Learning algorithms have become much more accessible to anyone, because they are available as open source, for free. And thanks to being open source, the innovation and development of these algorithms have become exponentially faster and more dynamic. Today, people use this new generation of algorithms not just because they are free, but simply because they are better, and more suitable for today's data challenges: the massive volume and diverse types of data brought about by transformational changes like the Internet, Cloud, Mobile, IoT, etc.

State of the Art Open Source Algorithms

Now, when I talk about this new generation of open source AI and Machine Learning algorithms, what exactly are they? Let's quickly look at a few examples.

In September 2014, Google open sourced "CausalImpact," an R package that implements a Bayesian approach to estimating the causal impact of an intervention on a time series. Their data scientists use it to understand how many of the daily clicks on a website are actually generated by an advertising campaign.
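For the curious, here is roughly what using it looks like in R: a minimal sketch adapted from the package's own quick-start example, where the data are simulated and the "campaign effect" is injected artificially.

```r
# Minimal CausalImpact sketch: simulated data with an artificial
# "campaign" lift injected in the post period.
library(CausalImpact)

set.seed(1)
x1 <- 100 + arima.sim(model = list(ar = 0.999), n = 100)  # control series
y  <- 1.2 * x1 + rnorm(100)                               # response series
y[71:100] <- y[71:100] + 10                               # injected effect
data <- cbind(y, x1)

pre.period  <- c(1, 70)     # before the campaign
post.period <- c(71, 100)   # after the campaign starts

impact <- CausalImpact(data, pre.period, post.period)
summary(impact)   # estimated lift with credible intervals
plot(impact)      # observed vs. counterfactual prediction
```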

In January 2015, Twitter open sourced "AnomalyDetection," an R package that detects anomalies, or "trending" information, in data, taking into account both local and global seasonal trends. Their data scientists use it to understand what is causing unusual traffic to their site and pages.
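Again as a rough sketch, this is how the package is typically called, here on the example dataset that ships with it:

```r
# Minimal AnomalyDetection sketch, using the package's bundled example
# data (a data frame with a timestamp column and a count column).
# install with: devtools::install_github("twitter/AnomalyDetection")
library(AnomalyDetection)

data(raw_data)
res <- AnomalyDetectionTs(raw_data,
                          max_anoms = 0.02,    # flag at most 2% of points
                          direction = "both",  # catch spikes and dips
                          plot = TRUE)
res$anoms   # the detected anomalies with their timestamps
res$plot    # the series with anomalies highlighted
```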

In June 2015, Airbnb open sourced a machine learning framework called "Aerosolve." Their data scientists use it for listing price optimization, demand simulation, etc.

In November 2015, Google open sourced TensorFlow, a "Deep Learning" library that can be used to build and train models based on "neural network" algorithms. Google uses it in services like Search signals, the email auto-responder, Photo Search, Voice, Translate, etc.
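TensorFlow itself is most commonly driven from Python, but RStudio's keras package lets you run it from R as well. Here is a minimal, illustrative sketch that trains a tiny neural network; the data, shapes, and numbers are all made up for illustration.

```r
# Minimal neural network sketch with the keras R package (TensorFlow backend).
library(keras)

x <- matrix(runif(1000 * 10), ncol = 10)  # 1000 samples, 10 features
y <- as.numeric(rowSums(x) > 5)           # a made-up binary target

model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(10)) %>%
  layer_dense(units = 1, activation = "sigmoid")

model %>% compile(optimizer = "adam",
                  loss = "binary_crossentropy",
                  metrics = "accuracy")

model %>% fit(x, y, epochs = 5, batch_size = 32)
```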

In February 2017, Facebook open sourced "Prophet," an algorithm for forecasting time series data. Internally it uses the probabilistic programming language Stan and builds additive models, which makes it easy to produce high-quality forecasts even for those without experience in forecasting. Their data scientists use it to forecast the demand for their web services so they can efficiently allocate resources like machines, people, etc.
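Prophet is available from both Python and R. As a minimal sketch, here it is on a synthetic daily series (Prophet expects a data frame with a "ds" date column and a "y" value column):

```r
# Minimal Prophet sketch on a synthetic daily time series.
library(prophet)

df <- data.frame(
  ds = seq(as.Date("2015-01-01"), by = "day", length.out = 365),
  y  = 100 + 10 * sin((1:365) / 20) + rnorm(365)
)

m <- prophet(df)                                   # fit the additive model
future <- make_future_dataframe(m, periods = 30)   # extend 30 days ahead
forecast <- predict(m, future)
plot(m, forecast)                                  # forecast with uncertainty bands
```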

This is just the tip of the iceberg; there are many more. And it's not just hot Silicon Valley tech companies like the ones above: many passionate individuals and organizations, including universities around the world, are contributing amazing algorithms daily. As an example, here's a chart showing how many packages have been added to CRAN, R's central package repository, over the last 20 years.

[Chart: the number of packages on CRAN over time. Created by Gergely Daróczi]

So the challenge for data scientists is no longer figuring out how to work better with the existing algorithms available to them. It is how to keep up with the latest cutting-edge algorithms and choose the right ones for their particular problems.

And we call this second wave of disruptive transformation in the world of AI and Machine Learning the "Commoditization of Algorithms." The cost of high-quality algorithms has dropped to zero, or almost zero, and all of a sudden you have immediate access to an abundance of the most advanced algorithms in the world.

AI for Programmers, BI for Non-Programmers

But there is one problem. You need to be able to program in one of the data science languages like R or Python, and not everybody can become proficient in programming all of a sudden. This is why many companies hire data scientists who can program in those languages, access those open source algorithms and libraries, and help them gain deeper insights from their data quickly and effectively.

But not everybody can afford to hire these expensive data scientists. It's also not easy to find and hire them unless you understand data science yourself, which is still mysterious to many.

So, many people who have data and want to gain better insights from it are left with traditional BI tools like Excel, Tableau, etc., doing simple counting, unable to take advantage of the commoditized algorithms that would be available to them if only they could program.

But wouldn't it be great if anyone could access the cutting-edge open source algorithms that are available only to programmers and data scientists today? Wouldn't it be even better if we could use such algorithms as part of our daily data analysis routines, without programming?

Third Wave: Democratization of Algorithms

This is why we are starting to see a new generation of UI-based data science tools like Exploratory, Dataiku, etc., which embrace open source technologies and algorithms and make them much more accessible to those who couldn't ride the wave of the "Commoditization of Algorithms."

For example, the users of Exploratory (known as "UI for R") are generally not data scientists, but business analysts or consultants with deep domain knowledge of their businesses, in fields like marketing, finance, manufacturing, education, logistics, etc. And they use such open source AI and Machine Learning algorithms to gain much deeper insights from their business data than they could have with traditional BI tools.

In the world of data analysis, there are about 4 million R and Python users. That's only about 0.7% of all Excel users in the world (about 600 million, according to Microsoft). The next generation of UI-based tools is expected to serve the remaining 99% of people who need to understand data, by making high-quality open source algorithms more accessible. And we call this third wave of disruptive transformation the "Democratization of Algorithms."

To put these in perspective, we can lay out the three major transformational changes in the world of AI and Machine Learning: the Monetization, the Commoditization, and the Democratization of Algorithms.

The emergence of the Internet, Cloud, Mobile, and IoT (connected devices) has changed the world of data significantly over the last twenty years or so. It has made collecting data much easier and faster, and the amount of data we collect is growing exponentially. All of us need to be equipped with modern tools and algorithms, ready to turn this daunting challenge into a great opportunity, just as data scientists have been doing already.

We are still at a very early stage of the "Democratization of Algorithms," and I'm very excited to help more people ride the third wave and to see what the world of data analysis will look like, with many more innovations to come, over the next 5 to 10 years.

If you want to learn data science without programming, sign up for a free trial of Exploratory. If you are currently a student or a teacher, it's free!

--

CEO / Founder at Exploratory (https://exploratory.io/). Having fun analyzing interesting data and learning something new every day.