Digital transformation is (or is about to be) under way in many organisations. The main thrust often targets advanced analytics and AI use cases, technology, data governance and the like. One element that is getting more attention lately is the ability to make data-driven decisions. Importantly, however, only 17% of companies are significantly encouraging employees to become more comfortable with data.
What made me select the following three articles for this week? The first is a great story; the second comes with a cool infographic; and the third is about one of my favourite topics.
The image above comes from an article about research at Google in 2020, specifically about AutoML. It shows an evolutionary approach to meta-learning (using learning algorithms to develop new ML algorithms). AutoML-Zero can learn algorithms developed over the last thirty years. Is this the approach that will win the race towards artificial general intelligence?
Ok, let’s not jump the gun! Data science tools in the browser, trends in ML and data-driven thought leadership are on the menu this week.
As deepfakes are becoming more convincing, detecting whether an image is real or not can be a challenge. Researchers at Facebook and Michigan State University are now going a step further by working on methods that try to identify which generative model created a deepfake. The model parsing method, outlined in the caption image, estimates the parameters of the model used to generate a deepfake. It is somewhat reassuring to see such a detective method (even involving fingerprints) being developed.
Lately, I’ve been involved with time series forecasting. Whilst this is something that started for me at university with Holt-Winters and ARIMA models, nowadays there are many more approaches to predicting future values of a time series. Many of them include machine learning or deep learning techniques. And it’s really easy: anyone can do it, using one of the many available Python libraries. One of them, Prophet, is particularly user-friendly as it doesn’t require tweaking a lot of parameters. But just because it’s easy doesn’t mean it’s safe to use, and a lot can go wrong.
The image above comes from an article about increasing experimentation accuracy using CUPED, a method for estimating treatment effects in A/B tests that is supposed to be more accurate than a simple difference in means. A/B tests are used in situations where you are looking for an answer to a very specific question (like whether a red button has a higher click-through rate than a blue one). But not all questions are that specific.
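To make the idea behind CUPED a bit more concrete, here is a minimal sketch in Python. It is not taken from the article; it assumes the common formulation in which each user's outcome is adjusted by a pre-experiment covariate, Y_adj = Y - theta * (X - mean(X)) with theta = cov(X, Y) / var(X). The data are simulated, and the function names and the numbers (sample size, effect size, noise levels) are made up for illustration.

```python
import random
import statistics


def cuped_adjust(y, x):
    """Return CUPED-adjusted outcomes: y - theta * (x - mean(x)).

    theta = cov(x, y) / var(x) is estimated on the pooled sample.
    """
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cov_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (len(x) - 1)
    theta = cov_xy / statistics.variance(x)
    return [yi - theta * (xi - mean_x) for xi, yi in zip(x, y)]


def diff_in_means(values, treated):
    """Simple difference in means between treated and control users."""
    t = [v for v, flag in zip(values, treated) if flag]
    c = [v for v, flag in zip(values, treated) if not flag]
    return statistics.fmean(t) - statistics.fmean(c)


def simulate(n=2000, true_effect=0.5, seed=0):
    """Hypothetical data: the pre-period metric strongly predicts the outcome."""
    rng = random.Random(seed)
    pre = [rng.gauss(10, 3) for _ in range(n)]          # pre-experiment metric
    treated = [rng.random() < 0.5 for _ in range(n)]    # random assignment
    post = [
        p + rng.gauss(0, 1) + (true_effect if t else 0.0)
        for p, t in zip(pre, treated)
    ]
    return pre, post, treated


pre, post, treated = simulate()
adjusted = cuped_adjust(post, pre)

naive_estimate = diff_in_means(post, treated)
cuped_estimate = diff_in_means(adjusted, treated)

print("naive estimate:", round(naive_estimate, 3))
print("CUPED estimate:", round(cuped_estimate, 3))
print("outcome variance before:", round(statistics.variance(post), 3))
print("outcome variance after: ", round(statistics.variance(adjusted), 3))
```

Because assignment is random, the covariate is unrelated to the treatment, so the adjustment leaves the expected effect unchanged while removing the part of the outcome variance that the pre-period metric explains. How much this helps in practice depends on how well the chosen covariate correlates with the outcome.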
“Asking the right questions is as important as answering them.” — Benoit Mandelbrot
So, I guess you don’t have to A/B test for what is the main…
The image above comes from a book by Pedro Domingos. The author argues for the ‘Master Algorithm Hypothesis’: All knowledge — past, present, and future — can be derived from data by a single, universal learning algorithm. He calls for all five tribes of ML to come together in a unification effort and create the master algorithm. As a practitioner, I have the advantage of staying out of the philosophical differences between the five tribes and simply using the fruits of all their research. But the idea of a master algorithm is certainly very appealing.
Importance of the problem formulation, introduction to…
The image above is taken from the article Data Strategy: Good Data vs. Bad Data. According to the article, Good Data is data integrated into a good data strategy aligned with the company strategy. The goal is to take better actions, which are based on good decisions. For these you need good insights derived from good information. Having good data enables it to be transformed into information. Collecting good data is, in turn, the result of an action. The process is not linear; it’s a never-ending continuous improvement loop.
Two topics this week: the CDO, and data observability and democratisation.
Today’s reading list will be very short, I’m afraid, as I’m down in bed with Covid and have a horrible headache preventing me from reading anything.
Data science is about solving business problems with data and analytics. This has been my mantra for many years. It’s probably confirmation bias that the articles below reiterate that, be it at the level of a company tightly aligning its data strategy to the business strategy, or of individual data scientists being obsessed with solving business problems.
Either way, I hope you’ll find the following articles inspiring. Or thought-provoking at least.
Many years ago, I asked myself ‘how do salespeople read dashboards?’, in order to design new dashboards around their subconscious behaviours. When I didn’t find the answers, I asked experts from leading BI platforms. They were confused by my questions and responded with something along the lines of “you can build whatever you want; it depends on what you want”. Which isn’t exactly what I was looking for either. So, I changed tack and asked salespeople ‘what are the questions you need answers to?’ …