5 minutes for 5 hours’ worth of reading

Source: https://towardsdatascience.com/9-distance-measures-in-data-science-918109d069fa

Digital transformation is (or is about to be) under way in many organisations. The main thrust often targets advanced analytics and AI use cases, technology, data governance and the like. One element that is getting more attention lately is the ability to make data-driven decisions. Importantly, however, only 17% of companies are significantly encouraging employees to become more comfortable with data.

What made me select the following three articles for this week? The first is a great story; the second comes with a cool infographic; and the third is about one of my favourite topics.

5 minutes for 5 hours’ worth of reading

Source: https://ai.googleblog.com/2021/01/google-research-looking-back-at-2020.html

The image above comes from an article about research at Google in 2020, specifically about AutoML. It shows an evolutionary approach to meta-learning (using learning algorithms to develop new ML algorithms). AutoML-Zero can learn algorithms developed over the last thirty years. Is this the approach that will win the race towards artificial general intelligence?

Ok, let’s not jump the gun! Data science tools in the browser, trends in ML and data-driven thought leadership are on the menu this week.

5 minutes for 5 hours’ worth of reading

Source: https://ai.facebook.com/blog/reverse-engineering-generative-model-from-a-single-deepfake-image/

As deepfakes become more convincing, detecting whether an image is real can be a challenge. Researchers at Facebook and Michigan State University are now going a step further by working on methods that try to identify which generative model created a deepfake. The model parsing method, outlined in the caption image, estimates the parameters of the model used to generate a deepfake. It is somewhat reassuring to see such a detective method (even involving fingerprints) being developed.

5 minutes for 5 hours’ worth of reading

Source: https://www.microprediction.com/blog/prophet

Lately, I’ve been involved with time series forecasting. Whilst this started for me at university with Holt-Winters and ARIMA models, nowadays there are many more approaches to predicting future values of a time series. Many of them include machine learning or deep learning techniques. And it’s really easy: anyone can do it, using one of the many available Python libraries. One of them, Prophet, is particularly user-friendly as it doesn’t require tweaking a lot of parameters. But just because it’s easy doesn’t mean it’s safe to use; a lot can go wrong.


5 minutes for 5 hours’ worth of reading

Source: https://codeascraft.com/2021/06/02/increasing-experimentation-accuracy-and-speed-by-using-control-variates/

The image above comes from an article about increasing experimentation accuracy using CUPED, a method for estimating treatment effects in A/B tests that is supposed to be more accurate than a simple difference in means. A/B tests are used when you are looking for an answer to a very specific question (for example, does a red button have a higher click-through rate than a blue one?). But not all questions are that specific.

“Asking the right questions is as important as answering them.” — Benoit Mandelbrot

So, I guess you don’t have to A/B test for what is the main…
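For readers who want to see the idea behind CUPED in code, here is a minimal sketch. It is not the implementation from the linked article: the data is simulated, and every name in it (simulate, cuped_adjust, the effect size of 0.5) is a hypothetical chosen for illustration. The assumption is that each user has a pre-experiment metric that correlates with the metric measured during the experiment; CUPED subtracts the part of the outcome that the pre-experiment metric already explains, which lowers the variance of the estimate without shifting its mean.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def variance(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def covariance(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def cuped_adjust(y, x, theta, x_mean):
    """Remove the part of outcome y explained by the pre-experiment covariate x."""
    return [yi - theta * (xi - x_mean) for yi, xi in zip(y, x)]


def simulate(n=2000, effect=0.5, seed=0):
    """Simulated A/B test (illustrative data only, not real measurements)."""
    rng = random.Random(seed)
    pre_c = [rng.gauss(10, 2) for _ in range(n)]   # control, before the test
    pre_t = [rng.gauss(10, 2) for _ in range(n)]   # treatment, before the test
    post_c = [p + rng.gauss(0, 1) for p in pre_c]
    post_t = [p + effect + rng.gauss(0, 1) for p in pre_t]
    return pre_c, post_c, pre_t, post_t


pre_c, post_c, pre_t, post_t = simulate()

# theta is estimated once on the pooled data so both groups get the same adjustment.
pre_all, post_all = pre_c + pre_t, post_c + post_t
theta = covariance(pre_all, post_all) / variance(pre_all)
x_mean = mean(pre_all)

adj_c = cuped_adjust(post_c, pre_c, theta, x_mean)
adj_t = cuped_adjust(post_t, pre_t, theta, x_mean)

naive_effect = mean(post_t) - mean(post_c)
cuped_effect = mean(adj_t) - mean(adj_c)

print("naive estimate:", round(naive_effect, 3))
print("CUPED estimate:", round(cuped_effect, 3))
print("control variance before/after:", round(variance(post_c), 2), round(variance(adj_c), 2))
```

Both estimates target the same treatment effect; the difference is that the adjusted metric is far less noisy, so the same experiment can reach a conclusion with fewer users or in less time.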

5 minutes for 5 hours’ worth of reading

Source: The Master Algorithm by Pedro Domingos

The image above comes from a book by Pedro Domingos. The author argues for the ‘Master Algorithm Hypothesis’: all knowledge (past, present, and future) can be derived from data by a single, universal learning algorithm. He calls on all five tribes of ML to come together and create the master algorithm. As a practitioner, I have the advantage of staying out of the philosophical differences between the five tribes and instead using the fruits of all their research. But the idea of a master algorithm is certainly very appealing.

Importance of the problem formulation, introduction to…

5 minutes for 5 hours’ worth of reading

Source: https://towardsdatascience.com/data-strategy-good-data-vs-bad-data-d40f85d7ba4e

The image above is taken from the article Data Strategy: Good Data vs. Bad Data. According to the article, Good Data is data integrated into a good data strategy aligned with the company strategy. The goal is to take better actions, which are based on good decisions. For these you need good insights derived from good information. Good data is what can be transformed into that information. And collecting good data is itself the result of an action. The process is not linear; it’s a never-ending continuous improvement loop.

Two topics this week — CDO and data observability and democratisation.

5 minutes for 5 hours’ worth of reading

Source: https://towardsdatascience.com/geometric-foundations-of-deep-learning-94cdd45b451d

Today’s reading list will be very short, I’m afraid, as I’m in bed with Covid and have a horrible headache that is preventing me from reading anything.

  • 150+ Concepts Heard in Data Engineering: Data engineering is a fast-evolving field…

5 minutes for 5 hours’ worth of reading

Source: https://www.kearney.com/analytics/article/?/a/the-impact-of-analytics-in-2020

Data science is about solving business problems with data and analytics. This has been my mantra for many years. It’s probably confirmation bias that the articles below reiterate it, be it at the level of a company tightly aligning its data strategy with its business strategy, or of individual data scientists being obsessed with solving business problems.

Either way, I hope you’ll find the following articles inspiring. Or thought-provoking at least.

5 minutes for 5 hours’ worth of reading

Source: https://medium.com/analytics-vidhya/inspiring-ideas-for-dashboards-design-172b31ca9620

Many years ago, I asked myself ‘how do sales people read dashboards’ — in order to design new dashboards around their subconscious behaviours. And when I didn’t find the answers, I asked experts from leading BI platforms. They were confused by my questions and responded with something along the lines of “you can build whatever you want; it depends on what you want”. Which isn’t exactly what I was looking for either. So, I changed tack and asked sales people ‘what are the questions you need answers to?’ …

Adam Votava

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | CEO & co-founder at DataDiligence.com
