Keeping Up With Data #79


The illustration for this issue comes from Bill Schmarzo’s article The Ugly Truth About Data Management. It is a sad truth that many data infrastructures resemble a Rube Goldberg machine: they take a relatively straightforward business reality and capture it in an incredibly complex (and complicated) way. Not only does this prevent an organisation from effectively solving its business problems, it also keeps many of the smartest people busy fighting just to keep the system alive. To minimise the risk of making wrong and irreversible (which is the worst!) choices, we should keep learning from industry best practices and from the lessons others have learnt and shared.

And that’s the whole point of this weekly reading list.

  • Recommender Systems, Not Just Recommender Models: The purpose of a recommender model is to score a user’s interest in an item. This is very useful whenever companies want to serve personalised subsets of content, items or products to their customers. In real life, however, the model is not enough, because there are other challenges: there might be too many items, making the computations difficult; some items should not be recommended at all; and others need to be promoted. A complete recommender system therefore goes well beyond the recommender model and consists of four main stages: Retrieval, Filtering, Scoring, and Ordering. (Even Oldridge @ NVIDIA Merlin)
  • How to Measure and Mitigate Position Bias: “Position bias happens when higher positioned items are more likely to be seen and thus clicked regardless of their actual relevance. This leads to lesser engagement on lower ranked items.” This is a challenge for ML engineers because “training our models on biased historical data perpetuates the bias via a self-reinforcing feedback loop.” Luckily, there are ways to measure and mitigate position bias; adding randomness is one of them. (Eugene Yan)
  • Advanced exploratory data analysis (EDA) with Python: Every data project needs EDA, sometimes a thorough one, sometimes a quick one. Either way, we always need to get familiar with the data and inspect anything relevant to the problem at hand. Ten years ago, I wrote my own EDA package to speed up this important step. Nowadays, there are many packages and guides, such as this one, that help data analysts and data scientists spend more time exploring the data than writing code to explore it. From time to time, I review data science test tasks during recruitment. My advice to candidates: please don’t rush into modelling. In the end, that is usually very inefficient. (Michael Notter @ EPFL Extension School)
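
To make the four recommender stages concrete, here is a minimal Python sketch of such a pipeline. It is not taken from Even Oldridge’s article or from NVIDIA Merlin; the `Item` class, the dot-product “model”, the category blocklist and the fixed promotion boost are all illustrative assumptions. Retrieval cheaply narrows the catalogue, Filtering removes items that must not be shown, Scoring applies the (here trivial) model, and Ordering applies business rules such as promotions before cutting the list to size.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Item:
    item_id: str
    category: str
    embedding: tuple[float, ...]
    promoted: bool = False  # hypothetical business flag used in Ordering


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def retrieve(user_vec, catalog, k):
    """Retrieval: cheaply narrow the full catalogue down to k candidates."""
    # A brute-force sort stands in for the approximate nearest-neighbour
    # index a production system would typically use at this stage.
    return sorted(catalog, key=lambda it: dot(user_vec, it.embedding), reverse=True)[:k]


def filter_items(candidates, seen_ids, blocked_categories):
    """Filtering: drop items that must not be recommended."""
    return [
        it for it in candidates
        if it.item_id not in seen_ids and it.category not in blocked_categories
    ]


def score(user_vec, candidates):
    """Scoring: the recommender model estimates the user's interest per item."""
    # Hypothetical model: a plain dot product; in practice a heavier ranker.
    return [(it, dot(user_vec, it.embedding)) for it in candidates]


def order(scored, n, promo_boost=1.0):
    """Ordering: apply business rules (here, a promotion boost) and keep the top n."""
    adjusted = [(it, s + (promo_boost if it.promoted else 0.0)) for it, s in scored]
    adjusted.sort(key=lambda pair: pair[1], reverse=True)
    return [it.item_id for it, _ in adjusted[:n]]


def recommend(user_vec, catalog, seen_ids, blocked_categories, k=100, n=10):
    candidates = retrieve(user_vec, catalog, k)
    candidates = filter_items(candidates, seen_ids, blocked_categories)
    return order(score(user_vec, candidates), n)


# Toy catalogue: "a" was already seen, "e" is in a blocked category,
# and "d" is weakly matched but promoted.
catalog = [
    Item("a", "books", (1.0, 0.0)),
    Item("b", "books", (0.9, 0.1)),
    Item("c", "toys", (0.8, 0.2)),
    Item("d", "books", (0.1, 0.9), promoted=True),
    Item("e", "restricted", (1.0, 0.0)),
]
user = (1.0, 0.0)
print(recommend(user, catalog, seen_ids={"a"}, blocked_categories={"restricted"}, k=5, n=3))
# ['d', 'b', 'c']: the promoted item moves to the top during Ordering
```

With `k=4` the same call returns `['b', 'c']`: the promoted item `d` is cut at Retrieval before Ordering ever sees it, which is one reason the stages need to be tuned together rather than in isolation.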

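As a rough illustration of the randomisation idea for position bias, the sketch below shuffles the top-k results for a small fraction of traffic. It then uses only those randomised impressions to estimate how click-through rate falls with position relative to the top slot. Inverting those estimates gives inverse propensity weights that give clicks at less visible positions more weight during training. This is one common variant, not necessarily the exact method in Eugene Yan’s post; the function names, the `epsilon` exploration rate and the `(position, clicked, randomized)` log format are assumptions.

```python
import random
from collections import defaultdict


def serve(ranking, k=3, epsilon=0.1, rng=random):
    """Return (shown_ranking, randomized).

    With probability `epsilon`, the top-k items are shuffled so that each of
    them is equally likely to land in each top-k slot; otherwise the model's
    order is kept. The flag should be logged alongside every impression.
    """
    shown = list(ranking)
    randomized = rng.random() < epsilon
    if randomized:
        top = shown[:k]
        rng.shuffle(top)
        shown[:k] = top
    return shown, randomized


def position_propensity(logs, k=3):
    """Estimate click-through rate per position relative to position 0.

    `logs` holds (position, clicked, randomized) tuples with 0-based positions.
    Only randomized impressions are used: there, relevance is spread evenly
    over the top-k slots, so CTR differences are attributed to position alone.
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    for position, clicked, randomized in logs:
        if randomized and position < k:
            shown[position] += 1
            clicks[position] += int(clicked)
    ctr = {p: clicks[p] / shown[p] for p in range(k) if shown[p]}
    if not ctr.get(0):
        raise ValueError("need randomized clicks at position 0 to normalise")
    return {p: rate / ctr[0] for p, rate in ctr.items()}


def ips_weight(position, propensity):
    """Inverse propensity weight for a click observed at `position`."""
    return 1.0 / propensity[position]


# Toy logs: randomized impressions at positions 0, 1 and 2 with CTRs of
# 40%, 20% and 10%; the non-randomized impressions are ignored.
logs = (
    [(0, True, True)] * 4 + [(0, False, True)] * 6
    + [(1, True, True)] * 2 + [(1, False, True)] * 8
    + [(2, True, True)] * 1 + [(2, False, True)] * 9
    + [(0, True, False)] * 50
)
propensity = position_propensity(logs, k=3)
print(propensity)                 # {0: 1.0, 1: 0.5, 2: 0.25}
print(ips_weight(2, propensity))  # 4.0
```

Keeping `epsilon` small limits how much users are exposed to shuffled, potentially worse rankings, at the cost of collecting randomised data more slowly.
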
Brent Dykes wrote a piece for Forbes in which he advises companies not to let a misguided AI strategy sabotage their brand experience. He says that “by focusing exclusively on cost savings with your AI strategy, your organization could be sabotaging its own brand reputation.” I couldn’t agree more. Use technology to solve people’s problems, not because it’s cool.

In case you missed it: last week’s issue, Keeping Up With Data #78.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Adam Votava
Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at