Keeping Up With Data #79

Source: https://www.dell.com/en-us/blog/the-ugly-truth-about-data-management/

The image above comes from Bill Schmarzo’s article The Ugly Truth About Data Management. It is a sad truth that many data infrastructures resemble a Rube Goldberg machine: they take a relatively straightforward business reality and capture it in an incredibly complex (and complicated) way. Not only does this prevent an organisation from effectively solving its business problems, it also keeps many of its smartest people busy fighting to keep the system alive. To minimise the risk of making wrong and, worst of all, irreversible choices, we should keep learning from industry best practices and from the lessons learnt that others share.

And that’s the whole point of this weekly reading list.

  • Recommender Systems, Not Just Recommender Models: The purpose of a recommender model is to score a user’s interest in an item. This is obviously very useful whenever companies want to serve personalised subsets of content, items, products etc. to their customers. But in real life the model alone is not enough, as there are many other challenges: there might be too many items, making the computations difficult; there are items we don’t want to recommend; and there are items we want to promote. Therefore, a complete recommender system goes way beyond the recommender model and consists of four main stages: Retrieval, Filtering, Scoring, and Ordering. (Even Oldridge @ NVIDIA Merlin)
  • How to Measure and Mitigate Position Bias: “Position bias happens when higher positioned items are more likely to be seen and thus clicked regardless of their actual relevance. This leads to lesser engagement on lower ranked items.” This is a challenge for ML engineers because “training our models on biased historical data perpetuates the bias via a self-reinforcing feedback loop.” Luckily, there are ways to measure and mitigate position bias. Adding randomness is one of them. (Eugene Yan)
  • Advanced exploratory data analysis (EDA) with Python: Every data project needs EDA. Sometimes it is thorough, sometimes quick, but we always need to get familiar with the data and inspect anything relevant to the problem at hand. I remember writing my own EDA package ten years ago to be more productive at this important step. Nowadays, there are many packages and guides, such as this one, that help data analysts and data scientists spend more time exploring the data than writing code to explore the data. From time to time, I review data science test tasks as part of a recruitment process. My advice to candidates: please don’t rush into modelling too soon. In the end, it’s usually very inefficient. (Michael Notter @ EPFL Extension School)
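
To make the four stages from the first article a bit more concrete, here is a minimal Python sketch. It is a toy illustration under my own assumptions, not the NVIDIA Merlin implementation: the `Item` fields, the function names, the popularity-based retrieval, the affinity-times-popularity "model" and the promotion boost are all hypothetical stand-ins.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Item:
    # All fields are hypothetical and exist only for this example.
    item_id: str
    category: str
    popularity: float
    in_stock: bool = True


def retrieve(catalog, user_categories, k=50):
    """Retrieval: cheaply narrow a large catalog down to k candidates."""
    matches = [it for it in catalog if it.category in user_categories]
    return sorted(matches, key=lambda it: it.popularity, reverse=True)[:k]


def filter_items(candidates, already_seen):
    """Filtering: drop items we do not want to recommend."""
    return [it for it in candidates
            if it.in_stock and it.item_id not in already_seen]


def score(candidates, user_affinity):
    """Scoring: a toy formula standing in for the recommender model."""
    return [(it, user_affinity.get(it.category, 0.0) * it.popularity)
            for it in candidates]


def order(scored, promoted, boost=0.5, n=3):
    """Ordering: apply business rules (a promotion boost), keep the top n."""
    adjusted = [(it, (s + boost) if it.item_id in promoted else s)
                for it, s in scored]
    ranked = sorted(adjusted, key=lambda pair: pair[1], reverse=True)
    return [it.item_id for it, _ in ranked[:n]]


def recommend(catalog, user_affinity, already_seen, promoted, n=3):
    """Chain the four stages: Retrieval -> Filtering -> Scoring -> Ordering."""
    candidates = retrieve(catalog, set(user_affinity))
    candidates = filter_items(candidates, already_seen)
    scored = score(candidates, user_affinity)
    return order(scored, promoted, n=n)


catalog = [
    Item("a", "books", 0.9),
    Item("b", "books", 0.5),
    Item("c", "music", 0.8),
    Item("d", "books", 0.7, in_stock=False),
    Item("e", "games", 0.95),
    Item("f", "music", 0.4),
]

recommendations = recommend(
    catalog,
    user_affinity={"books": 1.0, "music": 0.5},
    already_seen={"a"},
    promoted={"f"},
)
print(recommendations)
```

In this toy run, item "e" never reaches the model (retrieval), "a" and "d" are dropped as already seen or out of stock (filtering), and the promoted item "f" ends up ranked above items with a higher raw score (ordering). That is exactly the part of a recommender system that the model alone does not cover.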

Brent Dykes wrote a piece for Forbes in which he advises companies not to let a misguided AI strategy sabotage their brand experience. He says: “by focusing exclusively on cost savings with your AI strategy, your organization could be sabotaging its own brand reputation.” I personally couldn’t agree more. Use technology to solve people’s problems, not because it’s cool.

In case you missed last week’s issue of Keeping Up With Data

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at DataDiligence.com

Adam Votava
