Keeping Up With Data — Week 35 Reading List


Gartner’s hype cycle for data science and ML, shown above, brings plenty of terms we’ve been hearing for a while and a couple of new ones, too. Gartner often coins or popularises new terms. Some are self-explanatory, like ‘small and wide data’; others, like ‘citizen data science’ or ‘X analytics’, I have to keep googling. I also find the co-existence of ‘MLOps’ and ‘ModelOps’ in the picture slightly confusing. But I guess it says a lot that the ‘innovation trigger’ stage is full of terms, while the ‘plateau of productivity’ is not.

While thought leaders sometimes seem to complicate simple things, data is generally about simplifying a complex reality, as the following articles show.

  • Simpson’s Paradox and Interpreting Data: Data is a finite representation of a very complex real world and will never be a perfect reflection of it. Developing an intuition for what is missing from the data (but should be included) is the art of data science. Simpson’s paradox describes a trend or result that is present when data is put into groups but reverses or disappears when the data is combined. The reason is so-called ‘lurking variables’, which split the data into multiple distributions and are difficult to find. The decision to look at the data together or by groups is entirely situational. People sometimes treat data as an absolute truth. ‘Data don’t lie’, they say. Well, what if an important assumption is not met? Be careful when drawing conclusions about a complex reality from findings based on a simple reflection of it. (Tom Grigg @ TDS)
  • The Role of AI in HR Decision Making: Is there anything more complex than people? In complex environments such as organisations, it’s difficult to imagine a fully autonomous AI making decisions. Luckily, that doesn’t mean HR can’t leverage AI for a wide range of decisions. Instead of automation, we should think of augmentation. Data-augmented decision making combines ‘could’, ‘should’ and ‘would’ questions. The first two can be answered with data: Could we fill a position with existing talent? Should we do it? The third type (Would the person be happy to transfer? Would it be a good fit?), not so much. But that’s the complexity of HR that we need to take into account. (myHRfuture)
  • Pseudo-R²: A Metric for Quantifying Interestingness: For linear outcomes, the common measure of interestingness (among statisticians) is ‘variance explained’, often described by R². But what to do with non-linear outcomes (e.g., “yes” or “no”)? For instance, which splits of an overall conversion rate do we consider most interesting? By device? By campaign? By country? By gender? And how can we quantify that? The suggestion is to use McFadden’s pseudo-R², mostly because it balances variation with composition. Pseudo-R² is low when the groups explain no variation (in conversion rates) and also when one of the groups is much larger than the others. This matches the intuition that the most interesting split is the one with similarly sized groups and the largest differences between their conversion rates. (Heap blog)
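
To make Simpson’s paradox from the first article concrete, here is a minimal sketch in Python. The counts are made up purely for illustration, and the variant names (A, B) and device groups are hypothetical; nothing here comes from the article itself. It compares conversion rates within each group against the rates after pooling the groups.

```python
# Made-up counts for illustration only: (conversions, visitors)
# for two hypothetical variants, A and B, split by device.
counts = {
    "desktop": {"A": (90, 100), "B": (800, 1000)},
    "mobile": {"A": (300, 1000), "B": (20, 100)},
}


def rates_by_group(counts):
    """Conversion rate of each variant within each group."""
    return {
        group: {variant: conv / n for variant, (conv, n) in variants.items()}
        for group, variants in counts.items()
    }


def pooled_rates(counts):
    """Conversion rate of each variant after combining all groups."""
    totals = {}
    for variants in counts.values():
        for variant, (conv, n) in variants.items():
            total_conv, total_n = totals.get(variant, (0, 0))
            totals[variant] = (total_conv + conv, total_n + n)
    return {variant: conv / n for variant, (conv, n) in totals.items()}


by_group = rates_by_group(counts)
pooled = pooled_rates(counts)

for group, rates in by_group.items():
    print(group, {variant: round(r, 3) for variant, r in rates.items()})
print("pooled", {variant: round(r, 3) for variant, r in pooled.items()})
```

With these numbers, A converts better than B on each device, yet B converts better once the devices are pooled, because most of B’s traffic sits on desktop, where everyone converts more. Device plays the role of the lurking variable here.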
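
The pseudo-R² idea can be sketched in a similar way. McFadden’s pseudo-R² is defined as 1 − LL_model / LL_null, where LL is the maximised log-likelihood. The sketch below assumes the ‘model’ simply gives each group its own conversion rate and the ‘null’ uses one overall rate; whether the Heap article computes it exactly this way is my assumption. The function names and counts are made up for illustration.

```python
import math


def bernoulli_log_likelihood(conversions, visitors):
    """Maximised log-likelihood of yes/no outcomes under a single rate.

    Assumes visitors > 0. Terms with a zero count contribute nothing.
    """
    def xlogy(x, y):
        return 0.0 if x == 0 else x * math.log(y)

    p = conversions / visitors
    return xlogy(conversions, p) + xlogy(visitors - conversions, 1 - p)


def mcfadden_pseudo_r2(groups):
    """groups: list of (conversions, visitors), one entry per segment."""
    ll_model = sum(bernoulli_log_likelihood(c, n) for c, n in groups)
    total_c = sum(c for c, _ in groups)
    total_n = sum(n for _, n in groups)
    ll_null = bernoulli_log_likelihood(total_c, total_n)
    if ll_null == 0:
        # Overall rate is 0 or 1, so there is no variation to explain.
        return 0.0
    return 1 - ll_model / ll_null


# Made-up splits: same group rates (10% vs 30%), different group sizes.
r2_balanced = mcfadden_pseudo_r2([(100, 1000), (300, 1000)])
r2_lopsided = mcfadden_pseudo_r2([(190, 1900), (30, 100)])
# Made-up split where both groups convert at the same rate.
r2_no_difference = mcfadden_pseudo_r2([(200, 1000), (200, 1000)])

print(r2_balanced, r2_lopsided, r2_no_difference)
```

In this sketch the balanced split scores highest, the lopsided split with the same group rates scores lower, and the split with identical rates scores essentially zero, which matches the ‘variation balanced with composition’ intuition described above.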

Apart from reading these (and many more) interesting articles, I’ve also published a piece about the challenges of data adoption, paired with tips on how to overcome them.

Until next week!

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.




Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at

Adam Votava
