Keeping Up With Data #65


January feels like a strange month to me. Everyone has plenty of energy, ideas and ambitions, and wants to get moving. But the first few weeks of the new year are often filled with alignments and discussions about how to get these ideas started. Every year I wonder why these alignments and discussions didn’t happen in December. But so be it.

A mixed bag this week — statistics, data engineering, and machine learning.

  • A/B testing — Is there a better way? An exploration of multi-armed bandits: A/B testing offers a great way to make data-driven decisions when comparing two (or more) options. But running an A/B test properly requires a significant number of observations for each option, and if one option is much weaker, we might be losing money by showing it too many times. That’s where multi-armed bandits come in. They are designed to balance using the best option known so far (exploitation) with testing other, potentially better, options (exploration). There are multiple algorithms to consider. How do they work? How do they compare? And ultimately, which one should you use? For answers to these questions, read this great article. (Greg Rafferty @ TDS)
  • The future history of Data Engineering: The role of data engineers is evolving as the data toolkit gets better and data professionals can focus more on getting value out of data for their businesses. What does it mean for current and future data engineers? (And there will be many of them, as “the current under-supply of competent engineers will lead to an over-supply of junior engineers”.) Will they be replaced by analytics engineers? Or data analysts? There are a lot of great ideas and thoughts in the article. I think for some time to come, the whole data field will keep evolving quickly, but not all companies will be moving at the same pace, which will create great variability in the actual work behind individual job titles. So titles will say less about what the job actually involves. Therefore, both candidates and companies need to pay extra attention to aligning expectations for the roles. And even then, everyone needs to be ready for the daily work to shift over time. (Group by 1)
  • Ensemble methods: bagging, boosting and stacking: With gradient boosted trees, namely the XGBoost and LightGBM implementations, being widely used for various classification and forecasting tasks lately, it is worth recapping the ensemble methods. The idea behind ensemble learning is to train multiple models and combine them to achieve better performance. How they are combined is what differentiates bagging, boosting and stacking. Whether as a recap or to learn something new, the visualisations in the article will make that really easy for anyone. (Joseph Rocca & Baptiste Rocca @ TDS)
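
To make the exploration vs. exploitation trade-off from the first item a bit more tangible, here is a minimal sketch of epsilon-greedy, which is only one of several bandit algorithms and not necessarily the one the linked article favours. The function name, the conversion rates and the parameter values are all made up for illustration.

```python
import random


def epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit.

    true_rates are hypothetical conversion rates, unknown to the algorithm
    and used here only to simulate user responses.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms      # how many times each option was shown
    rewards = [0.0] * n_arms  # total reward collected per option

    def observed_mean(arm):
        # Options never shown get +inf so that each one is tried at least once.
        return rewards[arm] / pulls[arm] if pulls[arm] else float("inf")

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Exploration: show a randomly chosen option.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: show the option with the best observed average.
            arm = max(range(n_arms), key=observed_mean)

        # Simulated response: 1 for a conversion, 0 otherwise.
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        pulls[arm] += 1
        rewards[arm] += reward

    return pulls, rewards


pulls, rewards = epsilon_greedy([0.02, 0.05, 0.10])
print("times shown per option:", pulls)
print("conversions per option:", rewards)
```

Unlike a fixed 50/50 split, the weaker options should end up being shown less and less often as evidence accumulates, while the small epsilon keeps a trickle of traffic going to them in case the early estimates were wrong.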

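For the ensemble item, the "train multiple models and combine them" idea can be sketched with bagging, the simplest of the three approaches: fit the same kind of model on bootstrap resamples of the data and let the models vote. This is a toy, pure-Python illustration under my own assumptions (a one-dimensional threshold "stump" as the base model, made-up data and hypothetical function names), not what XGBoost or LightGBM do internally.

```python
import random
from collections import Counter


def fit_stump(points):
    """Fit a 1-D threshold classifier that predicts 1 when x > threshold."""
    best_threshold, best_errors = None, None
    for threshold, _ in points:
        errors = sum((x > threshold) != bool(y) for x, y in points)
        if best_errors is None or errors < best_errors:
            best_threshold, best_errors = threshold, errors
    return best_threshold


def bagging_fit(points, n_models=25, seed=0):
    """Train one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    models = []
    for _ in range(n_models):
        sample = [rng.choice(points) for _ in points]
        models.append(fit_stump(sample))
    return models


def bagging_predict(models, x):
    """Combine the individual stumps by majority vote."""
    votes = Counter(int(x > threshold) for threshold in models)
    return votes.most_common(1)[0][0]


# Toy data: the label is 1 for x >= 5 and 0 otherwise.
data = [(x, int(x >= 5)) for x in range(10)]
models = bagging_fit(data)
print("thresholds learned:", sorted(models))
print("prediction for x=0:", bagging_predict(models, 0))
print("prediction for x=9:", bagging_predict(models, 9))
```

Boosting would instead train the models one after another, each focusing on the errors of the previous ones, and stacking would train a second-level model on the base models' predictions.
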
Happy weekend!

And if you (or anyone you know) are looking for a strategic data consulting job, DataDiligence is hiring new data associates!

In case you missed last week’s issue of Keeping Up With Data

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.




Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at

Adam Votava
