Keeping Up With Data #65
5 minutes for 5 hours’ worth of reading
January feels like a strange month to me. Everyone has plenty of energy, ideas, and ambitions, and wants to get moving. But the first few weeks of the new year are often filled with alignments and discussions about how to get these ideas started. Every year I wonder why these alignments and discussions didn’t happen in December. But so be it.
A mixed bag this week — statistics, data engineering, and machine learning.
- A/B testing - Is there a better way? An exploration of multi-armed bandits: A/B testing offers a great way to make data-driven decisions when comparing two (or more) options. But running an A/B test properly requires a significant number of observations for each option, and if one option is much weaker, we might be losing money by showing it too many times. That’s where multi-armed bandits come in. They are designed to balance using the best option known so far (exploitation) with testing other, potentially better, options (exploration). There are multiple algorithms to consider. How do they work? How do they compare? And ultimately, which one should you use? For answers to these questions, read this great article. (Greg Rafferty @ TDS)
- The future history of Data Engineering: The role of data engineers is evolving as the data toolkit gets better and data professionals can focus more on getting value out of data for their businesses. What does this mean for current and future data engineers? (And there will be many of them, as “the current under-supply of competent engineers will lead to an over-supply of junior engineers”.) Will they be replaced by analytics engineers? Or data analysts? There are a lot of great ideas and thoughts in the article. I think that for some time to come, the whole data field will keep evolving quickly, but not all companies will move at the same pace, which will create great variability in the actual work behind individual job titles. So titles will say less about what the job is actually about. Therefore, both candidates and companies need to pay extra attention to aligning expectations for the roles. And even then, everyone needs to be ready for the daily work to shift over time. (Group by 1)
- Ensemble methods: bagging, boosting and stacking: With gradient-boosted trees, namely the XGBoost and LightGBM implementations, being widely used for various classification and forecasting tasks lately, it is worth recapping ensemble methods. The idea behind ensemble learning is to train multiple models and combine them to achieve better performance. How they are combined is what differentiates bagging, boosting and stacking: bagging trains models independently on bootstrap samples and averages them, boosting trains models sequentially so that each one focuses on the errors of the previous ones, and stacking trains a meta-model on the outputs of the base models. Whether as a recap or to learn something new, the visualisations in the article will make that really easy for anyone. (Joseph Rocca & Baptiste Rocca @ TDS)
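To make the exploration vs. exploitation trade-off from the multi-armed bandit item a bit more concrete, here is a minimal sketch of epsilon-greedy, one of the simplest bandit strategies. It is my own illustration, not code from the linked article. The function name, the conversion rates, and the default parameters are all made-up assumptions, and the rewards are simulated rather than coming from real traffic.

```python
import random


def epsilon_greedy(true_rates, n_rounds=10_000, epsilon=0.1, seed=0):
    """Simulate an epsilon-greedy bandit on options with hypothetical conversion rates.

    true_rates: assumed (in reality unknown) success probability of each option.
    Returns (pulls, wins): how often each option was shown and how often it converted.
    """
    rng = random.Random(seed)
    n_arms = len(true_rates)
    pulls = [0] * n_arms
    wins = [0] * n_arms

    def observed_rate(arm):
        # Untried options get an optimistic 1.0 so each is shown at least once.
        return wins[arm] / pulls[arm] if pulls[arm] else 1.0

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            # Exploration: show a random option.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: show the option with the best observed rate so far.
            arm = max(range(n_arms), key=observed_rate)

        # Simulated user response (1 = converted, 0 = did not convert).
        reward = 1 if rng.random() < true_rates[arm] else 0
        pulls[arm] += 1
        wins[arm] += reward

    return pulls, wins


if __name__ == "__main__":
    pulls, wins = epsilon_greedy([0.05, 0.10, 0.30])
    for arm, (p, w) in enumerate(zip(pulls, wins)):
        print(f"option {arm}: shown {p} times, converted {w} times")
```

Unlike a classic A/B test, which would split traffic evenly for the whole experiment, this sketch sends most of the traffic to whichever option looks best so far and reserves only a fraction `epsilon` of rounds for trying the others.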
And if you (or anyone you know) are looking for a strategic data consulting job, DataDiligence is hiring new data associates!