Keeping Up With Data #64
5 minutes for 5 hours’ worth of reading
Happy New Year! Almost everyone around me took the Christmas holidays really easy: with families, without excessive travel, with very little work and a lot of reading.
But now, with only the first week ending, it already feels like the holidays were months ago. Doesn’t it?
Well, here are three data articles to start 2022 strongly.
- Real-time machine learning: challenges and solutions: There are two components to the problem: online prediction and continual learning. Online prediction obviously involves online inference, but ultimately it is also about including streaming features captured live in the user session. The benefit of moving from batch to online prediction lies not only in more accurate predictions but also in opening up the option of online model evaluation. The second component, continual learning, is commonly understood as retraining the models more frequently. But as Chip writes, it’s not about the retraining frequency, but the manner in which the model is retrained. Continual learning is in fact about fine-tuning the existing model on new data rather than retraining it from scratch (stateful training, see the image above). Though real-time ML is currently pioneered mainly by tech companies, it is a topic worth monitoring. In three years’ time it might well be mainstream. (Chip Huyen)
- Design in the era of the algorithm: A still very relevant article from 2017. Machine learning algorithms are powering a lot of decisions around us. And as Josh puts it: “the design and presentation of data is as important as the underlying algorithm.” In this longer article (perfect for a morning commute, or a late WFH evening) we embark on a journey through the “see-saw relationship between technical advance and disillusionment”, where AI and ML create a lot of excitement with occasional dumb moments. We also learn ten design principles to temper the overconfidence of machines, which often present one option as if it were the only true answer (when in fact they are often just guessing). As I like to say, data & analytics is rarely only about data and analytics. (big medium)
- Why most analytics efforts fail: Many companies are building data systems to be able to make data-informed decisions quickly. Here we are specifically talking about event analytics, which involves massive amounts of event data generated by users’ actions in the online world. But the large volume makes it difficult to focus on the right things, resulting in a high failure rate of event analytics efforts. The fundamental questions for any such initiative are “how you think about what to track, how to track it, and manage it over time?” Crystal draws on her experience to identify the symptoms of failed event analytics projects, trace them to their actual root causes, and ultimately suggest a process for answering the fundamental questions above. (Reforge)
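For the curious, the difference between retraining from scratch and stateful training from the first article can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions, not code from Chip's article: the linear model, the synthetic data and all the names (`train`, `make_data`, `checkpoint`) are made up for the example.

```python
import random

def predict(weights, x):
    return sum(w * xi for w, xi in zip(weights, x))

def train(weights, data, epochs=20, lr=0.05):
    """Run plain SGD on squared error, starting from the given weights."""
    weights = list(weights)  # copy so the caller's checkpoint is untouched
    for _ in range(epochs):
        for x, y in data:
            err = predict(weights, x) - y
            weights = [w - lr * err * xi for w, xi in zip(weights, x)]
    return weights

def mse(weights, data):
    return sum((predict(weights, x) - y) ** 2 for x, y in data) / len(data)

def make_data(true_weights, n, rng):
    """Hypothetical noise-free data from a linear relationship."""
    rows = []
    for _ in range(n):
        x = [1.0, rng.uniform(-1.0, 1.0)]  # bias term plus one feature
        rows.append((x, predict(true_weights, x)))
    return rows

rng = random.Random(0)
old_data = make_data([0.0, 2.0], 200, rng)  # what the model was built on
new_data = make_data([0.5, 3.0], 50, rng)   # fresh data after behaviour shifted

# The existing model, trained once on historical data.
checkpoint = train([0.0, 0.0], old_data)

# Stateless retraining: start from zero weights and pass over all the data again.
stateless = train([0.0, 0.0], old_data + new_data)

# Stateful training: continue from the checkpoint using only the new data.
stateful = train(checkpoint, new_data)

print("old model on new data:  ", mse(checkpoint, new_data))
print("stateless on new data:  ", mse(stateless, new_data))
print("stateful on new data:   ", mse(stateful, new_data))
print("examples per epoch:", len(old_data + new_data), "vs", len(new_data))
```

The point of the sketch is the starting state, not the numbers: the stateful update reuses the existing weights and touches only the new examples, which is why it can run far more often than a full retrain.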
We had surprisingly warm weather in Zurich over the last couple of days, which allowed me to combine cycling with skiing, and that was amazing. Now it seems the winter is back, so let’s make the most of it during the weekend.