Keeping Up With Data #60
5 minutes for 5 hours’ worth of reading
Andrej Karpathy wrote a Twitter thread about the ongoing consolidation in AI. How “~decade ago vision, speech, natural language, reinforcement learning, etc. were completely separate” and now transformers are dominant in all of them. And how, just like the neocortex has a largely similar architecture across all the input modalities, transformers can be fed with sequences of words, image patches, or speech pieces. And how it’s great news for AI research. I think it’s also consolidating the talent pool, which is imho great news for companies and ML professionals alike.
The following three articles made it to my reading list this week:
- How Spotify Uses ML to Create the Future of Personalization: Personalisation is a big thing at Spotify (they even have a VP of personalisation!). I’ve been impressed with their recommendation engine for a while now: how they take into account the listening history, the music itself, the time of day and much more to recommend the most enjoyable songs out of 70 million tracks to every single one of their ~380 million users. They’ve been experimenting with explore and exploit concepts to make the experience sustainable. Now it seems that reinforcement learning is the way to ensure long-term satisfaction and enjoyment of the listeners. (Engineering at Spotify)
- dbt and the Analytics Engineer — what’s the hype about? The article felt timely with the Coalesce conference taking place this week. First, who is an analytics engineer? Tuan Nguyen writes: “If a Data Engineer marries a Data Analyst and they have a baby girl, that baby girl will be an Analytics Engineer. Well, it does not work that way, but you get the point.” The rise of the role is fuelled by the shift from ETL to ELT, which “presents an opportunity for very technical analysts who understand the business well to model the raw data into clean, well-defined datasets”. And by the fact that modern cloud warehouses have enough power to process these transformations. I’d only add that it’s also opening possibilities for businesses to create their digital twin and keep updating it as the business and its priorities evolve. (Oliver Molander @ Validio)
- From Prediction to Action — How to Learn Optimal Policies From Data (1/4): When fighting customer churn, companies often go and build a churn model (I’ve done that many times myself). But knowing the propensity or probability of churn is not enough. We need to take preventative action, and individual customers might react to each possible action differently. So how do we select the optimal action, the one that will lead to maximal expected profit (a problem of finding an optimal policy)? The series (yes, the “1/4” means there are four articles there) explains how to do that. Policy optimisation is indeed a very useful data science skill, and not only for data scientists. (Rama Ramakrishnan @ TDS)
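To make the idea from the last article a bit more concrete, here is a minimal sketch of choosing the action with the highest expected profit per customer. It is my own toy illustration, not the method from the series: the action names, costs, customer values and retention probabilities are all made up, and in practice the probabilities would have to be estimated from data.

```python
from dataclasses import dataclass


@dataclass
class Action:
    name: str
    cost: float  # hypothetical cost of applying the action to one customer


def expected_profit(customer_value: float, p_stay: float, cost: float) -> float:
    # Expected profit = probability the customer stays * their value - action cost.
    return p_stay * customer_value - cost


def best_action(customer_value, p_stay_by_action, actions):
    # Pick the action with the highest expected profit for this customer.
    return max(
        actions,
        key=lambda a: expected_profit(customer_value, p_stay_by_action[a.name], a.cost),
    )


# Hypothetical actions; "none" means we do nothing.
actions = [Action("none", 0.0), Action("discount", 20.0), Action("call", 5.0)]

# Assumed per-action retention probabilities (identical for both customers here,
# so that only the customer value differs).
p_stay = {"none": 0.60, "discount": 0.80, "call": 0.65}

# A high-value and a low-value customer with the same reaction to each action.
choice_high = best_action(200.0, p_stay, actions)
choice_low = best_action(50.0, p_stay, actions)

print(choice_high.name, choice_low.name)
```

With these made-up numbers, the discount pays off for the high-value customer, while doing nothing is best for the low-value one, even though both have the same churn probability. That is the point: the churn score alone does not tell you what to do.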
We went to Lucerne last week, and though there were no massive Christmas markets, it still felt very Christmassy. So, we bought a mini table football as an early present. Is this the beginning of the period when I buy myself toys and say they’re for the kids?