Keeping Up With Data #67
5 minutes for 5 hours’ worth of reading
The image above comes from the first article in the list below and is a collection of screenshots from the documentation of the SHAP library. There is a lot of pressure to understand why machine learning models make the decisions they do. What are the key factors? What drives the propensity score of a given customer? These questions are important not only for validating and understanding the model. They can also provide valuable insight into individual customers and global trends alike.
Asking questions is an important skill. Not only when it comes to data and analytics.
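The question of what drives a given customer's propensity score can be made concrete with a small sketch. It does not use the SHAP library itself. Instead it computes exact Shapley values, the idea SHAP builds on, by brute force for a toy model. The model `toy_propensity`, the feature names and all the numbers are hypothetical and only there for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, customer, baseline):
    """Exact Shapley values for one customer, relative to a baseline customer.

    Features outside a coalition are replaced by their baseline values.
    This enumerates every coalition, so it only suits a handful of features.
    """
    names = list(customer)
    n = len(names)

    def value(coalition):
        # Score a mixed row: the customer's values inside the coalition,
        # baseline values everywhere else.
        row = {k: (customer[k] if k in coalition else baseline[k]) for k in names}
        return predict(row)

    result = {}
    for name in names:
        others = [k for k in names if k != name]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {name}) - value(coalition))
        result[name] = total
    return result


def toy_propensity(row):
    # Hypothetical scoring rule, not a trained model.
    score = 0.1 + 0.02 * row["visits"] + 0.3 * row["has_app"]
    if row["has_app"] and row["visits"] > 5:
        score += 0.1  # interaction between app usage and frequent visits
    return score


customer = {"visits": 10, "has_app": 1, "tenure_years": 3}
baseline = {"visits": 2, "has_app": 0, "tenure_years": 1}

contributions = shapley_values(toy_propensity, customer, baseline)
for feature, contribution in sorted(contributions.items(), key=lambda kv: -abs(kv[1])):
    print(f"{feature}: {contribution:+.3f}")
```

The contributions add up to the difference between this customer's score and the baseline score, which is what makes them easy to talk through with business people. A feature the model ignores, like `tenure_years` here, gets a contribution of zero.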
- 8 Booming Data Science Libraries You Must Watch Out For in 2022: Though I don’t spend my days coding anymore, I still review a lot of code and from time to time even write a line or two. And I’m as curious as ever about the new libraries that are making the life of data scientists easier. I’ve seen the first library on the list, SHAP, being used in multiple projects recently. It gives great insight into the key drivers of a model to data scientists and (with a little voice-over) to business people alike. I haven’t yet tried UMAP, but if it’s better than t-SNE, it’s just a matter of time. The rest of the list is certainly worth checking out too. Maybe you’ll find inspiration for your next project. (Bex T. @ TDS)
- Good Data Citizenship Doesn’t Work: The omnipresence of data and the growing circle of data users are quickly forming a data society. And just like any society, a data democracy needs good data citizens and strong data leaders. Making data available to the right people at the right time (democratised data) requires making the data trustworthy. But how to do that? How to document the ever-changing data? Who should do it? How to spread the word? Luckily, there is inspiration to be found in other places that focus on making information available: news sites, Wikipedia, Quora, or Google, to name a few. What lessons do Benn and Mark draw? Review more, document less. Let there be mess (asking questions helps uncover important issues faster than any documentation). Give voice to others, not just the data owner. (Benn Stancil @ TDS)
- The Endless Data Buffet: An analogy between data mesh and brunch. It starts with data as ingredients, data product owners as chefs and data engineers as line cooks, moves on to data products as buffet options and self-service infrastructure as the kitchen, and ends with high-value business analysis as a plate full of delicious brunch food. And just as a restaurant is often a well-run business with rules and processes, the data mesh buffet has some of its own. Chefs and kitchen workers use the tools that are right for the job and focus on their specialities. The food is available where the customers expect it, in the quantity and quality they paid for. As with any analogy, there are limits to the one developed here. In real life, the roles of customers and staff are not always clearly separated, so everyone ends up playing both parts at times. And a critical mass of ‘good data citizens’ and data leaders is needed to make it work. (The Sequel)
Let’s all be good data leaders and compel others to be good data citizens.