Keeping Up With Data #75

5 minutes for 5 hours’ worth of reading

Adam Votava
2 min read · Mar 25, 2022
Image by Liz Fosslien | Source: https://www.linkedin.com/posts/liz-fosslien_licreatoraccelerator-activity-6911526838167097344-yP0l/

Will future data tools be code-first, low-code or no-code? As ever, each option has its supporters. There are advocates (often providers) of no-code and low-code solutions, critics of no-code in general or of the limitations of today’s no-code tools, and voices rooting for the co-existence of both approaches. I like that. Having options is good.

The picture by Liz Fosslien above made me smile. Having designed a few data strategies, I can certainly relate.

Data contracts, data lineage, and feature importance. Rather technical reads today.

  • Data Contracts — ensure robustness in your data mesh architecture: Data contracts bring clarity to the problem of data ownership in federated architectures. Because “they provide information on what data products are being consumed, by whom and for what purpose [… they] are essential for robust data management!” How exactly do they work? And what is the role of data sharing agreements? Read Piethein’s article for the answers and practical examples. (Piethein Strengholt @ TDS)
  • Stop using random forest feature importances. Take this intuitive approach instead: Scikit-learn’s implementation of random forest and other tree-based algorithms provides a view into the importance of individual features. However, “tree based models have a strong tendency to overestimate the importance of continuous numerical or high cardinality categorical features.” That can easily lead to wrong conclusions or decisions. The proposed alternative is to use permutation feature importance, or SHAP. (Ali Soleymani @ Medium)
  • How Should We Be Thinking about Data Lineage? While the business reality of most organisations is fairly straightforward, its reflection in their data is not. That is why data lineage, both technical and business, is becoming a big topic in data governance and beyond. Data lineage can help with data troubleshooting, impact analysis, discovery and trust, and also data valuation. For the three steps to implement data lineage (and related tips and tricks), read the full article. (Jon Loyens @ TDS)
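
For the curious, here is a minimal sketch of the idea behind permutation feature importance from the second article: shuffle one feature at a time and measure how much the model’s error grows. It is my own illustration in plain Python, not the code from the article. The names `permutation_importance`, `toy_model` and the toy data are all hypothetical, and in practice you would score a fitted model on held-out data (scikit-learn ships a ready-made version as `sklearn.inspection.permutation_importance`).

```python
import random
from typing import Callable, List

Row = List[float]


def mean_squared_error(y_true: List[float], y_pred: List[float]) -> float:
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(
    predict: Callable[[Row], float],
    X: List[Row],
    y: List[float],
    n_repeats: int = 10,
    seed: int = 0,
) -> List[float]:
    """Mean increase in MSE after shuffling each feature column in turn."""
    rng = random.Random(seed)
    baseline = mean_squared_error(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, which breaks its link to the target.
            column = [row[j] for row in X]
            rng.shuffle(column)
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted = mean_squared_error(y, [predict(row) for row in X_perm])
            increases.append(permuted - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical stand-in for a fitted model: it uses feature 0 and ignores feature 1.
def toy_model(row: Row) -> float:
    return 3.0 * row[0]


# Toy data: feature 0 drives the target; feature 1 is a high-cardinality,
# ID-like column that carries no signal.
data_rng = random.Random(42)
X = [[data_rng.random(), float(i)] for i in range(200)]
y = [3.0 * row[0] for row in X]

scores = permutation_importance(toy_model, X, y)
print(scores)
```

Shuffling the ID-like column changes nothing because the model never reads it, so its score stays at zero, while shuffling the informative feature makes the error jump. Unlike impurity-based importances, the score depends only on how predictions change, not on how many split points a feature offers.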

Daylight saving weekend is ahead of us. Not long ago it was a big deal for me. I used to ride my bike early in the mornings, and the start of daylight saving time took away an hour of morning light (and of time on the bike). These days my morning rides consist of a 3 km loop to the kindergarten and back, so an extra hour of light in the evening sounds good to me.

In case you missed it, here is last week’s issue of Keeping Up With Data.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Adam Votava

Data scientist | avid cyclist | amateur pianist (I'm sharing my personal opinion and experience, which should not be considered professional advice)