Keeping Up With Data #75

5 minutes for 5 hours’ worth of reading

Image by Liz Fosslien | Source: https://www.linkedin.com/posts/liz-fosslien_licreatoraccelerator-activity-6911526838167097344-yP0l/
  • Data Contracts — ensure robustness in your data mesh architecture: Data contracts bring clarity to the problem of data ownership in federated architectures. As “they provide information on what data products are being consumed, by whom and for what purpose [… they] are essential for robust data management!” How exactly do they work? And what is the role of data sharing agreements? Piethein’s article answers these questions with practical examples. (Piethein Strengholt @ TDS)
  • Stop using random forest feature importances. Take this intuitive approach instead: Scikit-learn’s implementation of random forest and other tree-based algorithms provides a view into the importance of individual features (the impurity-based feature_importances_). However, “tree based models have a strong tendency to overestimate the importance of continuous numerical or high cardinality categorical features.” That can easily lead to wrong conclusions or decisions. The proposed alternative is Permutation Feature Importance. Or SHAP. (Ali Soleymani @ Medium)
  • How Should We Be Thinking about Data Lineage? While the business reality of most organisations is fairly straightforward, its reflection in their data is not. This is why data lineage, both technical and business, is becoming a big topic in data governance (and beyond). Data lineage can help with data troubleshooting, impact analysis, discovery and trust, but also with data valuation. For the three steps to implement data lineage (and related tips and tricks), read the full article. (Jon Loyens @ TDS)
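To make the “what, by whom and for what purpose” idea from the data contracts piece a bit more concrete, here is a minimal Python sketch of a contract as a typed object with a simple schema check. The class, its fields and the example values are hypothetical illustrations assumed for this sketch; they are not taken from Piethein’s article or from any data contract standard.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical data contract: field names are illustrative, not from any standard.
@dataclass
class DataContract:
    data_product: str         # which data product is being consumed
    provider: str             # owning (producing) domain
    consumer: str             # who consumes it
    purpose: str              # for what purpose
    schema: dict[str, type]   # agreed column names and their Python types

    def validate(self, record: dict) -> list[str]:
        """Return the contract violations for one record (an empty list means it conforms)."""
        errors = []
        for column, expected in self.schema.items():
            if column not in record:
                errors.append(f"missing column: {column}")
            elif not isinstance(record[column], expected):
                errors.append(
                    f"{column}: expected {expected.__name__}, got {type(record[column]).__name__}"
                )
        # Columns the provider sends but the contract does not mention.
        for column in sorted(record.keys() - self.schema.keys()):
            errors.append(f"unexpected column: {column}")
        return errors


contract = DataContract(
    data_product="orders",
    provider="sales-domain",
    consumer="finance-domain",
    purpose="monthly revenue reporting",
    schema={"order_id": str, "amount": float},
)
print(contract.validate({"order_id": "A-1", "amount": 9.5}))  # → []
print(contract.validate({"order_id": "A-2"}))                 # → ['missing column: amount']
```

In a real data mesh the schema check would more likely live in the pipeline or a schema registry; the point here is only that consumer, purpose and schema are recorded together in one agreed artefact.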
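For readers who want to try the suggested alternative to random forest feature importances, a minimal sketch with scikit-learn’s `permutation_importance` (from `sklearn.inspection`) compares the impurity-based `feature_importances_` with permutation importances computed on held-out data. The synthetic dataset is an assumption for illustration: with `shuffle=False`, `make_classification` puts the three informative columns first, so the last three columns are pure noise. SHAP, the other option mentioned, needs the separate `shap` package and is not shown.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Synthetic data: columns 0-2 are informative, columns 3-5 are noise (shuffle=False keeps that order).
X, y = make_classification(
    n_samples=1000, n_features=6, n_informative=3, n_redundant=0,
    shuffle=False, random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Impurity-based (MDI) importances, derived from the training data seen while growing the trees.
mdi = model.feature_importances_

# Permutation importances: mean drop in test accuracy when one feature's values are shuffled.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)

for i in result.importances_mean.argsort()[::-1]:
    print(
        f"feature {i}: MDI={mdi[i]:.3f}  "
        f"permutation={result.importances_mean[i]:.3f} +/- {result.importances_std[i]:.3f}"
    )
```

Computing permutation importance on a held-out set rather than the training data is a deliberate choice: it reflects which features the model relies on to perform well on data it has not seen.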
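Impact analysis and troubleshooting, two of the lineage use cases mentioned above, both come down to walking a lineage graph. A minimal sketch, assuming lineage is stored as a plain Python mapping from each dataset to the datasets built directly from it (all dataset names are hypothetical): a downstream traversal answers “what is affected if this source changes?”, and the same traversal on the reversed graph answers “where could this bad number have come from?”.

```python
from collections import deque

# Hypothetical technical lineage: each dataset maps to the datasets built directly from it.
LINEAGE = {
    "crm.customers": ["staging.customers"],
    "erp.orders": ["staging.orders"],
    "staging.customers": ["mart.customer_revenue"],
    "staging.orders": ["mart.customer_revenue", "mart.daily_sales"],
    "mart.customer_revenue": ["dashboard.churn"],
    "mart.daily_sales": [],
    "dashboard.churn": [],
}


def downstream(graph, source):
    """Breadth-first traversal: every asset that depends, directly or indirectly, on `source`."""
    seen = set()
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for child in graph.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def upstream(graph, target):
    """Reverse every edge and reuse the traversal to find all sources feeding `target`."""
    reversed_graph = {}
    for parent, children in graph.items():
        for child in children:
            reversed_graph.setdefault(child, []).append(parent)
    return downstream(reversed_graph, target)


# Impact analysis: what is affected if erp.orders changes?
print(sorted(downstream(LINEAGE, "erp.orders")))
# → ['dashboard.churn', 'mart.customer_revenue', 'mart.daily_sales', 'staging.orders']

# Troubleshooting: which datasets feed the churn dashboard?
print(sorted(upstream(LINEAGE, "dashboard.churn")))
# → ['crm.customers', 'erp.orders', 'mart.customer_revenue', 'staging.customers', 'staging.orders']
```

Business lineage would attach owners, definitions and processes to the same nodes; the traversal logic stays the same.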

In case you missed last week’s issue of Keeping Up With Data

Adam Votava

Data scientist | avid cyclist | amateur pianist (I'm sharing my personal opinion and experience, which should not be considered professional advice)