Keeping Up With Data #75

Image by Liz Fosslien | Source: https://www.linkedin.com/posts/liz-fosslien_licreatoraccelerator-activity-6911526838167097344-yP0l/

Will the data tools of the future be code-first, low-code or no-code? As ever, each option has its supporters. There are advocates (often providers) of no-code and low-code solutions, critics of no-code or of the limitations of the current no-code solutions, and voices rooting for the co-existence of code and no-code. I like that. Having options is good.

The picture by Liz Fosslien above made me smile. Having designed a few data strategies, I can certainly relate.

Data contracts, data lineage, and feature importance. Rather technical reads today.

  • Data Contracts — ensure robustness in your data mesh architecture: Data contracts bring clarity to the problem of data ownership in federated architectures. As the author puts it, “they provide information on what data products are being consumed, by whom and for what purpose [… they] are essential for robust data management!” How exactly do they work? And what is the role of data sharing agreements? Read Piethein’s article for the answers and practical examples. (Piethein Strengholt @ TDS)
  • Stop using random forest feature importances. Take this intuitive approach instead: Scikit-learn’s implementation of random forest and other tree-based algorithms provides a view into the importance of individual features. However, “tree based models have a strong tendency to overestimate the importance of continuous numerical or high cardinality categorical features.” That can easily lead to wrong conclusions or decisions. The proposed alternative is to use Permutation Feature Importance. Or SHAP. (Ali Soleymani @ Medium)
  • How Should We Be Thinking about Data Lineage? While the business reality of most organisations is fairly straightforward, its reflection in their data is not. This is why data lineage (both technical and business) is becoming a big topic, and not only in data governance. Data lineage can help with data troubleshooting, impact analysis, discovery and trust, and also data valuation. For the three steps to implement data lineage (and related tips and tricks), read the full article. (Jon Loyens @ TDS)
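
For the feature importance read above, here is a minimal sketch of what permutation feature importance does: shuffle one feature at a time and measure how much the model's error grows. It is not the article's code. The function name, the toy data and the stand-in `predict` function are my own assumptions for illustration, and in practice you would score a fitted model on held-out data (scikit-learn offers `sklearn.inspection.permutation_importance` for that).

```python
import random


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Mean increase in mean squared error when each column is shuffled."""
    rng = random.Random(seed)

    def mse(rows):
        return sum((predict(r) - t) ** 2 for r, t in zip(rows, y)) / len(y)

    baseline = mse(X)
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            # Shuffle column j only, keeping every other column intact.
            column = [row[j] for row in X]
            rng.shuffle(column)
            shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            increases.append(mse(shuffled) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical toy data: the target depends on feature 0 only.
data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [3.0 * row[0] for row in X]


def predict(row):
    # Stand-in for a trained model (an assumption, not a fitted estimator).
    return 3.0 * row[0]


scores = permutation_importance(predict, X, y)
print(scores)
```

Shuffling the first feature breaks its link to the target, so the error goes up. Shuffling the second one changes nothing, because the stand-in model never looks at it. The importance here is tied to what the model actually uses for its predictions, rather than to how often a tree happened to split on a feature.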

Daylight-saving weekend is ahead of us. Not long ago it was a big deal for me. I used to ride my bike early in the morning, and the start of daylight saving took away an hour of light (and of time on the bike). These days my morning rides consist of a 3 km loop to the kindergarten and back, and an extra hour of light in the evening sounds good to me.

In case you missed it: last week’s issue of Keeping Up With Data.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Adam Votava

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at DataDiligence.com