Keeping Up With Data #82


Not sure exactly why, but this week I have read a lot of articles about data storage, data mesh, customer 360, the immutable data warehouse and other concepts related to data management. I enjoy reading about these approaches and their pros and cons, especially when the writing comes from real practitioners. Of course, nothing works out in practice quite as well as it reads on paper, but imho it helps to develop pattern recognition for things that sound reasonable at first, until they create a huge mess for the whole organisation.

Here's to developing pattern recognition for data initiatives!

  • Experiments, Peeking, and Optimal Stopping: We live in the era of A/B tests. Evaluating an experiment requires a large enough, representative sample. But the larger the sample, the more it costs (even if it is just the opportunity cost of running a sub-optimal solution for longer). So can we stop an experiment before reaching the pre-specified sample size, without sacrificing accuracy? The Sequential Probability Ratio Test was the first method to do exactly that. How does it work? And what was the story behind its invention? Read this article, which combines a great story with graphs and formulas. (Matteo Courthoud @ tds)
  • Is Data Mesh Fool’s Gold? Creating a Business-centric Data Strategy: “Many organisations are putting their data strategy success into the hands of data mesh,” says Bill Schmarzo. He goes on to explain why a data strategy should be business-driven (not data-driven), and why relying on technology-led solutions might provide false hope and distract organisations from the heavy lifting of “collaborating with the business stakeholders to identify, validate, value, and prioritize the business and operational use cases where data and analytics can have a material impact.” My main takeaway from the article is: “Start your data strategy by understanding how your organization produces and measures value creation.” Not with data, or technology. (Data Science Central)
  • Is The Modern Data Warehouse Broken? Modern data warehouses allow companies to store large amounts of data and run almost any query on top of the (often very complicated) data structures. But just because you can doesn’t mean you should. “The immutable data warehouse concept (also referred to as active ETL) holds that the warehouse should be a representation of the real world through the data.” And that’s the key! The real world that businesses need to represent in their data infrastructure is usually way simpler than what they end up with when loading data from dozens of source systems. I very much like this idea, which I call a ‘business data twin’. Of course it’s easier said than done. But it’s important to have some vision and principles, right? (Barr Moses @ tds)
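
For the curious, the Sequential Probability Ratio Test from the first article can be sketched in a few lines of Python. This is my own minimal illustration for a stream of 0/1 outcomes (say, conversions), not code taken from the article. The function name `sprt_bernoulli`, the two hypothesised conversion rates `p0` and `p1`, and the default error rates `alpha` and `beta` are all assumptions made for the example. The stopping thresholds are Wald's classic approximations.

```python
import math


def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Sketch of Wald's SPRT for H0: p = p0 against H1: p = p1 on 0/1 data.

    Returns a (decision, n) pair, where n is the number of observations used.
    """
    # Wald's approximate thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)  # at or above: stop, accept H1
    lower = math.log(beta / (1 - alpha))  # at or below: stop, accept H0

    llr = 0.0  # running log-likelihood ratio of H1 versus H0
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "accept H1", n
        if llr <= lower:
            return "accept H0", n

    # Data ran out before either threshold was crossed.
    return "continue", n


if __name__ == "__main__":
    # Hypothetical stream: every visitor converts, testing 10% against 30%.
    print(sprt_bernoulli([1] * 10, p0=0.1, p1=0.3))
```

The point of the sketch is the "peeking" itself: the test is evaluated after every single observation, and it stops as soon as the evidence is strong enough in either direction, instead of waiting for a fixed sample size.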

Last weekend I did an amazing ride around Schwyz. Almost 3,500m of climbing in less than 4.5 hours. Well above my seasonal goal of 600 climbed meters per hour. Let’s see what this weekend brings!

In case you missed it, here is last week’s issue of Keeping up with data.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.




Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at

Adam Votava