Keeping Up With Data #82
5 minutes for 5 hours’ worth of reading
Not sure exactly why, but this week I have read a lot of articles about data storage, data mesh, customer 360, the immutable data warehouse and other concepts related to data management. I enjoy reading about these approaches and their pros and cons, especially when they come from real practitioners. Of course nothing works out quite as well as it reads, but imho it helps to develop pattern recognition for things that sound reasonable at first and then create a huge mess for the whole organisation.
Here's to developing pattern recognition for data initiatives!
- Experiments, Peeking, and Optimal Stopping: We live in the era of A/B tests. Evaluating an experiment requires a sufficiently large, representative sample. But the larger the sample, the more it costs (even if only as the opportunity cost of running a sub-optimal solution). So can we stop an experiment before reaching the pre-specified sample size without sacrificing accuracy? The Sequential Probability Ratio Test was the first method to do exactly that. How does it work? And what was the story behind its invention? Read this article, which combines a great story with graphs and formulas. (Matteo Courthoud @ tds)
- Is Data Mesh Fool’s Gold? Creating a Business-centric Data Strategy: “Many organisations are putting their data strategy success into the hands of data mesh,” says Bill Schmarzo. He goes on to explain why a data strategy should be business-driven (not data-driven), and why relying on technology-led solutions might provide false hope and distract organisations from the heavy lifting of “collaborating with the business stakeholders to identify, validate, value, and prioritize the business and operational use cases where data and analytics can have a material impact.” My main takeaway from the article is: “Start your data strategy by understanding how your organization produces and measures value creation.” Not with data, or technology. (Data Science Central)
- Is The Modern Data Warehouse Broken? Modern data warehouses allow companies to store large amounts of data and run almost any query on top of the (often very complicated) data structures. But just because you can doesn’t mean you should. “The immutable data warehouse concept (also referred to as active ETL) holds that the warehouse should be a representation of the real world through the data.” And that’s the key! The real world that businesses need to represent in their data infrastructure is usually far simpler than what they end up with after loading data from dozens of source systems. I very much like this idea, which I call a ‘business data twin’. Of course it’s easier said than done. But it’s important to have a vision and some principles, right? (Barr Moses @ tds)
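For those who like to see the idea from the first article in code: below is a minimal sketch of a Sequential Probability Ratio Test, assuming binary (converted / not converted) outcomes and Wald's approximate stopping thresholds. The function name `sprt_bernoulli` and its parameters are my own for illustration and are not taken from the article, which may set things up differently.

```python
import math

def sprt_bernoulli(observations, p0, p1, alpha=0.05, beta=0.2):
    """Sketch of Wald's SPRT for H0: p = p0 versus H1: p = p1.

    observations: iterable of 0/1 outcomes, processed one at a time.
    alpha, beta: target type I and type II error rates (assumed values).
    Returns (decision, number_of_observations_used).
    """
    # Wald's approximate thresholds on the log-likelihood ratio.
    upper = math.log((1 - beta) / alpha)   # cross it: accept H1
    lower = math.log(beta / (1 - alpha))   # cross it: accept H0

    llr = 0.0  # cumulative log-likelihood ratio of H1 against H0
    n = 0
    for n, x in enumerate(observations, start=1):
        if x:
            llr += math.log(p1 / p0)
        else:
            llr += math.log((1 - p1) / (1 - p0))

        if llr >= upper:
            return "accept_h1", n
        if llr <= lower:
            return "accept_h0", n

    # Data ran out before either threshold was crossed.
    return "continue", n
```

The point is that the decision is checked after every observation, so the experiment can stop as soon as the evidence is strong enough in either direction instead of waiting for a fixed sample size.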
Last weekend I did an amazing ride around Schwyz. Almost 3,500m of climbing in less than 4.5 hours. Well above my seasonal goal of 600 climbed meters per hour. Let’s see what this weekend brings!