Keeping Up With Data #77
5 minutes for 5 hours’ worth of reading
The data world is complex and daunting, and not only for outsiders and newcomers. So many technologies, terms, concepts, architectures, approaches, tools, methods and buzzwords. Sometimes I enjoy the variety, sometimes it’s very distracting. I keep finding myself switching between two approaches: going wide and trying to get a high-level overview of many topics, and digging deep into a few things I actually need on a regular basis.
This week’s list is more about the former.
- Emerging Architectures for Modern Data Infrastructure: The team from a16z has updated their post on the architecture of a modern data infrastructure and blueprints for ML, BI, and multimodal data infrastructures. I’ve been coming back to the original article from 2020 frequently, as it provides a nice overview of the components of different data & analytics infrastructures and it’s also a great source of inspiration for technology choices for individual components. The core hasn’t changed much (well, in less than two years). What has changed are the tools and applications around the core. This reflects the boom of so many new categories of the modern data stack (a.k.a. the Cambrian explosion). Only time will tell if these are to stay, evolve, or go. Anyway, it’s great that someone keeps an eye on all this and keeps updating the article. (Future)
- The ghosts in the data stack: “Teams, organizations, and the analytics industry at large are haunted by implicit knowledge — knowledge that ‘exists within expert communities but is never written down’”. OLAP cubes are one of these haunting ghosts. So, what are OLAP cubes? It turns out that “OLAP cubes are just tables, but tables structured in a very particular way. Rather than a list of objects, OLAP cubes are a table of metrics, or ‘measures’, pre-aggregated across nested layers of groupings, or ‘dimensions.’” That makes OLAP cubes harder to work with than the original tables, but queries against them are much faster because the aggregates are already computed. On the other hand, the aggregation obviously leads to some information being lost. Powerful modern databases seemed to make OLAP cubes redundant. “But, like any good ghost, though they may not exist in the physical form, OLAP cubes are spiritually very much alive.” In BI tools. (benn.substack)
- On Self-Service, Data Democratization and Language: JP’s articles have been labelled “as practical as blockchain”. But I do enjoy them anyway. This one is about self-service (defined as “when people don’t need other people to answer their own business-relevant questions”) and data democratisation (“which is about creating an organisation that doesn’t tax curiosity, but encourages it.”). One way to achieve it might be through natural language querying (NLQ); the idea of ‘googling’ data is indeed very intriguing. (Modern Data Democracy)
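
To make the OLAP cube idea from the second item a bit more concrete, here is a minimal sketch in plain Python. It is my own illustration, not code from the article: the order rows, the column names and the `build_cube` helper are all made up, and it pre-aggregates over every combination of dimensions rather than a strictly nested hierarchy.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical order rows; column names and values are made up for illustration.
ORDERS = [
    {"region": "EU", "product": "bike", "revenue": 100},
    {"region": "EU", "product": "helmet", "revenue": 40},
    {"region": "US", "product": "bike", "revenue": 120},
    {"region": "US", "product": "bike", "revenue": 80},
]

ALL = "*"  # marker meaning "aggregated over every value of this dimension"


def build_cube(rows, dimensions, measure):
    """Pre-aggregate `measure` (as a sum) for every subset of `dimensions`."""
    cube = defaultdict(int)
    for row in rows:
        # Each row contributes to one cell per subset of kept dimensions.
        for size in range(len(dimensions) + 1):
            for kept in combinations(dimensions, size):
                key = tuple(row[d] if d in kept else ALL for d in dimensions)
                cube[key] += row[measure]
    return dict(cube)


cube = build_cube(ORDERS, ("region", "product"), "revenue")

print(cube[("EU", ALL)])     # revenue for EU across all products
print(cube[(ALL, "bike")])   # revenue for bikes across all regions
print(cube[(ALL, ALL)])      # grand total
```

Every question the cube anticipates becomes a single dictionary lookup, which is where the speed comes from. The price is visible too: the two US bike orders are merged into one cell, so the individual order values can no longer be recovered from the cube alone.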
What should have been a ‘Paris-Roubaix’ weekend will be an ‘Amstel Gold Race’ one due to the French presidential elections. An inspiration to find some steep, short bergs around Lake Zurich this weekend?