Keeping Up With Data #77


The data world is complex and daunting, and not only for outsiders and newcomers. So many technologies, terms, concepts, architectures, approaches, tools, methods and buzzwords. Sometimes I enjoy the variety, sometimes it’s very distracting. I keep finding myself switching between two approaches: going wide and trying to get a high-level overview of many topics, and digging deep into the few things I actually need on a regular basis.

This week’s list is more about the former.

  • Emerging Architectures for Modern Data Infrastructure: The team from a16z has updated their post on the architecture of a modern data infrastructure and blueprints for ML, BI, and multimodal data infrastructures. I’ve been coming back to the original article from 2020 frequently, as it provides a nice overview of the components of different data & analytics infrastructures and is also a great source of inspiration for technology choices for individual components. The core hasn’t changed much (well, in less than two years). What has changed are the tools and applications around the core. This reflects the boom of so many new categories in the modern data stack (a.k.a. the Cambrian explosion). Only time will tell if these are to stay, evolve, or go. Anyway, it’s great that someone keeps an eye on all this and keeps updating the article. (Future)
  • The ghosts in the data stack: “Teams, organizations, and the analytics industry at large are haunted by implicit knowledge — knowledge that ‘exists within expert communities but is never written down’”. OLAP cubes are one of these haunting ghosts. So, what are OLAP cubes? It turns out that “OLAP cubes are just tables, but tables structured in a very particular way. Rather than a list of objects, OLAP cubes are a table of metrics, or ‘measures’, pre-aggregated across nested layers of groupings, or ‘dimensions.’” That makes working with OLAP cubes harder than working with the original tables, but querying them is much faster. On the other hand, the aggregation obviously leads to some information being lost. Powerful modern databases seemed to make OLAP cubes redundant. “But, like any good ghost, though they may not exist in the physical form, OLAP cubes are spiritually very much alive.” In BI tools. (benn.substack)
  • On Self-Service, Data Democratization and Language: JP’s articles have been labelled “as practical as blockchain”, but I enjoy them anyway. This one is about self-service (defined as “when people don’t need other people to answer their own business-relevant questions”) and data democratisation (“which is about creating an organisation that doesn’t tax curiosity, but encourages it.”). One way to achieve it might be through NLQ (natural language querying): the idea of ‘googling’ data is indeed very intriguing. (Modern Data Democracy)
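
To make the OLAP cube idea from the second article a bit more tangible, here is a minimal sketch in Python. It is my own illustration, not code from the article: the rows, the dimension names and the `build_cube` helper are all made up, and it uses the simplest possible reading of a cube (one measure, summed over every combination of dimensions).

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical order rows: (country, product, revenue). Made-up data.
ROWS = [
    ("CH", "bike", 100),
    ("CH", "helmet", 20),
    ("CZ", "bike", 80),
    ("CZ", "bike", 60),
]
DIMENSIONS = ("country", "product")
ALL = "*"  # marker meaning "aggregated over this dimension"


def build_cube(rows):
    """Pre-aggregate revenue for every subset of the dimensions."""
    cube = defaultdict(int)
    n = len(DIMENSIONS)
    for *dims, revenue in rows:
        # For each subset of dimensions to keep, roll the others up into ALL.
        for k in range(n + 1):
            for kept in combinations(range(n), k):
                key = tuple(dims[i] if i in kept else ALL for i in range(n))
                cube[key] += revenue
    return dict(cube)


cube = build_cube(ROWS)

# Answering a question is now a dictionary lookup, not a scan of ROWS.
revenue_cz = cube[("CZ", ALL)]       # revenue per country
revenue_bikes = cube[(ALL, "bike")]  # revenue per product
revenue_total = cube[(ALL, ALL)]     # grand total
```

The trade-off from the article shows up directly: every lookup is cheap, but the cube no longer knows that the Czech bike revenue came from two separate orders. That detail was lost in the aggregation.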

What should have been a ‘Paris-Roubaix’ weekend will be an ‘Amstel Gold Race’ one due to the French presidential elections. An inspiration to find some steep, short bergs around Lake Zurich during the weekend?

In case you missed last week’s issue of Keeping Up With Data

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.




Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at

Adam Votava
