Keeping Up With Data — Week 39 Reading List

Source: https://www.andrewheiss.com/blog/2021/08/21/r2-euler/

What is the role of intuition in data science? Studying general mathematics taught me that one can’t put definitions and theorems ahead of understanding. This might not always be obvious in real life, where knowing the ‘what’ and the ‘how’ gets the job done most of the time. But without the ‘why’ we can easily get lost in more complex problems. I’d argue that the desire to become intimately familiar with concepts by looking at them from different angles, visualising them, and practising them should be in the DNA of any data scientist, just as the image above can help us develop a stronger intuition for R².
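As a small companion to the R² image (not taken from the linked post), here is a minimal Python sketch of two of those ‘different angles’ on R² for a simple linear regression with an intercept. The first angle is the share of the variance of y explained by the fitted line, 1 - SS_res / SS_tot. The second is the squared Pearson correlation between x and y. The data points and the function name are made up for illustration.

```python
# Hypothetical data; any (x, y) pairs with varying x would do.
x = [1, 2, 3, 4, 5, 6]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]


def r_squared_two_ways(x, y):
    """Return R^2 of a simple OLS fit (with intercept) computed two ways."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)  # total sum of squares, SS_tot

    # Angle 1: share of the variance of y explained by the fitted line.
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    explained = 1 - ss_res / syy

    # Angle 2: squared Pearson correlation between x and y.
    r = sxy / (sxx * syy) ** 0.5
    return explained, r ** 2


explained, squared_corr = r_squared_two_ways(x, y)
print(f"1 - SS_res/SS_tot = {explained:.6f}")
print(f"r^2               = {squared_corr:.6f}")
```

The two numbers agree up to floating-point rounding. This identity holds for simple linear regression with an intercept. With several predictors, R² instead equals the squared correlation between y and the fitted values.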

So, let’s get into this week’s reading list. Because the intuition isn’t going to build itself!

  • Is BI dead? ‘Original BI’ has been taken apart, and most of its functions are now handled by separate tools in the modern data stack. Today’s BI doesn’t worry about data ingestion, storage, or transformation. What’s left is data consumption, and that hasn’t changed much since the inception of BI. So BI is not dead. Endangered? Maybe. It needs to evolve to escape extinction. Benn argues that modern BI should focus on consumption only. It shouldn’t worry about using data in operational ways, nor should it come with a bespoke data governance layer; it should be legless. However, BI should include all consumption: both self-serve consumption and ad-hoc analyses. Currently it covers only the first. The second is often done by data analysts in SQL and Python notebooks. Doing both in one place brings data professionals and businesspeople together and helps build the proverbial bridge between data and business. As Benn puts it: “So long as companies need dashboards and executives need reports to go spelunking through as they wait for the economy class passengers to board, we’ll need BI.” The question is: what will the BI of tomorrow look like? (Benn Stancil)
  • All statistical models are wrong. Are any useful? Statistical models can be powerful at explaining real-world phenomena, and not only those governed by natural laws. They are also used, for instance, to estimate the probability of an experiment’s outcome for a given population. To do that, we need to randomly select individuals from the population to take part in the experiment. Our model (e.g., logistic regression) then takes the data from the experiment and draws conclusions about the population. However, statistical models come with a set of assumptions that are not always validated. Expecting the randomness used in survey and experiment design to cover the randomness of the natural world is, let’s say, naive. But such is the convention in scientific practice. The consequence is that parameter estimates are often incorrect. Worse, they can be so incorrect that the true parameters are not even covered by the 95% confidence intervals. Should we worry? (arg min blog)
  • What is an A/B Test? Let’s stay with the topic of evaluating the odds ratio. What is the difference between the odds of an outcome (e.g., playing a movie) for two randomly selected groups? How should the experiment be designed? How should the groups be selected so that conclusions from the sampled groups generalise to the whole population? And which metrics should be used to measure the impact? The article aims to build intuition for tackling these questions. Next on the table is how to evaluate the differences between the groups. With the previous article in mind, would you use logistic regression or simply calculate the odds ratio for the observed data? (Netflix Technology Blog)
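The arg min blog’s warning about 95% intervals that miss the true parameter can be illustrated with a small simulation. The sketch below is not the blog’s own analysis. It swaps in one simple, hypothetical way an assumption can fail: observations share a cluster-level shock (think participants recruited from a handful of sites) but are analysed as if they were independent. The code repeatedly builds the textbook normal-approximation interval for a mean and counts how often the interval contains the true value of 0. The cluster counts, sizes, variances, and seed are arbitrary choices.

```python
import math
import random

# Hypothetical simulation: how often does the textbook 95% interval for a mean
# contain the true mean when its independence assumption holds vs. when it fails?

rng = random.Random(42)  # fixed seed so the run is reproducible
TRUE_MEAN = 0.0


def naive_ci(sample, z=1.96):
    """Normal-approximation 95% CI for the mean, assuming i.i.d. observations."""
    n = len(sample)
    mean = math.fsum(sample) / n
    var = math.fsum((v - mean) ** 2 for v in sample) / (n - 1)
    se = math.sqrt(var / n)
    return mean - z * se, mean + z * se


def coverage(draw_sample, reps=2000):
    """Share of simulated samples whose naive CI contains TRUE_MEAN."""
    hits = 0
    for _ in range(reps):
        lo, hi = naive_ci(draw_sample())
        hits += lo <= TRUE_MEAN <= hi
    return hits / reps


def iid_sample(n=200):
    # Assumption holds: 200 independent draws with variance 2.
    return [rng.gauss(TRUE_MEAN, math.sqrt(2.0)) for _ in range(n)]


def clustered_sample(clusters=10, per_cluster=20):
    # Assumption fails: each group of 20 shares a cluster-level shock
    # (variance 1) on top of individual noise (variance 1), so the
    # marginal variance is still 2 but observations are not independent.
    sample = []
    for _ in range(clusters):
        shock = rng.gauss(0.0, 1.0)
        sample.extend(TRUE_MEAN + shock + rng.gauss(0.0, 1.0) for _ in range(per_cluster))
    return sample


iid_cov = coverage(iid_sample)
clustered_cov = coverage(clustered_sample)
print(f"independent data: {iid_cov:.1%} of intervals contain the true mean")
print(f"clustered data:   {clustered_cov:.1%} of intervals contain the true mean")
```

With independent data the share lands close to the advertised 95%. With the clustered data it falls far short, even though each observation has the same marginal variance in both scenarios.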
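The closing question of the Netflix item has a neat answer in the simplest case. With a single binary predictor (treatment vs. control), the maximum-likelihood logistic regression reproduces the sample odds ratio exactly: exp(b1) equals the odds ratio from the 2×2 table. The sketch below checks this on made-up counts. It fits the regression with a hand-written Newton-Raphson loop, so it needs only the standard library. The function names and numbers are illustrative and are not taken from the Netflix article.

```python
import math

# Hypothetical A/B test counts (not from the Netflix article):
# control: 300 of 1,000 users played a title; treatment: 360 of 1,000.
CONTROL = (0, 300, 1000)    # (x = treatment indicator, successes, trials)
TREATMENT = (1, 360, 1000)


def odds_ratio(succ_t, n_t, succ_c, n_c, z=1.96):
    """Sample odds ratio (treatment vs. control) with a Wald 95% CI."""
    fail_t, fail_c = n_t - succ_t, n_c - succ_c
    log_or = math.log((succ_t * fail_c) / (fail_t * succ_c))
    se = math.sqrt(1 / succ_t + 1 / fail_t + 1 / succ_c + 1 / fail_c)
    return math.exp(log_or), (math.exp(log_or - z * se), math.exp(log_or + z * se))


def logistic_fit(groups, iters=25):
    """Fit logit(p) = b0 + b1 * x by Newton-Raphson on grouped binomial data."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y, n in groups:
            p = 1 / (1 + math.exp(-(b0 + b1 * x)))
            resid = y - n * p          # observed minus expected successes
            w = n * p * (1 - p)        # binomial variance, the Fisher weight
            g0 += resid                # score for b0
            g1 += resid * x            # score for b1
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton step: beta += inverse(information) @ score, 2x2 written out.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


or_hat, (lo, hi) = odds_ratio(TREATMENT[1], TREATMENT[2], CONTROL[1], CONTROL[2])
b0, b1 = logistic_fit([CONTROL, TREATMENT])
print(f"odds ratio from the 2x2 table: {or_hat:.4f} (95% CI {lo:.3f} to {hi:.3f})")
print(f"exp(b1) from logistic regression: {math.exp(b1):.4f}")
```

Both lines report an odds ratio of 1.3125, i.e. (360 × 700) / (640 × 300). The two approaches diverge once other variables enter the model (for example, device type). The regression then estimates an odds ratio adjusted for those variables, which the raw table cannot provide.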

That’s it for this week. So long and thanks for all the fish!

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at DataDiligence.com

Adam Votava
