Keeping Up With Data #94

5 minutes for 5 hours’ worth of reading

Adam Votava
3 min read · Aug 5, 2022

I’ve spent most of the week hiking in the Alps with my family. It’s been absolutely fantastic — limited access to electricity forced me to put my phone into flight mode and dedicate my undivided attention to my kids and my wife. And the nature around St. Moritz and Bernina Pass is so beautiful.

Anyway, it also means I haven’t had much time for reading and only managed to read the three articles below on the train when the kids fell asleep.

The first one is very important. It shows data valuation as an economic, not an accounting, exercise. Assigning value to data based on the use cases delivered (or planned, or even only possible for now) is a very powerful approach.

But be aware of what’s meant by a use case. It’s not a simple dashboard or a machine learning model. “Use Case is a cluster of Decisions around common KPIs or metrics that seek to deliver a well-defined business or operational outcome in support of an organization’s key business initiative,” says Bill Schmarzo in one of his articles about building a value-driven data strategy.

  • What’s the Value of my Data? Today’s Most Critical Yet Hard to Answer Question: Bill Schmarzo is a proponent of the economic value of data, so it’s no surprise that his suggested data valuation methodology is centred around the value of data use cases. Getting to the value of a use case (think reducing inventory costs or improving vendor product reliability) is an exercise for financial analysts. One then needs to attribute the value of individual use cases to the data sources and analytic modules they require. Straightforward, but very powerful. (Data Science Central)
  • Your Opinion Matters: Leveraging customer feedback provided as free text is very important yet incredibly challenging. Once again, we see the power of having large, expertly curated training data samples. With the data in place, one can leverage publicly available state-of-the-art NLP models like GPT-3. That’s exactly what the team at Stitch Fix did. I believe this is the path many businesses should follow: create their own data sets and leverage what’s available when it comes to technology and machine learning. (MultiThreaded)
  • σ-driven project management: when is the optimal time to give up? The blog post builds on the following premise: software project completion times follow a log-normal distribution, which means the logarithm of the ratio of estimated vs. actual time follows a normal distribution with zero mean and standard deviation σ. The σ depends on the complexity of the project. The less uncertainty in the project, the lower the σ and the more accurate the estimates. The real question is not how bad the estimated vs. actual ratio will get, but when to drop a project. Erik argues it’s when the marginal ROI (the success ratio per time spent) of the project drops below its initial marginal ROI. Anyway, break your big projects into smaller ones: they are easier to estimate and manage, and they are more likely to get done. (Erik Bernhardsson)
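To get a feel for what σ means in the last article's premise, here is a minimal Python sketch. It assumes the log of the blow-up ratio (actual time divided by estimated time) is normally distributed with zero mean and standard deviation σ, as described above. The function name `blowup_summary`, the σ values and the "twice the estimate" threshold are my own illustrative choices, not taken from Erik's post, and the sketch does not attempt his marginal ROI argument.

```python
import math
from statistics import NormalDist


def blowup_summary(sigma: float, factor: float = 2.0) -> dict:
    """Summarise the blow-up ratio (actual / estimated) under a log-normal model.

    Assumption: log(actual / estimated) ~ Normal(0, sigma).
    `factor` is a hypothetical overrun threshold (2.0 = twice the estimate).
    """
    log_ratio = NormalDist(mu=0.0, sigma=sigma)
    return {
        # Median of a log-normal is exp(mu), so half of projects beat the estimate.
        "median_ratio": math.exp(log_ratio.median),
        # Mean of a log-normal is exp(mu + sigma^2 / 2): the long tail drags it up.
        "mean_ratio": math.exp(sigma ** 2 / 2),
        # Probability that the project takes more than `factor` times the estimate.
        "p_exceeds_factor": 1.0 - log_ratio.cdf(math.log(factor)),
    }


if __name__ == "__main__":
    for sigma in (0.5, 1.0, 2.0):  # illustrative values only
        s = blowup_summary(sigma)
        print(
            f"sigma={sigma}: median x{s['median_ratio']:.2f}, "
            f"mean x{s['mean_ratio']:.2f}, "
            f"P(> 2x estimate)={s['p_exceeds_factor']:.0%}"
        )
```

The point of the sketch: the median project lands on its estimate, but the average overrun grows quickly with σ, which is one way to see why smaller, lower-σ projects are easier to manage.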

I’m doing a little triathlon tomorrow. Hope the body will remember how to swim. It’s been a while. The rest should be ok: a nice ride and run along Lake Zurich.

In case you missed last week’s issue of Keeping Up With Data

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.



Adam Votava

Data scientist | avid cyclist | amateur pianist (I'm sharing my personal opinion and experience, which should not be considered professional advice)