Keeping Up With Data #70
5 minutes for 5 hours’ worth of reading
The Big Mac Index is an informal way of measuring purchasing power parity. It obviously has many limitations, but it is a useful number because it makes exchange-rate theory a bit more digestible. It is also a nice example of a feature: a number derived from other numbers that aims to capture the essence of a complex reality.
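As a rough illustration of how such a derived number is built, the arithmetic usually behind the Big Mac Index divides the local burger price by the US price to get an implied exchange rate, then compares that with the market rate. The function name and the prices below are hypothetical, chosen only to give round numbers.

```python
def big_mac_valuation(local_price: float, us_price: float,
                      market_rate: float) -> tuple[float, float]:
    # Implied PPP rate: local currency units per USD that would equalise burger prices.
    implied_rate = local_price / us_price
    # Valuation vs. USD (market_rate is local units per USD):
    # negative means the local currency looks undervalued, positive means overvalued.
    valuation = implied_rate / market_rate - 1
    return implied_rate, valuation


if __name__ == "__main__":
    # Hypothetical prices and exchange rate, not real data.
    implied, valuation = big_mac_valuation(local_price=100.0, us_price=5.0, market_rate=25.0)
    print(f"implied rate {implied:.1f}, valuation {valuation:.0%}")  # implied rate 20.0, valuation -20%
```

Three raw inputs collapse into one number that is easy to compare across countries, which is exactly what makes it a useful feature despite its limitations.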
Performance testing, red flags, and data governance. Scary topics in this week’s reading list.
- Fixing Performance Regressions Before they Happen: A blog post about software performance, which I read as a case in point for data-powered solutions beating expertly designed rules. Software engineers at Netflix run memory and responsiveness tests for each pull request. Initially, they used static thresholds. Setting thresholds manually is not only labour-intensive but also has other limitations, such as ignoring context. So they pivoted to anomaly and changepoint detection. While some might hesitate to call these approaches advanced analytics, they delivered incredible value. Because the process was automated, the number of tests grew significantly. But most importantly, 45% of the alerts were true regressions (compared with less than 1% for manually curated static thresholds). That’s what I call working smart! (Netflix TechBlog)
- Red Flags to Look Out for When Joining a Data Team: Though it might seem that companies are the decision makers in the recruiting process, candidates are also calling the shots. And while it’s difficult when we desperately want to land a job in data science, we should stay alert for red flags. I like the questions Eugene shares, though many of them are not red flags to me. But that’s OK. It only shows that data science is broad and every data scientist has different preferences and expectations of the job. Some of us want to do ML once everything else is ready; some of us want to put the foundations in place. Some want a clear roadmap; some want to create it. Some don’t want to work under an incompetent manager; some see it as an opportunity. But whoever you are, watch out for your red flags. (Eugene Yan)
- Some thoughts on Data Governance: Data governance is indeed an important topic. Who wouldn’t want high data quality and observability, right? But I agree with Mohammad that the current approach is strange at best. In my experience, data governance is often treated as if its subjects were computers (doing as instructed by code), not humans. And then we have massive data governance projects aiming to clean up the mess created by complex IT systems that were not designed with data describing the business in mind (and, worst of all, often carried out by the same companies that designed those systems in such a complicated way). I believe that when data is treated as a mirror of (business) reality, it provides guidance both for creating data and for using it to business benefit. If data is a byproduct of operational software, no amount of data mapping, documentation exercises, or policies will save you. I’m convinced data governance is more about people than technology. And people should be persuaded, not ruled. (Mohammad Syed)
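To make the contrast in the Netflix item a little more concrete, here is a minimal sketch under loose assumptions: a hypothetical series of per-run memory measurements (in MB) for one test, a hand-set static threshold, an anomaly check that flags a run more than `z_cutoff` standard deviations from the mean of recent runs, and a crude least-squares search for a single shift in the mean as a stand-in for changepoint detection. This is not Netflix's implementation; all function names, data, and cutoffs are illustrative choices.

```python
from statistics import mean, stdev
from typing import Optional


def static_threshold_alert(value: float, threshold: float) -> bool:
    # Hand-set limit: the same number regardless of what is normal for this test.
    return value > threshold


def anomaly_alert(history: list[float], value: float, z_cutoff: float = 4.0) -> bool:
    # Flag a run that sits more than z_cutoff standard deviations from the
    # mean of recent runs (history needs at least two points for stdev).
    mu, sigma = mean(history), stdev(history)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > z_cutoff


def _sse(segment: list[float]) -> float:
    # Sum of squared deviations from the segment mean.
    m = mean(segment)
    return sum((x - m) ** 2 for x in segment)


def best_changepoint(series: list[float], min_size: int = 3,
                     min_gain: float = 0.5) -> Optional[int]:
    # Try every split into two segments of at least min_size points and keep
    # the one with the lowest combined squared error (a single mean shift).
    n = len(series)
    if n < 2 * min_size:
        return None
    total = _sse(series)
    best_idx, best_cost = None, total
    for i in range(min_size, n - min_size + 1):
        cost = _sse(series[:i]) + _sse(series[i:])
        if cost < best_cost:
            best_idx, best_cost = i, cost
    # Report a change only if the split removes at least min_gain of the variation.
    if best_idx is None or best_cost > (1 - min_gain) * total:
        return None
    return best_idx


if __name__ == "__main__":
    history = [100.0, 101.0, 99.0, 100.0, 102.0, 98.0, 100.0, 101.0]
    print(static_threshold_alert(110.0, threshold=120.0))  # False
    print(anomaly_alert(history, 110.0))                    # True
    print(best_changepoint([100.0, 101.0, 99.0, 100.0,
                            110.0, 111.0, 109.0, 110.0]))   # 4
```

In this toy case, a static limit of 120 MB misses a jump from about 100 MB to 110 MB, while the history-relative check flags it and the changepoint search locates where the shift starts. The design choice is simply that "normal" is learned per test from its own history rather than set by hand.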
I’m driving to Prague later today. We finally got the passport for our youngest son, so it’s time to introduce him to the family. It only took 5 months, two official translations, countless official forms, and 1,200 km of driving on four trips to the embassy. Czech e-government in practice. Yay!