Keeping Up With Data — Week 41 Reading List
What data articles do people find the most useful? When I asked this question in a LinkedIn poll, I learned that my network has a very strong preference for case studies from businesses. Only 14% of respondents share my passion for data thought leadership articles. One would expect this week’s reading list to be full of case studies, right?
Wrong! Here are the articles that I read and enjoyed this week — one hands-on guide, three thought leadership pieces, and a scientific paper (probably the closest to a case study).
- Great Expectations: Always Know What to Expect From Your Data: Building and deploying a model is just the beginning. We simply can’t assume that everything will stay the same and that the performance of our model won’t deteriorate. The distribution of the target variable can change, and so can the distribution of the features, or even the relationship between the features and the target. We can’t have great expectations for the data without validating them. And that’s when Great Expectations comes to the rescue. (Khuyen Tran @ TDS)
- The messy core of the Modern Data Stack: Unlike Petr, I’ve always been a fan of SQL, so it’s great to see it in a prominent spot among the data languages. But it’s not all roses and unicorns. SQL has a shallow learning curve, and modern data warehouse performance is very forgiving of sub-optimal queries. That can easily lead to very messy code. The solution? The analytics world should embrace a software engineering mindset, with its diligent design, planning, and refactoring habits. (Petr Janda)
- How to become unemployed as a data leader: Putting data to work to create value for a company often involves innovating products (not necessarily data products). That’s why data leaders need to get out of the comfort zone of data and embrace the challenges of building solutions for humans. “You think that 97 page “report” is helping a leader make a decision because it’s chocked full of details and numbers? You think your or your team’s job is primarily to do math, statistics, or report building, mostly in isolation?” Well, think again. This won’t help you beat the 80% failure rate of data initiatives. You can forget about decent ROI. You might not even keep your job! (Designing for Analytics)
- Top 5 challenges of data scientists: While these challenges wouldn’t be in my top 5, they are very pertinent. It is not uncommon to struggle with the first three (finding the data, getting access to it, and understanding it) for months when you are a data scientist joining a medium-sized business, and for years (at best) in the case of large corporates. I’ve experienced it many times first-hand as a data scientist. And it feels like one of Dante’s circles of hell when it has to be sorted out while driving the execution of a data strategy for businesses embarking on a data journey. (Louise de Leyritz @ Castor App)
- A Time-Series Analysis of my Girlfriends Mood Swings: “It’s tough to make predictions, especially about the future,” said the baseball-playing philosopher Yogi Berra. This paper has even higher ambitions: forecasting a girlfriend’s mood swings. The paper examines various statistical and machine learning methods to improve on the state-of-the-art ‘are you ok’ question. It provides a very rigorous approach to a critically important problem. I, for one, will be closely watching future research in this space. While I don’t plan to buy a speedboat, I have high hopes for Dr. Broman’s work to help me find the best timing to launch my ‘new road bike’ project. (Journal of Astrological Big Data Ecology)
I finished the Data Science for Business course from Harvard Business School Online this week. While the technical level wasn’t very high (mostly like the image above), it was interesting to see a new approach to teaching data science principles to business people.