Keeping Up With Data — Week 41 Reading List

Source: https://towardsdatascience.com/linear-regression-explained-1b36f97b7572

What data articles do people find the most useful? When I asked this question in a LinkedIn poll, I learned that my network has a very strong preference for case studies from businesses. Only 14% of respondents share my passion for data thought leadership articles. One would expect this week’s reading list to be full of case studies, right?

Wrong! Here are the articles that I read and enjoyed this week — one hands-on guide, three thought leadership pieces, and a scientific paper (probably the closest to a case study).

  • Great Expectations: Always Know What to Expect From Your Data: Building and deploying a model is just the beginning. We simply can’t assume that everything will stay the same and the performance of our model won’t deteriorate. The distribution of the target variable can change, so can the distribution of the features, or even the relationship between the features and the target. We can’t have great expectations for the data without validating them. And that’s when Great Expectations comes to the rescue. (Khuyen Tran @ TDS)
  • The messy core of the Modern Data Stack: Unlike Petr, I’ve always been a fan of SQL, so it’s great to see it in a prominent spot among the data languages. But it’s not all roses and unicorns. SQL has a shallow learning curve and modern data warehouse performance is very forgiving of sub-optimal queries. That can easily lead to very messy code. The solution? The analytics world should embrace a software engineering mindset with its diligent design, planning, and refactoring habits. (Petr Janda)
  • How to become unemployed as a data leader: Putting data to work to create value for a company often involves innovating products (not necessarily data products). That’s why data leaders need to get out of the comfort zone of data and embrace the challenges of building solutions for humans. “You think that 97 page “report” is helping a leader make a decision because it’s chocked full of details and numbers? You think your or your team’s job is primarily to do math, statistics, or report building, mostly in isolation?” Well, think again. This won’t help you beat the 80% failure rate of data initiatives. You can forget about decent ROI. You might not even keep your job! (Designing for Analytics)
  • Top 5 challenges of data scientists: While these challenges wouldn’t be in my top 5, they are very pertinent. It is not uncommon to struggle with the first three (finding the data, getting access to it, and understanding it) for months when you are a data scientist joining a medium-sized business. And for years (at best) in the case of large corporates. I’ve experienced it many times first-hand as a data scientist. And it feels like one of Dante’s levels of hell when this has to be sorted out while driving a data strategy execution for businesses embarking on a data journey. (Louise de Leyritz @ Castor App)
  • A Time-Series Analysis of my Girlfriends Mood Swings: “It’s tough to make predictions, especially about the future,” said baseball-playing philosopher Yogi Berra. This paper has even higher ambitions: forecasting a girlfriend’s mood swings. The paper examines various statistical and machine learning methods to improve on the state-of-the-art ‘are you ok’ question. It provides a very rigorous approach to a critically important problem. I, for one, will be closely watching future research in this space. While I don’t plan to buy a speedboat, I have high hopes for Dr. Broman’s work to help me find the best timing to launch my ‘new road bike’ project. (Journal of Astrological Big Data Ecology)

I finished the Data Science for Business course by Harvard Business School Online this week. While the technical level wasn’t very high (mostly like the image above), it was interesting to see a new approach to teaching data science principles to business people.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | Interim CDO at DataDiligence.com

Adam Votava
