Keeping up with data — Week 6 reading list

5 minutes for 5 hours’ worth of reading

Source: star-history via datarevenue

I’ve read a lot about data infrastructure this week. Every time I dig a bit deeper, I’m amazed by the number of new tools. How can anyone keep up with all these developments? Not even hands-on, just knowing what’s out there?

Beyond that, an article about the importance of the CDO role and one about data literacy made it onto the list today.

  • Making the business case for a chief data officer: An increasing number of companies are appointing a CDO. But results-oriented executives often question the business value added by a CDO and, consequently, by data. Having a strong C-level champion who focuses on delivering business value through data is the way forward. Such companies treat data as a strategic asset and work decisively and with urgency to get the most out of it. Sitting on the sidelines, incurring data-related costs and using a shotgun approach to drive data initiatives is not. (MIT Sloan)
  • How to build data literacy in your company: Data literacy is a surprisingly elusive term. Everyone understands it differently, and it’s often used as an excuse for failing data initiatives. All data professionals must be able to understand business problems, solve them with data and communicate the results to the rest of the organisation in a clear and compelling manner. If we want employees to be more data literate (whatever we mean by that), we need to give them a strong reason why, and then support them on the way to better understanding, just as data professionals need the support of business experts to understand the problems they are tasked to solve. (MIT Sloan)
  • Airflow vs. Luigi vs. Argo vs. MLFlow vs. KubeFlow: Companies typically need to perform many data-related tasks: clean data, train a model, deploy the model in production, … These tasks usually need to run in sequence, which defines a pipeline. Pipelines can be simple and linear or very complicated and interlinked. Even though one could in theory use standard CI/CD tools to manage them, the dynamic and interconnected nature of data and ML pipelines has given rise to many new tools built specifically for the job. Not sure which tool to use to ensure centralised, repeatable, reproducible and efficient workflows? Then check out this comparison of the most popular ones. (datarevenue)
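To make the pipeline idea concrete, here is a minimal sketch in plain Python. It is not the API of any of the tools compared above. Tasks are treated as nodes in a directed acyclic graph, and each task runs only after the tasks it depends on. The task names and the toy "model" (just a mean) are hypothetical; the run order comes from the standard library's `graphlib.TopologicalSorter`, which requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical task functions standing in for real pipeline steps.
# Each reads from and writes to a shared context dict.
def clean_data(ctx):
    # Drop missing values from the raw input.
    ctx["clean"] = [x for x in ctx["raw"] if x is not None]

def train_model(ctx):
    # Toy "model": the mean of the cleaned data, for illustration only.
    ctx["model"] = sum(ctx["clean"]) / len(ctx["clean"])

def evaluate_model(ctx):
    # Mean absolute error of the toy model on the cleaned data.
    ctx["error"] = sum(abs(x - ctx["model"]) for x in ctx["clean"]) / len(ctx["clean"])

def deploy_model(ctx):
    ctx["deployed"] = True

TASKS = {
    "clean_data": clean_data,
    "train_model": train_model,
    "evaluate_model": evaluate_model,
    "deploy_model": deploy_model,
}

# Each task maps to the set of tasks it depends on (its predecessors).
# deploy_model depends on two tasks, so the graph is not purely linear.
DEPENDENCIES = {
    "clean_data": set(),
    "train_model": {"clean_data"},
    "evaluate_model": {"train_model"},
    "deploy_model": {"train_model", "evaluate_model"},
}

def run_pipeline(tasks, dependencies, ctx):
    """Run every task after its dependencies; raises graphlib.CycleError on cycles."""
    order = list(TopologicalSorter(dependencies).static_order())
    for name in order:
        tasks[name](ctx)
    return order

if __name__ == "__main__":
    ctx = {"raw": [1, None, 3, 5]}
    print(run_pipeline(TASKS, DEPENDENCIES, ctx))
    # → ['clean_data', 'train_model', 'evaluate_model', 'deploy_model']
```

Orchestrators such as Airflow build on this basic idea and add what the toy lacks, such as scheduling, retries and monitoring of each run.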

The data literacy topic is indeed very interesting. Data seems to be becoming mainstream as it influences our lives more and more. So being comfortable with this new ‘tool’, understanding how it can be used and what its pros and cons are, will arguably make it into school curricula soon.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | CEO & co-founder at DataDiligence.com