Keeping up with data — Week 7 reading list

5 minutes for 5 hours’ worth of reading

Adam Votava
3 min read · Feb 19, 2021
Source: https://pycaret.org

The opening article in this week’s reading list is a great reminder that even data infrastructure is about people. You can have the greatest infrastructure and tools, but if users aren’t convinced it solves their problems, and if it isn’t easy to use, it will always be bypassed and result in a patchwork of ad-hoc solutions.

This week’s list is — hopefully — ‘aggressively helpful’, fair and also a bit nostalgic.

  • Aggressively Helpful Platform Teams: Data scientists build algorithmic solutions to business problems. But for those solutions to make an impact, they need to “wrestle complex, scalable infrastructure into submission”. Platform engineering teams are there to help with that: they build tools that let data scientists focus on model development rather than worrying about the data infrastructure and the technical side of model deployment. As with any product, platform engineers should stay very close to their customers (the data scientists), understanding their needs, supporting onboarding, fixing problems quickly and proactively reaching out with improvements. At Stitch Fix they call it being ‘aggressively helpful’, and it sounds like a great approach. (Stitch Fix)
  • Using the LinkedIn Fairness Toolkit in large-scale AI systems: Fairness matters if AI systems are not to create self-fulfilling prophecies, such as the ‘People You May Know’ algorithm recommending already-active LinkedIn users, only for them to become even more dominant on the network. One also needs to watch for discriminatory features playing a role in the engine. The LinkedIn Fairness Toolkit (LiFT) provides two key capabilities: measuring fairness in training data and fairness in model performance. That way you can test your data and models for equality of opportunity, equalized odds and predictive rate parity (a toy calculation of these metrics follows after this list). (LinkedIn Engineering)
  • PyCaret 2.2: Efficient Pipelines for Model Development: This one is a bit nostalgic for me. Many years ago I read Max Kuhn’s book (still one of my all-time favourites on ML) and fell in love with his R package caret. It was amazing how you could use one framework to perform all the steps of the model development process, without having to stitch together the individual backend packages and functions yourself. And today I’ve learned about a similar package for Python. PyCaret follows the same principles as caret: it provides a low-code machine learning toolkit that wraps many popular libraries in the backend, letting you move quickly from data preparation to model deployment (see the sketch below). (Domino DataLab)
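
To make the three fairness notions above concrete, here is a toy calculation in plain Python/NumPy. Note that this is not the LiFT API itself (LiFT is a Scala/Spark library); the function name, the made-up labels and the binary group attribute are all hypothetical, purely for illustration.

```python
# Toy illustration of the fairness metrics mentioned above, in plain NumPy.
# NOT the LiFT API; names and data are hypothetical, for illustration only.
import numpy as np

def group_rates(y_true, y_pred, mask):
    """Return (TPR, FPR, PPV) for the rows selected by a boolean group mask."""
    y_true, y_pred = y_true[mask], y_pred[mask]
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    tpr = tp / (tp + fn) if tp + fn else 0.0   # true positive rate
    fpr = fp / (fp + tn) if fp + tn else 0.0   # false positive rate
    ppv = tp / (tp + fp) if tp + fp else 0.0   # precision
    return tpr, fpr, ppv

# Hypothetical labels, predictions and a binary protected attribute
y_true = np.array([1, 0, 1, 1, 0, 1, 0, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 1, 1, 0, 0, 0])
group  = np.array([0, 0, 0, 0, 0, 1, 1, 1, 1, 1])  # 0 = group A, 1 = group B

tpr_a, fpr_a, ppv_a = group_rates(y_true, y_pred, group == 0)
tpr_b, fpr_b, ppv_b = group_rates(y_true, y_pred, group == 1)

# Equality of opportunity: equal TPR across groups.
# Equalized odds: equal TPR *and* FPR across groups.
# Predictive rate parity: equal precision (PPV) across groups.
print(f"Equality of opportunity gap (TPR): {abs(tpr_a - tpr_b):.2f}")
print(f"Equalized odds gaps (TPR, FPR):    {abs(tpr_a - tpr_b):.2f}, {abs(fpr_a - fpr_b):.2f}")
print(f"Predictive rate parity gap (PPV):  {abs(ppv_a - ppv_b):.2f}")
```

The closer these gaps are to zero, the fairer the model is under the corresponding notion; which notion to prioritise depends on the application.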
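
And to give a flavour of the low-code workflow PyCaret promises, here’s a minimal sketch of the PyCaret 2.x classification API. The CSV file and the ‘Purchase’ target column are assumptions of mine; substitute your own dataset.

```python
# A minimal sketch of the PyCaret 2.x low-code workflow, assuming a CSV file
# with a binary target column named 'Purchase' (both are hypothetical).
import pandas as pd
from pycaret.classification import setup, compare_models, finalize_model, save_model

df = pd.read_csv('juice.csv')  # hypothetical dataset

# One call handles the train/test split, imputation, encoding, scaling, etc.
clf = setup(data=df, target='Purchase', session_id=42, silent=True)

# Train and cross-validate many models from the backend libraries, ranked by AUC
best = compare_models(sort='AUC')

# Refit the winner on the full dataset and persist the entire pipeline
final = finalize_model(best)
save_model(final, 'best_pipeline')
```

Four calls take you from a raw DataFrame to a saved, deployable pipeline, which is exactly the caret-like experience the article describes.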

After a week of temperatures around -5°C here in Zurich, it’s getting a bit warmer. So hopefully I’ll manage to go both skiing and cycling this weekend! I did last weekend too, but mountain biking on ice in freezing temperatures isn’t much fun, so it counts as craziness rather than cycling!

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Adam Votava

Data scientist | avid cyclist | amateur pianist (I'm sharing my personal opinion and experience, which should not be considered professional advice)