Keeping up with data — Week 49 reading list

5 minutes for 5 hours’ worth of reading

Source: https://anvaka.github.io/vs/?query=PyTorch

The winter has arrived in the mountains around Zurich. People don’t talk about the weather here as much as they did in Japan, where every conversation at this time of year started with 寒いですね (“it’s cold, isn’t it?”). But cold it is indeed.

Since Monday, the internet has been full of news about DeepMind’s AI making a gigantic leap in solving protein structures. Their AI, AlphaFold, can predict the shape of proteins to within the width of an atom, a breakthrough that could accelerate drug discovery.

The majority of this week’s list is about analytics, joined by a ‘business’ piece and a really cool trick. So, let’s get into it!

  • Modeling libraries don’t matter: Which modelling libraries should you use in your ML stack? It’s a question often asked when building a first ML pipeline. The article argues that it doesn’t matter. First, because the actual modelling is a small part of the work compared to the rest, such as data pipelines. And second, because porting modelling code from one framework to another is not a big deal. So, don’t overthink the choice between PyTorch and TensorFlow. (Shreya Shankar)
  • You might not need machine learning: One of those seemingly innocent articles that sparked a long discussion on Hacker News about ML, AI and neural networks. My take is that in business settings we should aim for the simplest solution that gets the job done. And when you want to learn new techniques or just play, don’t let the dogmatism and fanaticism of some data experts take the fun away; do whatever you want. Peace. ✌️ (null program)
  • 2020’s Top AI & Machine Learning Research Papers: Executive summaries of selected papers, each covering the core ideas, the AI community’s opinions and links to code implementations. The range goes from an award-winning paper on earthquake prediction, through the $1.4M chatbot Meena, to the AdaBelief optimiser, which combines adaptive optimisation methods with accelerated SGD optimisers. (TOPBOTS)
  • Why Chief Data Officers Must Assume Leadership for Data Success: As many companies struggle to manage data as a business asset, CDOs need to step up and become senior leaders who guide their companies on the data journey. The article offers four pieces of advice for CDOs: (i) develop and execute a data strategy that mirrors the business strategy; (ii) communicate the context, complexity and value of data; (iii) make the data trustworthy; and (iv) create a compelling career path for data leadership. (MIT Sloan Management Review)
  • The Google ‘vs’ Trick: Adding a ‘vs’ suffix to a query is a great way to find alternatives, because Google’s autocomplete suggestions are then targeted at comparisons. This is especially useful for algorithms (“bert vs”), file formats (“parquet vs”) or databases (“mysql vs”). Repeating the query for each suggestion builds a nice map of related options. (David Foster @ Medium); see also source/target and definitely check out this interactive version by Andrei Kashcha!
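To make the “do it iteratively” part of the ‘vs’ trick concrete, here is a minimal sketch in Python. It assumes a `suggest` function that returns the completions offered for “<term> vs”; the names `FAKE_SUGGESTIONS`, `fake_suggest` and `build_vs_graph` are hypothetical, and the suggestion data is made up. A real version would need an actual autocomplete source, which is not shown here.

```python
from collections import deque
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical stand-in for a search engine's autocomplete: maps a term to the
# completions offered for the query "<term> vs". The values are made up.
FAKE_SUGGESTIONS: Dict[str, List[str]] = {
    "pytorch": ["tensorflow", "keras", "jax"],
    "tensorflow": ["pytorch", "keras"],
    "keras": ["tensorflow", "pytorch"],
    "jax": ["pytorch", "numpy"],
    "numpy": ["pandas"],
}


def fake_suggest(term: str) -> List[str]:
    """Return the (made-up) completions for '<term> vs'."""
    return FAKE_SUGGESTIONS.get(term, [])


def build_vs_graph(
    seed: str,
    suggest: Callable[[str], List[str]],
    max_depth: int = 2,
) -> Set[Tuple[str, str]]:
    """Breadth-first expansion: query '<term> vs', link the term to every
    suggestion, then repeat for newly seen terms up to max_depth hops."""
    edges: Set[Tuple[str, str]] = set()
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        term, depth = queue.popleft()
        if depth >= max_depth:
            continue  # terms at the depth limit are not expanded further
        for alt in suggest(term):
            # Store each undirected edge in sorted order so duplicates collapse.
            a, b = sorted((term, alt))
            edges.add((a, b))
            if alt not in seen:
                seen.add(alt)
                queue.append((alt, depth + 1))
    return edges


if __name__ == "__main__":
    for a, b in sorted(build_vs_graph("pytorch", fake_suggest)):
        print(f"{a} -- {b}")
```

The breadth-first order means the closest alternatives are collected first, and `max_depth` keeps the number of queries bounded, which matters when every expansion is a request to an external service.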

The holiday season is here, and it will bring a pile of 2020 reviews, ‘best of’ lists and 2021 predictions to read. I can’t wait to critique them for you!

And, if you are too old for a chocolate advent calendar, try Advent of Code.

Thanks for reading!

Please feel free to share your thoughts or reading tips in the comments.

Follow me on Medium, LinkedIn and Twitter.

Data scientist with corporate, consulting and start-up experience | avid cyclist | amateur pianist | CEO & co-founder at DataDiligence.com