Keeping up with data — Week 46 reading list

A curious mind wandering through the world of data

  1. Boost Your Team’s Data Literacy: Companies lack data-driven problem-solving skills such as: asking the right questions; testing hypotheses using A/B tests; understanding which data is relevant; interpreting data well enough to draw useful and meaningful conclusions; and telling a story that helps decision-makers see the big picture and act on the results of an analysis. These ‘soft skills’ make a difference. The article suggests (1) ensuring people know how to use the tools; (2) setting up a capability academy for data skills; (3) using examples and stories in awareness campaigns; and (4) baking data into all important decision-making. (HBR)
  2. Rethinking the build vs buy approach to talent: Hiring new employees to keep up with the rapid pace of technological, digital and data development is very expensive, if not impossible. More organisations are taking a hybrid approach, combining hiring with training. But L&D programmes aimed at developing new technical skills or data literacy need to look different from standard L&D solutions. They should be run by current practitioners and focus on projects and assignments tailored to the company’s data, tools and tech stack: ‘on the job’ data training. Together with senior leadership leading by example, these are important for making a company truly data-driven. (Josh Bersin)
  3. Models for integrating data science teams within organizations: Deploying data scientists in organisations is not easy. There are plenty of models, each with benefits and drawbacks, for example: a centre of excellence; data scientists as consultants; data scientists hired directly by product teams; and the product data science model, with a data scientist in each product team who reports into a central data science team. Each organisation is different, but in my experience the product data science model works well. As the number of products and the headcount grow, the CDO needs to figure out the way of scaling it that is right for the organisation. (Pardis Noorzad @ Medium)
  4. Safely Rolling Out ML Models To Production: Best practices for CI/CD of ML systems. In the CI phase, one needs not only to validate the data and the model but also to test production data assumptions and stress-test the model’s operational performance. For the CD phase, the article discusses shadow evaluation, A/B tests and multi-armed bandits. Cool, cool, cool. But this was the candy in the article: “While, the CI/CD paradigms address the “what” and the “how” of new models roll-out, the “when” is covered by the CT (Continuous Training) paradigm.” (Oren Razon @ towards data science)
  5. Bringing Personalized Search to Etsy: Etsy uses historical and contextual features to personalise users’ search results. Historical features describe users’ shopping habits and behaviours. Contextual features use textual descriptions (title, tags) and capture which items the user has interacted with in the context of all items (using e.g. Tf-Idf). When a user enters a search query, the algorithm selects the 1,000 most relevant items (ignoring the personalisation features) and then ranks them using the personalised historical and contextual features. It is a nice example of an ‘80–20’ approach: you use a rough algorithm to quickly narrow down the list of possible solutions and then adopt a finer, more sophisticated approach to accurately select the best solution from the pre-selected list. It is also a reminder that each further improvement step is more demanding than the last. (Etsy)
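The two-stage idea from item 5 (a cheap, non-personalised retrieval of candidates, then a more expensive personalised ranking of only those candidates) can be sketched in plain Python. This is a minimal illustration under my own assumptions, not Etsy’s implementation: the toy catalogue, the `category_affinity` dictionary standing in for a “historical” feature, the TF-IDF profile of previously viewed items standing in for a “contextual” feature, the linear weights and all function names are hypothetical. Etsy retrieves 1,000 candidates; the toy example uses k=3.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def build_idf(docs):
    """Smoothed inverse document frequency for every term in the catalogue."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf_vector(text, idf):
    # Terms unseen in the catalogue get weight 0.
    counts = Counter(tokenize(text))
    return {t: c * idf.get(t, 0.0) for t, c in counts.items()}


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, items, idf, k):
    """Stage 1: rough, non-personalised retrieval of the top-k items by query similarity."""
    q = tfidf_vector(query, idf)
    scored = [(cosine(q, tfidf_vector(item["text"], idf)), item) for item in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rerank(candidates, user, idf, w_query=1.0, w_hist=0.5, w_ctx=0.5):
    """Stage 2: score only the candidates with (hypothetical) personalised features."""
    profile = tfidf_vector(" ".join(user["interacted_texts"]), idf)
    ranked = []
    for query_score, item in candidates:
        hist = user["category_affinity"].get(item["category"], 0.0)  # "historical" stand-in
        ctx = cosine(profile, tfidf_vector(item["text"], idf))  # "contextual" stand-in
        ranked.append((w_query * query_score + w_hist * hist + w_ctx * ctx, item["id"]))
    ranked.sort(reverse=True)
    return [item_id for _, item_id in ranked]


# Toy usage with made-up data.
items = [
    {"id": "a", "text": "handmade silver ring", "category": "jewelry"},
    {"id": "b", "text": "silver necklace vintage", "category": "jewelry"},
    {"id": "c", "text": "ceramic mug handmade", "category": "home"},
    {"id": "d", "text": "silver ring minimalist", "category": "jewelry"},
]
user = {"category_affinity": {"jewelry": 1.0}, "interacted_texts": ["minimalist gold ring"]}
idf = build_idf([item["text"] for item in items])
candidates = retrieve("silver ring", items, idf, k=3)
ranking = rerank(candidates, user, idf)
print(ranking)  # ['d', 'a', 'b']
```

Stage 1 orders the candidates a, d, b; the personalised stage then lifts “d” to the top because it resembles the minimalist ring the user looked at before. The point of the split is cost: the personalised features are computed only for the k retrieved items, not for the whole catalogue.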



Adam Votava

Data scientist | avid cyclist | amateur pianist (I'm sharing my personal opinion and experience, which should not be considered professional advice)