Keeping up with data — Week 48 reading list
5 minutes for 5 hours’ worth of reading
Having recently started a new business, I wake up each Monday genuinely excited to see what the week may bring. Alas, this week began with my article being rejected by The Startup publication (but with useful feedback!). Yet, sometimes starting off on the wrong foot is a precursor to changing tides and leads to a successful week on all fronts.
I ran a LinkedIn poll, using response options from How To Use Data Strategically In Business: 3 Essential Ways, to explore how people in my network have created business value through data. Thanks to all who voted! My next step is to write up the results, together with some examples behind the votes and my own experience and opinions.
Mix of topics this week. Something to scroll through during Saturday morning coffee.
- Data Quality at Airbnb: Data quality is vitally important for all businesses. Airbnb is no exception. The article describes Airbnb’s Data Quality Initiative holistically, from motivation through architecture to organisation and governance best practices. It stands out to me because it demonstrates that a successful project should address business needs by covering not only data infrastructure and data analytics but also the people element. Making sure the organisation has the roles needed, supports the users and has clear and intuitive governance rules is key. (Jonathan Parks et al. @ Medium)
- AI’s next big leap: Recognising properties of objects, reasoning about them and asking questions about them is natural for people. But for AI it is a hard problem to crack. Deep nets struggle to figure out even simple relations between objects unless they study hundreds of thousands of examples. Symbolic AI can achieve this by creating a knowledge base of symbols (like shapes or colours) but it doesn’t generalise when the knowledge is missing. A hybrid of these two — called neurosymbolic AI — is showing some promising results. (Knowable Magazine)
- Neural Architecture Search: There seems to be a lot of art in designing the structure of neural networks. The most popular and successful model architectures have been designed by human experts. But can it be done automatically without falling into a complexity trap? How should we define the search space, which search algorithm should we use and how should we evaluate the result? The article offers a great overview of available methods for answering these questions. Generally, asking yourself “How could I automate this?” typically leads to a better understanding of the problem. (Lil’Log)
- What is a Feature Store?: Feature stores are becoming an increasingly popular component of the operational ML stack. Features are the main ingredients of ML models. The ambition of feature stores is to make the process of creating, storing, managing and serving features flawless and efficient. The article describes feature stores as well as their main components. Ten years ago, I worked in a bank where we had standardised features available. Though it wasn’t as advanced as today’s feature stores, it was a great productivity booster. (Tecton)
- How to Build a Production Grade Workflow with SQL Modelling: Large(r) organisations often need to run many data pipelines. Some of them require advanced analytics but some can be handled by SQL. For these straightforward pipelines, Shopify created an SQL modelling workflow leveraging dbt to provide their data scientists with a robust, production-ready framework. Their solution covers testing and governance and enables data scientists to work quickly and conveniently. Great example of using the right tool for the right job. (Shopify Engineering)
- Disposable Technology: A Concept Whose Time Has Come: Modern digital companies (like FAANG) have taken a technology architecture approach that increasingly treats the technology infrastructure as “disposable”, using open-source technologies. It provides them with two advantages: (1) flexibility to move to the next best technology without architectural or vendor lock-in; and (2) the ability to design and evolve the technology around business needs. A noteworthy reminder that technology should be enabling business, not hindering it. (Data Science Central)
A friend recently pointed me to the term hardware lottery, which describes the situation “when a research idea wins because it is suited to the available software and hardware and not because the idea is superior to alternative research directions”. There is also an article claiming that this is not always bad and that we should Ride the Hardware Lottery! After all, how can we say with today’s knowledge which research idea will lead to solutions to tomorrow’s problems?
Sometimes increasing complexity can lead to new interesting problems (see the image above). Sometimes curiosity about the skills of ducklings pushes the state of the art of AI (see tip #2). Sometimes reverting to simple tools solves a business problem (see tip #5). But we should never forget that data, analytics and technology are subordinate to our personal and economic needs (see tip #6).
In case you missed last week’s issue of Keeping up with data:
Keeping up with data — Week 47 reading list
5 minutes for 5 hours’ worth of reading