Making Data Useful

Which flavor of data professional are you?

A field guide to the expanding data science universe

Cassie Kozyrkov
Towards Data Science
8 min read · Sep 6, 2019

The data universe is expanding rapidly. It’s time we started recognizing just how big this field is and that working in one part of it doesn’t automatically require us to be experts in all of it. Instead of expecting data people to be able to do everything, let’s start asking one another, “Which kind are you?” Most importantly, it’s time we asked ourselves that same question.

Working in one part of the data universe doesn’t automatically require us to be experts in all of it.

Disclaimer: I made these caricatures to help you get a mental map started, but we all know that real life doesn’t always color neatly within the lines. For example, one person might wear multiple hats or project phases might be blurred together by unplanned iteration. Please don’t throw rotten tomatoes at me for lack of nuance.

Note: my article on recommended hiring order for various roles is here.

Which business are you in?

Real-world applied ML/AI brings together every job role related to data, from statisticians to reliability engineers. Even if you’ve studied *all* the data things (yeah right), there are not enough hours in the day for one person to do everything alone, so let’s go on a quick safari of common roles in the data science ecosystem, explained with a kitchen analogy.

There are not enough hours in the day for one person to do everything alone.

If you translate your work into terms your friends from the food industry would feel at home with, which one of these is the best fit? (My original kitchen analogy article is here.)

Data Engineering: Source and process ingredients. This comes in two flavors: (1) supplying datasets for data scientists to use and (2) enabling data delivery at scale.

Grocery shopping is easy if you’re just cooking something for your own dinner, but when you increase the scale, the trivial becomes intricate: how do you acquire, store, and process 20 tons of ice cream… without letting any of it melt? Scale makes it a sophisticated challenge. Similarly, data engineering is fairly easy when you’re downloading a little spreadsheet for your school project but dizzying when you’re handling data at petabyte scale.

Data Science Research: Invent new kinds of kitchen appliances. You can think of this as the theoretical research side of data science, and it’s what a PhD in ML/AI/Statistics/Optimization prepares you for. It’s about creating theory for expanding the kinds of problems that humankind can solve.

Researchers invent new things. Their career thrill is to prove that the previously-impossible is possible. They tend to enjoy themselves all the way up to the working prototype, and then they’re off chasing the next challenge. In terms of the kitchen analogy, they’re all about appliance blueprints. Perhaps they’ll turn those blueprints into a device that does the job, but expect it to be held together by hope and sticky tape… slam the door and it falls apart. As for the interface — sure, that sequence of buttons made sense in the researcher’s head, but if you try to use it, you’ll be pulling your hair out in frustration. User-friendly? Robust? Well-designed? Forget it! That’s someone else’s job. Your researchers are too busy figuring out how to get teleportation capabilities into the microwave. (When do you hire a researcher for your industry team? When you know you need teleportation capabilities and no one has invented teleportation yet.)

Data Toolmaking / Platform Engineering: Build user-friendly appliances and integrate them into delightful kitchens. This is about providing beautifully crafted tools and platforms for data scientists to use.

Researchers don’t usually build you the microwave that you’ll actually enjoy using. That’s where platform engineers come in. These folks don’t invent new kinds of wiring diagrams or blueprints; they make the existing ones available for mass consumption. The team that does this part of the work huddles together to nerd out on design thinking, reliability, and efficiency. They do user studies to ensure that they build tools you actually fall in love with. Unfortunately, you don’t see a lot of them entering a space until some poor suckers have suffered through demonstrating that the unfriendly version has a market. In the infancy of the microwave, there weren’t any idiot-proof versions mass-produced and sold in household goods stores. AI has been in that kind of infancy for a long time, but now that it’s proving useful, the toolmakers are stepping up! This is an exciting time because a lot of what you think is hard about AI isn’t intellectual, it’s tool-quality-related. As the tools get more user-friendly, more people will begin to enter AI and more creativity will flourish.

Decision Intelligence: Innovate in recipes and serve dishes. You can think of this as the applied side of data science. It’s about solving specific business problems with data and algorithms.

Members of decision intelligence teams work on innovating in recipes. They’ll bring an algorithms researcher on board if they’re out of luck with the existing algorithms they’ve tried, and they’ll bring a platform engineer onto the team if they need friendly tools, but they’re also happy to outsource those functions. They’re more interested in cooking, and not just any kind of cooking. They’re a different kind of researcher, the kind who solves an impossible business problem by inventing an awesome tailor-made recipe. There are warehouses upon warehouses of kitchen appliances and ingredients, and their goal is to figure out how to get you that Michelin-starred calorie-free pizza.

If you’re in the applied ML/AI space, let’s figure out which project phase you work in, then zoom in to find your role.

Project phases for applied ML/AI

In real life, there’s often iteration and backtracking involved, but here are the phases in broad strokes.

Ramp-up: Before we go play in a kitchen, let’s figure out what our goals are, assemble the team, and set our kitchen up. If we’re going to want access to microwaves, let’s go shopping for them. If we’re going to want teleporters, let’s try to invent them.

Prototype: Suppose our goal is to make a vegan calorie-free sausage that tastes like the real thing. How long will it take? Who knows! Your kitchen awaits — good luck!

Production: We have a recipe that meets requirements. Let’s serve it all over the world for a decade…

Roles in the ramp-up phase

Decision-maker / product manager: What do we want to serve our customers and how good does it need to be? You call the shots and come up with the plan. Read more here.

Data engineer/architect: Get the trucks ready, set up warehouses, and figure out logistics for managing ingredients at scale. Read more here.

Researcher: We want a teleporter in our kitchen. Invent one. Read more here.

Toolmaker / platform engineer: The researcher’s prototype teleporter is held together with sticky tape and hope. Build one that doesn’t fall apart when we slam the door. This category houses all the roles on a traditional software team, from designers to software engineers. Read more here.

The last three roles can be outsourced — to cloud providers, for example — if you don’t want to deal with things like kitchen setup or trucking.

Be sure you know exactly what you’re selling: Data? Algorithms? Tools that let others do AI? Solutions that happen to use AI? My advice is to focus on your core business and let someone else take care of the rest for you wherever possible.

Roles in the prototype phase

Analyst: The grocery store and kitchen are dark. We don’t know where to start. You’re the only one with a flashlight. Your job is to accelerate the project by helping your team discover and explore possibilities. Read more here.

Data engineer: Figure out how to fetch us 20,000 tons of frozen carrots; don’t let them defrost. Read more here.

ML/AI engineer: Go tinker in the kitchen until you’ve made calorie-free vegan sausage that tastes like the real thing. Read more here.

Statistician: Should we add this dish to the menu? It’s up to you to make sure the proposed recipe is good enough to meet requirements. Read more here.

Roles in the production phase

ML/AI engineer: We have a fabulous prototype recipe. Adjust it and do the engineering that lets us serve it at scale. Lend a hand with maintenance when needed. Read more here.

Reliability engineer: Arrange production and build safety nets so we can serve our recipe reliably, even if there’s a shortage of raspberries. Read more here.

Analyst: Monitor production and sound the alarm if something seems to be going wrong. Read more here.

Statistician: Run live experiments to verify that users are happy and the recipe continues to meet requirements, especially if we’re thinking of approving an adjustment to the secret sauce. Read more here.

Roles for all project phases

Decision-maker / product manager: Guide us, dear leader! You call the shots. Read more here.

Qualitative expert / decision scientist / data translator: If the decision-maker knows nothing about food but is in charge anyway, you’ll be needed for translating between them and everyone else. Read more here.

Advisor / specialist: You’re someone the decision-maker goes to for advice on a specific topic such as ethics, UX, etc. Read more here.

Technical staff member: You’re someone who helps get things done, though your role is not explicitly on this list. That doesn’t mean we can do without you! All the standard roles in traditional software projects — from developers to project managers to people managers — have their place in the ML/AI kitchen.

To learn more about the ecosystem of related concepts, see my article Introduction to Decision Intelligence.

Chief Decision Scientist, Google. ❤️ Stats, ML/AI, data, puns, art, theatre, decision science. All views are my own. twitter.com/quaesita