And how you can start
According to Harvard Business Review, data science is one of the most sought-after jobs on the job market. But is this still the case? Or is there already a more desirable one?
There is! Machine learning engineering is overtaking data science in the job market.
In this article, I want to shed light on why, in my opinion, machine learning engineering is overtaking data science, and how you can start learning it.
But let’s first start with understanding the difference between the two job roles.
This quote from a Snowflake article summarizes the differences quite well:
Machine learning engineers are further down the line than data scientists within the same project or company. A data scientist, quite simply, will analyze data and glean insights from the data. A machine learning engineer will focus on writing code and deploying machine learning products.
We can also take a look into the lifecycle of a data science project to understand the differences better:
So basically, a data scientist develops a model, trains and evaluates it. The machine learning engineer then takes that model, deploys it into production and ensures that the model is maintained. So the machine learning engineer puts the trained model into a product so that revenue can be generated from the model.
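To make this hand-off a bit more concrete, here is a minimal sketch in Python. It is only an illustration under my own assumptions: the "model" is a toy threshold classifier, the artifact is a plain JSON string, and the names `train_threshold_model`, `export_model`, `load_model` and `handle_predict_request` are hypothetical helpers, not part of any library. A real project would typically use an ML library and a web framework instead.

```python
import json

# --- Data scientist side: train a tiny model and export it as an artifact ---

def train_threshold_model(samples):
    """Learn a cut-off halfway between the mean of each class.

    samples: list of (feature_value, label) pairs with labels 0 or 1.
    """
    positives = [x for x, y in samples if y == 1]
    negatives = [x for x, y in samples if y == 0]
    mean_pos = sum(positives) / len(positives)
    mean_neg = sum(negatives) / len(negatives)
    return {"model_version": 1, "threshold": (mean_pos + mean_neg) / 2}

def export_model(model):
    """Serialize the trained model so another team can pick it up."""
    return json.dumps(model)

# --- ML engineer side: load the artifact and serve predictions ---

def load_model(artifact):
    """Parse the artifact and check that it contains what serving needs."""
    model = json.loads(artifact)
    if "threshold" not in model:
        raise ValueError("artifact is missing the 'threshold' field")
    return model

def handle_predict_request(model, request_body):
    """Return (status_code, response_body) for a JSON request body."""
    try:
        payload = json.loads(request_body)
        value = float(payload["value"])
    except (ValueError, KeyError, TypeError):
        # Malformed input must not crash the service in production.
        return 400, json.dumps({"error": 'expected JSON like {"value": 1.5}'})
    prediction = int(value >= model["threshold"])
    return 200, json.dumps(
        {"prediction": prediction, "model_version": model["model_version"]}
    )

training_data = [(1.0, 0), (2.0, 0), (8.0, 1), (9.0, 1)]
artifact = export_model(train_threshold_model(training_data))

served_model = load_model(artifact)
status, body = handle_predict_request(served_model, '{"value": 7.5}')
print(status, body)
```

Notice that the training part is only a few lines, while the serving part already has to think about input validation, error responses and model versions. That is roughly the kind of work that lands on the machine learning engineer's desk.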
But aren’t both jobs equally important? Yes, they are. But companies have already hired data scientists in large numbers, as they were mostly in the modeling and exploration phase. Machine learning engineers are now in heavy demand, as these companies need to put their models into production to create value from them.
According to a VentureBeat article, “87% of data science projects never make it into production”. I believe this is largely due to a shortage of machine learning engineers who know how to put models into production. This mismatch suggests that companies are now focusing more (or at least they should) on hiring machine learning engineers who are able to put the models into production.
We can also see the difference when checking the open job postings on Glassdoor. For California in the US, there are currently 1809 data scientist job postings in comparison to 3345 machine learning engineer job postings. So there are almost twice as many open positions for machine learning engineers!
But why can’t data scientists simply learn how to put models into production themselves? Because the data scientist is focused on ML code, which is typically only a very small portion of the complete ML infrastructure (Figure 2). And the data scientist should focus only on that small portion. It would simply be too complex to handle both the ML code and the infrastructure for deployment, monitoring, …
It is therefore important to have a data scientist and a machine learning engineer in your team to create the best value out of your data.
Okay, so now we know that machine learning engineers are currently more in demand on the labor market. But what skills are required for being a machine learning engineer? What do you need to learn to become a machine learning engineer?
In this section, I want to focus on the skills required to become a machine learning engineer and the tools I consider most worth learning. On top of that, I want to share links to online courses that I have taken on my journey to becoming a machine learning engineer.
DISCLAIMER: I only provide links to courses that I have participated in myself. The links I provide are not affiliate links, so I don’t get any money from sharing them. I just want to share them with you because they have really helped me on my learning journey!
Most Valuable Skills
So, according to an article from Udacity, these are the most valuable skills for becoming a machine learning engineer:
- Computer Science Fundamentals and Programming: data structures (stacks, queues, …), algorithms (searching, sorting, …), computability and complexity and computer architecture (memory, cache, bandwidth, …)
- Probability and Statistics: probability, Bayes rule, statistical measures (median, mean, variance, …), distributions (uniform, normal, binomial, …) and analysis methods (ANOVA, hypothesis testing, …)
- Data Modeling and Evaluation: finding useful patterns (correlations, clusters, …), predicting properties of unseen data points (classification, regression, anomaly detection, …) and continuously evaluating model performance with a suitable performance metric (accuracy, F1 score, …)
- Applying Machine Learning Algorithms and Libraries: choosing the right model for the underlying problem (decision tree, nearest neighbor, neural network, ensemble of multiple models, …), choosing a learning procedure to train the model (linear regression, gradient boosting, …), understanding the influence of hyperparameters and gaining experience with different ML libraries (TensorFlow, Scikit-learn, PyTorch, …)
- Software Engineering and System Design: understand different system components (REST APIs, databases, queries, …), build interfaces for ML component
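To illustrate why choosing the right performance metric from the list above matters, here is a small sketch in plain Python. The data is made up for this example (100 samples with only 5 positives), and the helper functions are my own minimal versions of accuracy and F1 score, not calls into any ML library.

```python
def confusion_counts(y_true, y_pred):
    """Count true positives, false positives, false negatives, true negatives."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

def accuracy(y_true, y_pred):
    """Share of all predictions that are correct."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    return (tp + tn) / len(y_true)

def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive class."""
    tp, fp, fn, _ = confusion_counts(y_true, y_pred)
    if tp == 0:
        return 0.0  # no positive was found, so precision/recall carry no credit
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Made-up imbalanced data set: 5 positives followed by 95 negatives.
y_true = [1] * 5 + [0] * 95

# "Model" A never predicts the positive class.
always_negative = [0] * 100

# "Model" B finds 4 of the 5 positives and raises 2 false alarms.
finds_positives = [1, 1, 1, 1, 0] + [1, 1] + [0] * 93

for name, y_pred in [("A", always_negative), ("B", finds_positives)]:
    print(name, round(accuracy(y_true, y_pred), 3), round(f1_score(y_true, y_pred), 3))
```

Model A reaches 95% accuracy without ever detecting a positive case, and only the F1 score reveals that it is useless for this problem. This is the kind of check you also want to keep running once a model is in production.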
Tools to Learn
Now let’s move on to the tools that I think are essential to learn:
- Python: I think this one is clear. Python is still the number one programming language in the field of machine learning, and it is also easy to learn.
- Linux: As a machine learning engineer will work a lot with infrastructure topics, being able to work on Linux is really important.
- Cloud: More and more applications are moving to the cloud. That means that you as a machine learning engineer will probably also deploy the models to a cloud environment. Therefore, I recommend learning to work with at least one of the popular cloud providers (GCP, Azure, AWS). I am currently enrolled in the AWS developer certificate course on Udemy, which I can really recommend!
- Docker, Kubernetes: In my opinion, these two tools are a must-learn for every machine learning engineer! They are so powerful for easily deploying models into production and creating complete architectures for your applications. I took the Docker and Kubernetes complete guide on Udemy and learned a lot throughout this course!
Other Useful Online Courses
So now that you know what skills are required and what tools to learn, I also want to show you some other helpful online courses that I think can help you on your journey to becoming a machine learning engineer (at least they helped me):
- Deep Learning Specialization by Andrew Ng: This course focuses on Deep Learning and how to train models for image classification and many other tasks. Andrew is great at explaining the theory. But you also directly apply the theory in hands-on lessons, which is great for building the skills needed to apply machine learning algorithms and libraries.
- Machine Learning Nanodegree by Udacity: This so-called Nanodegree from Udacity focuses on training ML models and putting them into production, mainly using AWS SageMaker. You can also check out my Medium article where I write about the Capstone project that I did to pass this course. NOTE: Udacity replaced my course with a newer version. But I think this new version is still well worth taking.
- IBM Machine Learning Professional Certificate: This course on Coursera covers every aspect of machine learning, with a lot of hands-on work. You will learn about supervised and unsupervised machine learning, deep learning, reinforcement learning and much more. At the end of each course, you have to build your own Capstone project and write a report describing your application.
You have now learned that becoming a machine learning engineer is more desirable than becoming a data scientist. You also now know the skills and tools you need to learn to become a machine learning engineer.
Therefore: Go and get your hands dirty! Learn these tools, take some online courses, and land your first machine learning engineering job.
Just one more thing that I want to say: Always get your hands dirty! Build as many hands-on ML projects as you can. And don’t forget to take your trained models and put them into production, since you want to become a machine learning engineer.
You can also read my articles about a Deep Learning project, where I trained an ML model and put that into production.
In this article, I explain the underlying problem and how I trained the ML model. I then package the trained model into a Docker container and create an easy webpage using Flask.
In this article, I then deploy the Flask application to AWS so that everyone can access my application.
Thank you for reading my article to the end! I hope you enjoyed this article. If you want to read more articles like this in the future, follow me to stay updated.
 Thomas H. Davenport and DJ Patil, Data Scientist: The Sexiest Job of the 21st Century (2012), Harvard Business Review
 Snowflake, MACHINE LEARNING ENGINEER VS. DATA SCIENTIST
 Sundeep Teki, ML Engineer vs. Data Scientist (2022), Neptune AI Blog
 VB Staff, Why do 87% of data science projects never make it into production? (2019), VentureBeat
 Rashid Kazmi, Machine Learning in Production (MLOps) (2022), Towards Data Science
 Arpan Chakraborty, 5 Skills You Need to Become a Machine Learning Engineer (2016), Udacity
 Sakshi Gupta, What Is the Best Language for Machine Learning? (2021), Springboard
Data scientists will still be needed by many companies to solve new or more complex problems. But once the hype is over, there will be fewer “data scientists” doing the work of data analysts or reinventing the wheel for problems that can be solved easily with pre-made solutions.
Data scientists write higher-level code with Python or R, and they often use BI tools for data analysis and visualization. While ML engineers' job revolves around machine learning, machine learning is just one tool in a data scientist's toolbelt—they might go months relying instead on data analytics and statistics.
Because data science is a broad term for multiple disciplines, machine learning fits within data science. Machine learning uses various techniques, such as regression and clustering. On the other hand, the “data” in data science may or may not come from a machine or a mechanical process.
Machine learning engineers act as critical members of the data science team. Their tasks involve researching, building, and designing the artificial intelligence responsible for machine learning and maintaining and improving existing artificial intelligence systems.