You are probably here because you are thinking about entering the exciting field of machine learning. But on your road to mastery, you see a big roadblock that scares you. It is called math. Perhaps your last math class was in high school and you are from a non-technical background. Perhaps you have worked as a software developer before but never really needed that much math. Perhaps you have a technical degree but it has been a few years and your math is rusty.
Now you have lingering doubts whether your current math knowledge is enough to succeed in machine learning, how much of the dreaded subject you need to study, and how long it will take.
The mathematical foundations of machine learning consist of linear algebra, calculus, and statistics. Linear algebra is the most fundamental topic because data in machine learning is represented using matrices and vectors. Statistics are necessary to interpret results produced by learning algorithms and to understand data distributions. Calculus helps you understand how the learning process operates under the hood.
A committed self-starter with a high-school math background can master the mathematical foundations required for applied machine learning in linear algebra, statistics, and calculus within a period of 3 to 6 months if he or she dedicates 1-3 hours per day to studying.
Some people will certainly criticize these claims. Unfortunately, most resources on the internet are extremely vague. Some throw super-advanced concepts at you making you believe you need a math Ph.D. Others tell you to forget about the math completely.
People who are new to machine learning or are aiming to get started in the field need more helpful advice and some more definitive answers. That is what I want to do in this article. I want to clear up that confusion and give you some more precise answers based on several years of first-hand experience as a machine learning/deep learning engineer.
Buckle up, because in the following paragraphs we will address all of these questions and points in depth.
What Math do I Need for Machine Learning?
How much math you need depends primarily on the role that you aspire to. In recent years, a distinction has emerged between machine learning scientists and researchers on the one hand and machine learning engineers on the other hand. If you aim for a research position, you are looking at a heavier math curriculum than a machine learning engineer who wants to apply established algorithms to the business problems of his chosen industry.
Furthermore, some fields such as deep learning for computer vision or natural language processing require stronger math foundations than traditional machine learning engineering. I’ll focus on the data
Machine Learning Engineer
As a machine learning engineer, your job is to automate the process of making sense of your company’s data. Read that again! The focus here is on automating the process. The algorithms and models that help you make sense of the data and drive inference already exist. You don’t need to reinvent the wheel and code a neural network, or support vector machine from scratch. Instead, you use a package or a framework such as SciKitLearn or Tensorflow. Perhaps, you have a research scientist in your firm who has already trained and adjusted the models on your data.
In terms of math, here is the bare minimum:
- Basic Linear Algebra: You have to know what vectors and matrices are and how to perform basic operations on them such as addition, subtraction, and multiplication using dot products.
- Statistics and Probability: An introductory statistics course should suffice. You should understand the concept of a random variable, statistical independence, and conditional probability. Furthermore, you need to be able to calculate and interpret the mean, median variance, and standard deviation of a dataset. In terms of probability distributions, you need to know the normal or Gaussian distribution, and the Binomial distribution. Understand p-values and confidence intervals.
- Calculus: When building models using established frameworks, you don’t need calculus. Machine learning algorithms such as gradient descent use calculus to find optimum values. Frameworks such as TensorFlow take care of these details for you. I know this is a bold statement that many people will disagree with. Knowing partial derivatives is definitely useful to gain a better understanding of how many machine learning models work. But it is not a requirement to use frameworks. So if you want just the bare minimum to be able to get started with machine learning, calculus is not a prerequisite.
These skills in linear algebra and statistics constitute the minimum you should have to build a solid foundation in machine learning. For regression problems, it is a good idea to develop an understanding of residuals and the mean squared error. For classification problems, an understanding of non-linear activation functions such as the logistic sigmoid is helpful. If you want to read research papers, you also need to become familiar with mathematical notation.
Recap: As a machine learning engineer, your job is to take an established model, make sure that it performs at scale on your data, that your data is clean and ready for the model, and that the whole process from data ingestion to the return of results is automated. The focus here is on engineering, not math and science. You need to know how to clean your data and how to build scalable systems.
The title data scientist is so broad and ill-defined that it is almost impossible to define requirements. In my opinion, the data scientist is someone who preprocesses and runs advanced statistical analysis on the data; and prototypes machine learning solutions. Accordingly, he has advanced statistics and math skills but lacks the software engineering skills of machine learning engineers.
The math requirement of a data scientist focuses heavily on statistics and probability.
- Linear Algebra: Matrices, vectors, and basic vector/matrix operations such as addition, subtraction, and dot products are the foundation of data science. I also suggest developing an understanding of eigenvectors, eigenvalues, singular value decomposition, and principal components analysis. These are techniques for dimensionality reduction and identifying the most relevant dimensions in a dataset. As a data scientist, you need to manipulate and transform data in a multidimensional vector space, so a knowledge of these techniques is helpful.
- Statistics and Probability: Statistics is where your focus as a data scientist will lie. In addition to the basic statistics skills that a machine learning engineer needs, you should be well acquainted with probability mass and probability density functions. Data scientists need to be deeply familiar with hypothesis testing, p-values and know when to apply various tests such as the z test, t-test, and chi-square test. Study probability distributions such as the Gaussian, Binomial, Poisson, student’s t, Chi-Square, and exponential and understand what scenarios can be described using them. Maximum likelihood estimation, analysis of variance, analysis of covariance, and linear regression techniques should also be part of your toolkit.
- Calculus: For understanding probability density functions, you should have a basic knowledge of estimating areas under the curve using integral calculus. A good data scientist also understands a little bit about mathematical optimization which will require the ability to find maxima and minima using differential calculus.
Recap: The job of the data scientist is to make sense of his company’s data using a toolkit of advanced statistical techniques and machine learning. In companies that have data scientists and machine learning engineers, the data scientist will often make sense of the data and prototype machine learning solutions. The machine learning engineer then takes the prototypes, readies them for production, and integrates them into a larger software solution. In many companies, this distinction doesn’t exist though and data scientists are expected to make their solutions production-ready.
Unfortunately, the term data scientist has become a corporate buzzword with the effect that many people across industries have rushed to adopt the title. Many of the self-proclaimed data scientists lack the qualifications described above and are unable to create advanced machine learning solutions. Those people are usually not data scientists but data analysts. This is a distinction to be aware of.
Deep Learning Engineer
A deep learning engineer is a machine learning engineer specialized in deep learning systems. Many of them will work on computer vision or natural language processing problems since these are the areas where deep learning excels. In my experience, deep learning engineers require a bit more math in their day-to-day work than classical machine learning engineers. I believe this is due to the fact that computer vision and natural language processing frameworks do not yet have the same kind of abstraction that traditional machine learning algorithms enjoy. A machine learning engineer who wants to perform regression or classification can in most cases rely on something like Google’s BigQuery ML. This takes away many of the low-level details. Even if he is building machine learning models directly in a language like Python, the modeling stage often only consists of calling a function from a package and adjusting a few hyperparameters.
If you build deep neural networks, you have to assemble the layers in a framework like TensorFlow. This requires that you are always aware of the data matrix dimensions and how they change in each layer. And while you don’t have to understand the intricate details of backpropagation and gradient descent, you should have a high-level understanding of the calculus that goes on under the hood and how it helps to adjust the weights in a neural network.
To minimize the loss in a neural network, you have to use a loss function that usually relies on more advanced statistical concepts such as maximum likelihood estimation and cross-entropy.
More advanced neural network architectures such as generative adversarial networks require a solid understanding of multivariate probability distributions, as well as concepts such as the Kullback Leibler divergence.
You should have a solid foundation in differential and integral calculus to be able to really make sense of these topics.
If you work in computer vision or natural language processing, you also need to know some of the classical techniques besides neural networks. In computer vision, for example, many of the foundational techniques are heavily reliant on vector geometry.
- Basic Linear Algebra: Just as the machine learning engineer, a deep learning engineer needs an understanding of matrices, vectors and matrix addition, subtraction, and multiplication. In addition, it might be a good idea to refresh trigonometry fundamentals such as sines, cosines, and tangents.
- Statistics and Probability: In addition to the basic statistics skills that a machine learning engineer needs, develop an understanding of probability density functions, maximum likelihood estimation, cross-entropy, and multivariate Gaussian distributions.
- Calculus: To understand backpropagation and gradient descent, I would recommend developing at least a high-level understanding of calculus for optimization. You don’t need to be able to solve complicated partial differential equations by hand but learn about the concept of rise over run, how partial differentiation works, and how it applies to vector calculus. Also, develop an understanding of first and second-order derivatives and how they can help to find the maxima and minima in a multidimensional vector space. To get a solid grasp of probability density functions, you should also learn the basics of integral calculus and how it helps you find the area under a curve.
Recap: As a deep learning engineer your responsibilities are very similar to those of a machine learning engineer. You don’t reinvent the wheel and create completely new types of neural networks. Your responsibility is to build deep learning systems based on established neural network architectures. But the modeling process with deep learning frameworks like TensorFlow usually requires a more in-depth understanding of neural networks and the underlying mathematics compared to fitting classical models in SciKitLearn or a higher level tool such as BigQuery ML.
Machine Learning/Deep Learning Research Scientist
The machine learning research scientist is at the forefront of researching and creating novel solutions in machine learning. Those are the people who typically work in universities and large tech companies, and who publish the latest breakthrough papers. Their requirement for math skills is very high and usually requires a Ph.D. level education in a quantitative subject such as mathematics or computer science.
Take the math requirements of the deep learning engineer and the data scientist and you have the bare minimum that would be expected of a research scientist. In addition research scientists often focus on a particular area such as non-convex optimization. If you are more interested, I suggest you check out the machine learning Ph.D. curriculum of a University like Carnegie Mellon or Stanford.
How to Learn Mathematics for Machine Learning
If you still have college ahead of you and want to go into machine learning, pick an undergraduate degree in applied math or computer science. This should give you a great grounding in all the math you need.
Chances are you are here because you’ve realized later in your career that you want to do machine learning. Perhaps you have a degree in computer science or another technical subject and have worked in a related field, but it has been a few years since graduation or you feel that you haven’t taken enough courses in math. Then you are in the same situation as I was.
Or perhaps you are coming from a completely different discipline and are wondering how you can learn the math on your own.
Here is how:
For self-study, you should definitely start with linear algebra. Almost all the other math topics in machine learning rely on vectors and matrices. If you don’t understand them, you are going to have a hard time following the other material. If you want to go the minimal math route, continue with basic statistics. Choose learning materials that do not rely on calculus such as Stanford’s introduction to statistics course on Coursera.
If you do want to get a stronger foundation in math (which I highly recommend), start with linear algebra, followed by differential and integral calculus. This should put you in great shape to tackle some more rigorous material in mathematical statistics.
You can learn all the mathematics you need by yourself using a combination of blogs, online courses, math books, and some Youtube videos. The difficult thing is just to know which resources to use and what topics to study in what order. Let’s solve this problem and give you a roadmap for self-study.
Blogs and Youtube
I’ve written a series of almost 60 posts on this blog about the foundational mathematics behind machine learning. You can check it out here. If you go through the posts in order, you have a free roadmap to guide you through the math you need to learn.
For almost every topic covered in the blog post series, you can head over to Youtube and find videos that explain everything visually. Khan Academy and 3Blue1Brown are two of my favorite math Youtube channels that offer an exceptionally high quality of teaching. Using the blog and the supplemental Youtube videos, you can build machine learning math foundations completely free of charge. You can also use the blog post series as a reference if you want a conceptual understanding of specific mathematical concepts.
To really solidify your understanding, you need to practice using exercises. This is where online courses become really helpful. There are three providers of online courses I want to mention: Coursera, MIT Open courseware, and Udemy.
Imperial college London has a great mathematics for machine learning specialization on Coursera. They don’t cover probability and statistics, though. For a basic grounding in statistics without calculus, I recommend Stanford University’s Introduction to Statistics. For a more advanced treatment of mathematical statistics, check out John Hopkins’ Advanced Statistics for Data Science specialization.
The Coursera courses work well as standalone learning resources. The quality is great and lectures are supplemented with exercises. It does make sense to use Khan Academy or the math for machine learning series on this blog to explore some topics more in-depth.
MIT Open Courseware
On MIT OC I recommend checking out Glibert Strang’s Linear Algebra course. Strang is one of the leading figures in the field and he does a superb job at explaining complicated concepts. MIT OC also has courses on Multivariable calculus and Statistics. While the instructors are top-notch (those are based on real courses at MIT), the courses often lack enough material to be used as standalone resources.
Udemy has some math for machine learning courses as well as courses on linear algebra, statistics, and calculus. Since anyone can provide courses on Udemy, the choice between courses and curricula is wider than on other platforms. But usually, these courses cannot compete in terms of quality with MIT or Coursera and lack hands-on exercises.
This is a relatively cost-effective way of learning since the price of these resources is much lower than getting a University education.
If you really want to go in-depth, academic textbooks are the ultimate resource. With a few exceptions, they are also more difficult to understand than the material presented in courses, on blogs, and on Youtube. The best and most comprehensive books for studying the math behind machine learning in my opinion are:
- Introduction to Linear Algebra by Gilbert Strang
- Calculus Made Easy by Silvanus Thompson
- Introduction to Mathematical Statistics by Hogg, McKean, and Craig
- Mathematics for Machine Learning by Deisenroth, Faisal and Ong
I’ve relied heavily on each of them to produce most of the math content on this blog.
I claimed in the intro paragraphs that you can learn the mathematical foundations for applied machine learning/deep learning within 3-6 months. With the resources mentioned above, I’m sure this is doable for most people who are willing to consistently put in 1-2 hours every day.
Note that I’ve excluded research here. In research, you’ll probably need a Ph.D. and the mathematical knowledge that comes with it. Obviously, a Ph.D. cannot be acquired in 3-6 months.