How do I learn mathematics for machine learning?
- Linear algebra / Matrix Algebra (See How do I learn linear algebra? and How do I learn matrix algebra?)
- Probability Theory (See How do I learn probability?)
If you're interested in an accessible introduction to matrix algebra, Coursera is running a course on it right now: Coding the Matrix: Linear Algebra through Computer Science Applications
The applied math most directly useful for machine learning is:
- Statistics (See How do I learn statistics for data science? What statistics book do you recommend to a wannabe data scientist who is familiar with basic statistics and mathematics?)
- Optimization (See How do I learn optimization?)
I discovered that the Matrix Cookbook was popular with most students for working with Matrix Calculus as it seems to have a never-ending list of matrix derivatives:
http://www2.imm.dtu.dk/pubdb/vie...
As far as brushing up on the rest of your Linear Algebra knowledge is concerned, I highly recommend Strang's lectures/book:
http://ocw.mit.edu/courses/mathe...
Highly relevant topics include rank and matrix inversion, the SVD, and eigenvalues and eigenvectors. Make sure you're very comfortable with eigenvalues and eigenvectors in particular.
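As a rough illustration of what an eigenvalue/eigenvector pair is, here is a minimal sketch of power iteration in plain Python. The matrix `A`, the helper names `mat_vec` and `power_iteration`, and the step count are all made up for this example; in real work you would normally use a linear algebra library instead.

```python
import math

def mat_vec(A, v):
    """Multiply a square matrix (given as a list of rows) by a vector."""
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def power_iteration(A, steps=100):
    """Approximate the dominant eigenvalue and a unit eigenvector of A."""
    v = [1.0] + [0.0] * (len(A) - 1)   # arbitrary starting vector
    for _ in range(steps):
        w = mat_vec(A, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]      # renormalize so v keeps unit length
    # Rayleigh quotient v^T A v gives the eigenvalue estimate (v has unit length)
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(A, v)))
    return eigenvalue, v

# Hypothetical symmetric matrix; its eigenvalues are 3 and 1
A = [[2.0, 1.0], [1.0, 2.0]]
eigenvalue, eigenvector = power_iteration(A)
print(round(eigenvalue, 6), [round(x, 6) for x in eigenvector])
```

Multiplying `A` by the returned vector should give back the same vector scaled by the returned eigenvalue, which is exactly the definition worth internalizing.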
Finally, with Analysis, I don't think ML requires a formal introduction to Analysis at all. It's important to know higher-dimensional calculus well, especially the parts related to optimization, such as Lagrange multipliers and the primal-dual form, and in general the calculus of matrices, and you should be good to go.
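For a feel of what working with Lagrange multipliers looks like, here is a small worked example. The objective and constraint are chosen purely for illustration and are not taken from any of the references above.

```latex
\begin{aligned}
&\text{minimize } f(x, y) = x^2 + y^2 \quad \text{subject to } x + y - 1 = 0 \\
&\mathcal{L}(x, y, \lambda) = x^2 + y^2 - \lambda (x + y - 1) \\
&\frac{\partial \mathcal{L}}{\partial x} = 2x - \lambda = 0, \qquad
 \frac{\partial \mathcal{L}}{\partial y} = 2y - \lambda = 0, \qquad
 \frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0 \\
&\Rightarrow\; x = y = \tfrac{1}{2}, \quad \lambda = 1, \quad
 f\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{2}
\end{aligned}
```

The first two conditions force x = y, and the constraint then pins both to 1/2.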
Overall, I think the way to handle Linear Algebra and Calculus is to work your way through an ML book/course and stop to look at the relevant math when necessary. Probability is different: you need a strong foundation right from the beginning, since most textbooks on ML tend to talk a lot about probability while skimming over the mathematical details of LinAlg and Calculus.
To show you just how super-serious I am about this, I’m even going to separate this caveat from the rest of the answer with one of the ultra-cool line breaks.
Alright, at this point, I’m assuming that you are still solely considering graduate school preparation without an undergraduate education. Let’s go.
My background consists of an undergraduate BS in mathematics, a minor in physics, and a few years of research experience that has spanned from charged particle detectors (physics/EE) to autonomous vehicle system design for collision detection and evasion. Long story short: I’m far more qualified to answer your question when robotics is emphasized, so that’s what I’m going to do.
Robotics is Multi-Disciplinary
Robotics is a highly multi-disciplinary field. In fact, I’d argue that it could well be the academic field which encompasses the largest quantity of distinct domains into its core structure. When we’re talking about robotics, we’re really talking about
- Computer science
- Mathematics
- Computer engineering
- Electrical engineering
- Control engineering
- Systems engineering
- Mechanical engineering
- Physics (mechanics, more specifically)
What’s even more impressive about the above list than its size is the depth of each field. Aside from control and systems engineering, which are a bit more specialized and less fundamental than the others, each of the above domains is extremely broad, indicating that if you were to break down robotics concepts into a networked graph, it would resemble something like this:
[1]
Needless to say, roboticists ultimately specialize in a much narrower range so that expertise in a topic can be attained. But that doesn’t change the fact that to pursue robotics, high breadth and versatility in engineering and math is a tool whose utility can’t be overstated.
Specific Areas of Research
Now, regardless of whether you want to pursue a master's or a Ph.D., you will ultimately have to carve out a niche for yourself. As I mentioned above, mastery of all robotics is a hopelessly daunting task; it’s impossible. Therefore, it’s important that you expose yourself to the different areas of robotics, and gradually home in on your desired path according to the topics in which you’re interested and at which you’re talented.
Here’s my breakdown of robotics research, in increasing order of mathematical abstraction and decreasing order of hands-on engineering and building:
- Sensors. About as applied and hands-on as you can get, the domain of sensors works on expanding the current technical constraints that robotics hardware faces. It’s because of these guys that the iPhone magically gets smaller and smaller every year, while also increasing its technological capacities. An example of the importance of this domain which is even more specific to robotics is radar-evading drones. Remember when Osama bin Laden got taken out because we flew a helicopter into Pakistan that magically evaded radar? Thanks, sensors.
- Nano-robotics. Focusing on developing robotic systems on the micro-level, nano-robotics explores how robotic agents can be built and implemented on a scale sufficiently small that they can be directly inserted into your body. Sound scary? It shouldn’t. Nano-robotics has a plethora of game-changing medical applications, some of which include legitimately curing cancer and preventing aging.
- Machine vision. While the ability to process and interpret visual information comes very intuitively to humans, translating our abilities to an algorithmic environment in this matter has proven to be an intimidating process. In fact, I’d argue that the largest obstacle facing self-driving cars is machine vision. Just take a look at the Tesla driver who died while using Autopilot because his car failed to distinguish between the bright sky and a white truck in its path. [2]
- Robotic learning. When machine learning is applied in a robotic context, it basically becomes robotic learning. Robotic learning is the overlap between robotics and machine learning; it approaches the problem of developing tools for adaptation and learning in robotic systems. Very cool field, with a lot of promising application, and very well suited for someone interested in machine learning and robotics.
- Robotic control. This is the area in which I’m currently nested. Control represents a mathematical approach to modeling the behavior and evolution of a dynamical system in relation to inputs, which can be used to affect the system’s output. The goal here is to mathematically demonstrate that a certain approach for input selection guarantees that the system’s output will quickly converge to a stabilized desired range, as illustrated in this kick-a** picture. [3]
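A toy sketch of the control idea in the last item, with made-up numbers: a scalar discrete-time system x[k+1] = a*x[k] + b*u[k] that blows up when left alone, and a simple feedback rule u[k] = -gain*x[k] that makes it converge. The values of `a`, `b` and `gain` are assumptions chosen so that the closed loop shrinks the state by half at every step; real robotic control problems are far richer than this.

```python
def simulate(a, b, gain, x0, steps):
    """Simulate x[k+1] = a*x[k] + b*u[k] with state feedback u[k] = -gain*x[k]."""
    x = x0
    trajectory = [x]
    for _ in range(steps):
        u = -gain * x          # input chosen from the current state
        x = a * x + b * u      # system evolves under that input
        trajectory.append(x)
    return trajectory

# Without feedback the state grows by a factor of 1.2 each step
open_loop = simulate(a=1.2, b=1.0, gain=0.0, x0=1.0, steps=20)
# With gain 0.7 the closed-loop factor is 1.2 - 0.7 = 0.5, so the state decays
closed_loop = simulate(a=1.2, b=1.0, gain=0.7, x0=1.0, steps=20)

print(abs(open_loop[-1]) > 1.0, abs(closed_loop[-1]) < 1e-3)
```

The point of control theory is to prove this kind of convergence in general rather than observe it in one simulation.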
Because you have stated that robotics and machine learning are your interests, I’m going to assume your interests align with the #3–5 end of the spectrum. But even when your interests are honed in on these two areas, there is still a massive range of topics and skill sets spanned by these two very broad domains.
Developing Skills for Robotic Learning
Again, I’m far from an expert in robotic learning and machine learning, but I’ll do my best to show some helpful tips for pursuing this domain. The fundamental fields from which machine learning constantly draws, as I understand it, are the following:
- Probability
- Statistics
- Algorithms
- Optimization
- Systems
The last one is a bit more of a stretch in comparison to the others, but I’ve heard that a high portion of machine learning can actually be approached from a systems perspective, and that its inception actually arose from system theory modeling.
For probability and statistics, both intuition and rigorous technicality will be important. I had a horrible textbook which provided very little conceptual basis for the theorems, and mostly included a bunch of isolated problems which were crudely connected in a very disjointed way. I recommend Introduction to Probability by Grinstead and Snell, [4] which provides a lot of clear, well-articulated conceptual explanations which enhance both intuition and precise reasoning on the subject. It’s also free and available online, which ya’ know, is always a big plus.
Becoming comfortable with algorithms is a task which can more easily be achieved in a college setting, but one which is also very feasible to execute independently. For a textbook to guide you through the key concepts of algorithm theory, I recommend looking no further than the classic Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. [5]
Additionally, I would look to two more sources to continually expand algorithmic skills: Project Euler and Topcoder. Project Euler encompasses a diverse range of mathematical problems for algorithmic development which will strengthen your mathematical algorithmic thinking and your “out of the box” creativity. Topcoder provides challenges which will improve your technical programming skills, and diversify and expand your problem solving breadth.
Of course, once you have a solid background in the above topics, you’ll want to receive a comprehensive introduction to robot learning, for which I’ve been told that Robot Learning by Connell and Mahadevan is a solid choice. [6]
Although robotic learning and robotic control are distinct domains, robotic learning is intrinsically tied in to concepts from control theory. In fact, one of the most challenging problems facing the robotic learning community is that it lacks the rigorous analysis and descriptions that the control and systems theories possess.
For example, a self-driving car that implements a series of clever robotic learning algorithms will never be implemented without tools from control systems. Why? Because without tools from control and systems theory, you will never get close to demonstrating rigorous, mathematically demanding qualities such as robustness, safety guarantees, stability, etc., without which, the government wouldn’t let your self-driving car see the light of day.
Robotic Control
I think that optimization, control, and systems are all presented and integrated very concisely in Design of Optimal Control Systems by Bini. [7] This book covers more than the minimum you need to know about any of these topics for machine learning. But a deep understanding of at least some of the ideas shown in this book will allow you to draw insights between these domains which most others will likely not be capable of seeing.
Note that I recommend the above for someone interested in both machine learning and robotic control. If you’re primarily interested in robotic control, then your mathematical skills need to be more sophisticated than the vast majority of other engineers. This is likely the only engineering discipline in which highly abstract mathematical fields play a fundamental role. They include
- Real analysis
- Systems of Differential Equations
- Dynamical Systems (similar to systems of differential equations, but distinct from it)
- Advanced Linear Algebra
- Advanced Optimization
- Basic Topology
- Set Theory (more than the basics, but not quite “advanced” set theory)
Clearly, your mathematical skills have to be beyond the more applied end of the spectrum in which things like formalities, proofs, theorems, and rigor are almost never relevant.
For a comprehensive introduction to real analysis and topology that isn’t esoteric (difficult to find), I recommend Basic Analysis by Lebl. [8] While the book isn’t intended for studying topology specifically, it covers nearly all of the fundamentals which are relevant to control. Note that real analysis is the most important item in the above list.
Advanced linear algebra is the most difficult field for which to find an accessible, engaging textbook, IMO. The majority of the texts are far too focused on minute, irrelevant details and burdensome proofs that offer little insight into the deeper concepts. More importantly, most textbooks totally fail to connect the ideas to deeper concepts which are both cool and incredibly useful. After a lot of searching, I found hope in an unexpected place: online lecture notes. [9] If you master these notes, and their difficult problems, to the point where you can comfortably walk through the main concepts with a high school student, then you’ll be five steps ahead of me.
As for dynamical systems, I’d say that Dynamical Systems by Sternberg does the trick. [10] Until you get to the more theoretical content like stability and invariance, you really want to focus more on the concepts; the details aren’t particularly important, surprisingly. You really just need to know what kind of assumptions you have to make about the system you’re modeling.
Once you’re comfortable with most of the above, you can get your hands dirty with some actual control theory. For this, I recommend Mathematical Control Theory by Sontag. [11]
†:
I have a hunch that’s not what you want to hear, since you didn’t ask for advice regarding this matter. So I’m sorry if this caveat irks you in any way, but it’s the best advice I can give, and I think it’s important for you to hear.
I’m a firm believer in pragmatic optimism, and while it’s optimistic to believe that admittance into graduate school, especially in a technical field, is feasible without an undergraduate degree, it is far from pragmatic. Without an undergraduate degree, you are immediately excluded from consideration for all departments at the majority of universities.
I can’t find any specific statistics on this matter, so you’ll have to choose whether or not to take my word for it. But trust me when I say that I can currently think of one graduate school that doesn’t necessitate an undergraduate degree as a strict requirement.
Even putting the strict requirements aside, for deeply embedded multidisciplinary fields like robotics and machine learning, an undergraduate education is crucial. Although I do think that the ability to interact with professors; learn with faculty and peers in person; and receive a curriculum designed by experts on which you are tested in a competitive environment are all vital assets for initiating the engineering experience in any field, they are especially true for robotics.
Another important distinction regarding your question is whether you are planning for a master's or a Ph.D.
[1] Pawel Pralat: Graph Theory
[2] Tesla driver killed while using autopilot was watching Harry Potter, witness says
[3] Vehicle stability control systems: An overview of the integrated ...
[4] https://www.dartmouth.edu/~chanc...
[6] Robot Learning | J. H. Connell | Springer
[7] http://retis.sssup.it/~bini/math...
[8] http://www.jirka.org/ra/realanal...
[9] https://www.math.uh.edu/~climenh...
[10] Dynamical Systems (Dover Books on Mathematics): Shlomo Sternberg: 9780486477053: Amazon.com: Books
[11] http://www.mit.edu/~esontag/FTPD...
- Convex Optimization (Convex Optimization - Boyd and Vandenberghe)
- Linear algebra
- Some rudimentary Calculus (especially use of the Lagrangian)
- Lots of Probability and Statistics
Somdeb Sarkhel's answer to How much of the statistics is enough for a machine learning enthusiast and data scientist with no statistical background to come up with good ML models and also excel in Kaggle competitions?
http://courses.washington.edu/cs...
UPD:
Here is the blog post at the Web Archive mirror: http://web.archive.org/web/20101...
Bradford's lists at Amazon:
- Analysis [1]
- Linear Algebra [2]
- Probability [3]
- Statistics [4, 5]
- Optimization [6]
- Machine learning [7]
- Feature Selection [8]
I hope Mr. Cross will be able to join the discussion.
[1] http://www.amazon.com/Analysis/l...
[2] http://www.amazon.com/Matrix-Fu/...
[3] http://www.amazon.com/Probabilit...
[4] http://www.amazon.com/Statistics...
[5] http://www.amazon.com/Nonparamet...
[6] http://www.amazon.com/Heuristic-...
[7] http://www.amazon.com/Machine-Le...
[8] http://www.amazon.com/Feature-Se...
UPD 2:
Here is the list of must-read books for theoretical machine learning [1], which is attributed to prof. Michael Jordan (UC Berkeley). The sources are [2] and [3].
[1] https://www.goodreads.com/review...
[2] Learning About Statistical Learning
[3] AMA: Michael I Jordan • /r/MachineLearning
These course materials are old, by the way. The good news is that you can easily find the book (a compilation of all the materials) by searching. If I am not wrong, the last revised version of this book is dated 6th May, 2012.
You need linear algebra as well. For this I recommend Gilbert Strang's "Linear Algebra and Its Applications". It may be a little bit tough, but it is a great book.
If you want to dive into the probabilistic approach, you can enroll in the Probabilistic Graphical Models course: https://www.coursera.org/course/pgm. I heard that it is a very good course. The textbook for that course looks very useful: http://www.amazon.com/Probabilis...
The current machine learning (ML) algorithms are based upon mapping functions:
$$F: X \to Y$$
The function $F$ can be anything such as a support vector machine (SVM), a restricted Boltzmann machine (RBM), a deep neural network (DNN) or anything else that you can hand engineer yourself. In application areas, $X$ represents the input space while $Y$ represents the output space.
In speech recognition $X$ might be a set of spectrograms while $Y$ is a set of identities representing the speakers. In image recognition, $X$ is the raw image pixel space while $Y$ is the categorization consisting of the different classes into which $x_i \in X$ can fall.
Each ML model has parameters $w$ that affect the behavior of $F$, and we can normally adjust them in order to change the behavior of that function. We can thus write the mapping more conveniently as:
$$y = F(x, w)$$
where $y \in Y$ and $x \in X$.
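We will focus on supervised ML models, where we have a dataset $T$ of $N$ training input-output pairs in the form:
$$T = \{(x_i, \hat{y}_i)\}_{i=1}^{N}$$
where $\hat{y}_i$ is the desired output for the input $x_i$.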
The goal of supervised machine learning is to find the best parameter values $\hat{w}$ that make the function $F$ map the input-output pairs with the least error. So in supervised ML we have three main issues:
- Define a fitness measure that tells us how well the ML model is performing on the training set $T$.
- Generalization: We can run the same fitness measure on the test set after training is complete in order to measure how well the model generalizes to novel inputs. This is a very important concept in modern ML.
- A learning algorithm to update the weights, $w \to \hat{w}$.
This is where the maths comes in: to understand the underlying maths concepts, you need to understand what ML is trying to solve in the first place. The aim here is to find solutions to the 3 issues mentioned above, and maths can help us with that.
1: A fitness measure:
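This is normally done by an objective function, also known as the loss/cost function:
$$L(y, \hat{y})$$
where $y = F(x, w)$ is the actual output and $\hat{y}$ is the desired output.
In empirical risk minimization[1] (ERM) the goal is to minimize the overall loss as defined by the empirical risk $R_{emp}$ over the $N$ training pairs $(x_i, \hat{y}_i)$:
$$R_{emp}(w) = \frac{1}{N} \sum_{i=1}^{N} L(F(x_i, w), \hat{y}_i)$$
ERM states that the learning algorithm should choose the hypothesis function $f$ such that the empirical risk is minimized. In simple mathematical terms we need to solve:
$$\hat{w} = \arg\min_{w} R_{emp}(w)$$
where $f(x) = F(x, w)$, so choosing $f$ amounts to choosing $w$.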
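To make ERM less abstract, here is a minimal sketch with a squared loss and a one-parameter model. The model `F`, the toy training set `T` and the brute-force search over candidate weights are all assumptions made for illustration; real learners do not search a grid like this.

```python
def F(x, w):
    """Hypothetical model: a line through the origin with slope w."""
    return w * x

def loss(actual, desired):
    """Squared loss between the actual and the desired output."""
    return (actual - desired) ** 2

def empirical_risk(w, data):
    """Average loss of F(., w) over the training pairs."""
    return sum(loss(F(x, w), desired) for x, desired in data) / len(data)

# Toy training pairs (input, desired output), roughly following output = 2 * input
T = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

# Crude ERM: evaluate the risk for each candidate weight and keep the best one
candidates = [i / 100 for i in range(0, 401)]
w_hat = min(candidates, key=lambda w: empirical_risk(w, T))

print(w_hat, empirical_risk(w_hat, T))
```

The chosen weight is simply the candidate with the smallest average loss on the training pairs, which is all that "arg min of the empirical risk" means.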
2: Generalization:
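The above naive ERM can result in the function $f$ just memorizing the training examples, which can cause what is called overfitting, that is, fitting the function $F$ to each and every noisy/outlier data point. That is not ideal, so instead we normally use structural risk minimization[2] (SRM), whereby we add a regularization term $C(w)$, weighted by a coefficient $\lambda$, to the risk over the $N$ training pairs $(x_i, \hat{y}_i)$. Thus we get the regularized risk:
$$R_{stru}(w) = \frac{1}{N} \sum_{i=1}^{N} L(F(x_i, w), \hat{y}_i) + \lambda C(w)$$
$$R_{stru}(w) = R_{emp}(w) + \lambda C(w)$$
Then in SRM we need to solve:
$$\hat{w} = \arg\min_{w} R_{stru}(w)$$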
Regularization simplifies the weight parameters so that they don't model too much of the outliers or noise. That is done by penalizing large weight values in $w$, which are a cause of most overfitting issues. Thus the $L_0$ norm can be used in order to favor a very sparse set of weights whereby most weight values are zero. You can also use $L_1$ or $L_2$ regularization instead, as the $L_0$ norm is hard to optimize. Other weird regularization methods have since popped up, such as dropout, which is used in learning algorithms for DNNs: neurons are randomly dropped out and restored during training so that the overall network becomes robust to noise. Dropout can be loosely seen as an ensemble method.
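The effect of penalizing large weights can be sketched with an L2 penalty on a one-parameter linear model, where the minimizer of the regularized risk has a closed form. The toy data, the squared loss and the choice C(w) = w^2 are assumptions for this illustration only.

```python
def regularized_weight(data, lam):
    """Minimizer of (1/N) * sum((w*x - desired)^2) + lam * w^2 for a 1-D model.

    Setting the derivative to zero gives w = sum(x*desired) / (sum(x*x) + N*lam).
    """
    n = len(data)
    sxy = sum(x * desired for x, desired in data)
    sxx = sum(x * x for x, _ in data)
    return sxy / (sxx + n * lam)

# Toy training pairs (input, desired output), roughly following output = 2 * input
T = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

for lam in (0.0, 1.0, 10.0):
    print(lam, round(regularized_weight(T, lam), 4))
```

With lam = 0 this is plain ERM; as lam grows the weight is pulled towards zero, trading some training error for a simpler model.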
3: A learning algorithm:
Learning in current ML can be viewed as a way to update the weights in order to find the optimal parameters. ERM and SRM both rely on the existence of a learning algorithm for weight adjustment. We need an algorithm to find the weights that solve
$$\hat{w} = \arg\min_{w} R_{emp}(w)$$
or
$$\hat{w} = \arg\min_{w} R_{stru}(w)$$
We need a way to update the model such that
$$w \to \hat{w}$$
In current ML systems we just look to the old idea of gradient descent (GD) from numerical optimization. In GD we simply move down the steepest slope on the error (risk) surface defined by the risk $R$. That means we can just use the update rule defined by
$$w_{t+1} = w_t - \alpha \frac{\partial R}{\partial w_t}$$
where $t$ = step count and $\alpha$ = learning rate.
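The update rule is short enough to run by hand. Below is a minimal sketch for a one-parameter model with a squared loss; the toy data, the starting point and the learning rate are assumed values picked so that the iteration converges.

```python
# Toy training pairs (input, desired output), roughly following output = 2 * input
T = [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8)]

def risk(w):
    """Empirical risk of the 1-D model F(x, w) = w * x under squared loss."""
    return sum((w * x - desired) ** 2 for x, desired in T) / len(T)

def risk_gradient(w):
    """dR/dw = (2/N) * sum((w*x - desired) * x)."""
    return sum(2.0 * (w * x - desired) * x for x, desired in T) / len(T)

alpha = 0.05   # learning rate (assumed; small enough for this problem)
w = 0.0        # initial guess
for t in range(50):
    w = w - alpha * risk_gradient(w)   # w_{t+1} = w_t - alpha * dR/dw

print(round(w, 4), risk(w))
```

If you make `alpha` too large the iterates overshoot and diverge, which is a good experiment to try.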
Here we assume a convex surface defined by $R$. In practice, especially for DNNs, the surface is highly non-convex, but almost any local minimum is good enough, and we can add momentum to the update rule so that it can escape from local minima traps more easily. Also, the sheer number of parameters makes it harder for the DNN model to get trapped in a local minimum, as there are many possible escape routes through the many other dimensions.
In DNNs the gradient computations can become cumbersome even for a modern machine, as the number of gradient steps needed to hit $\hat{w}$ is normally large. Thus we need fast ways to accelerate gradient computations for layered architectures. The backpropagation (backprop) algorithm, to be specific, is a way of computing gradients extremely efficiently in any differentiable computational graph. Backprop uses the chain rule, starting from the output layer, which is directly connected to the loss function and hence has derivatives that are easier to evaluate, and then moving towards the layers far away from the output layer (towards the input layer) while chaining the derivatives. It is called backprop because errors are passed from the back layers towards the front layers, thereby saving a lot of repeated computations.
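Here is a minimal sketch of the chain-rule bookkeeping for the smallest possible "network": one input, one tanh hidden unit and one linear output, with a squared loss. The architecture, the numbers and the function names are assumptions for illustration; the finite-difference check at the end is a common way to sanity-check a hand-written backward pass.

```python
import math

def forward(x, w1, w2):
    """Forward pass: hidden activation h = tanh(w1*x), output = w2*h."""
    h = math.tanh(w1 * x)
    return h, w2 * h

def loss(x, desired, w1, w2):
    _, output = forward(x, w1, w2)
    return (output - desired) ** 2

def backprop(x, desired, w1, w2):
    """Gradients of the loss w.r.t. w1 and w2, chained from the output backwards."""
    h, output = forward(x, w1, w2)
    d_output = 2.0 * (output - desired)   # dL/d(output), next to the loss
    d_w2 = d_output * h                   # output-layer weight
    d_h = d_output * w2                   # error passed back to the hidden unit
    d_z = d_h * (1.0 - h * h)             # through tanh: d tanh(z)/dz = 1 - tanh(z)^2
    d_w1 = d_z * x                        # hidden-layer weight
    return d_w1, d_w2

def numeric_grad(x, desired, w1, w2, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    g1 = (loss(x, desired, w1 + eps, w2) - loss(x, desired, w1 - eps, w2)) / (2 * eps)
    g2 = (loss(x, desired, w1, w2 + eps) - loss(x, desired, w1, w2 - eps)) / (2 * eps)
    return g1, g2

analytic = backprop(0.5, 1.0, 0.3, -0.8)
numeric = numeric_grad(0.5, 1.0, 0.3, -0.8)
print(analytic, numeric)
```

Note how `d_output` is computed once and reused for both weights; that reuse is the saving backprop gives you in larger networks.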
GD requires that all training pairs are considered before taking a single small update step, which is not scalable. Thus in practice we have the so-called stochastic gradient descent (SGD), which takes a step after just one example. This is so efficient that it is normally the standard learning algorithm for DNNs, together with backprop. There are batch variants of SGD which you can consider as being in between SGD and GD: the mini-batch approach uses a small random set of training examples, known as the batch, to approximate the gradient via the backprop algorithm. Thus SGD can be seen as the batch variant with just 1 example in the batch.
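The three variants differ only in how many examples feed each step, which a short sketch makes visible. The one-parameter model, the noise-free toy data, the learning rate and the fixed seed are assumptions made so that every variant converges to the same weight.

```python
import random

def sgd(data, batch_size, lr=0.02, steps=500, seed=0):
    """Fit the 1-D model F(x, w) = w * x with squared loss.

    batch_size=1 is plain SGD, batch_size=len(data) is full-batch GD,
    and anything in between is the mini-batch variant.
    """
    rng = random.Random(seed)
    w = 0.0
    for _ in range(steps):
        batch = rng.sample(data, batch_size)   # random subset of the training pairs
        grad = sum(2.0 * (w * x - desired) * x for x, desired in batch) / batch_size
        w -= lr * grad
    return w

# Noise-free toy data: desired output is exactly 2 * input
data = [(x, 2.0 * x) for x in (1.0, 2.0, 3.0, 4.0)]

w_sgd = sgd(data, batch_size=1)
w_minibatch = sgd(data, batch_size=2)
w_full = sgd(data, batch_size=len(data))
print(w_sgd, w_minibatch, w_full)
```

On noisy real data the single-example steps jitter around the minimum, which is one reason mini-batches are popular in practice.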
So to learn the maths theory behind ML, start from the underlying goals of ML which we have looked at in this discussion. Of course this was just the tip of the iceberg, but the best way to see most ML models is that they are function approximators and we wish to recover those approximations from input-output training pairs alone, which we call end-to-end learning.
It also helps to visualize ML as just optimization theory. We have a loss function and all we need is an algorithm that helps us find the right settings such that the loss is minimized. In practice SGD+backprop works very well for training modern ML models.
You also need to try to implement some of these algorithms yourself from scratch. Try to implement backprop and SGD for a multi-layer neural network (NN), not a deep one, then try it on the MNIST dataset. You can only learn via practice. Before implementing, make sure you go through backprop and derive it for multi-layer NNs and convolutional neural networks (convNets).
Don't be in too much of a hurry though; concepts take time to make sense. To help yourself assimilate the material a bit more easily, solve some problems and also try to explain the systems to others via platforms like Quora. That way you will start to have more and more confidence in your understanding of the maths behind ML algorithms.
Hope this helps.
Footnotes
[1] Empirical risk minimization - Wikipedia
[2] Structural risk minimization - Wikipedia
Some people say that mathematics is useless for a software engineer; machine learning proves them wrong.
Mathematics is the prerequisite for machine learning because machine learning is math. The computer is only useful for doing the calculations.
You'll mainly need to learn calculus, matrix calculation, linear and non linear algebra, statistics and graph calculus.
Let's take a basic ML algorithm, the linear regression.
The goal is to use some data to find a function which takes parameters and gives an output. Data are used to find the function and test it. In the future, we will use the function with some parameters and we will obtain an approximate output.
Let's say our data are about planes, as input we have the number of miles travelled by the plane and its age. As output we have the price of the plane. I don't normalize data to keep things simple.
A sample of our data could be :
miles;age;price
120000;12;120000
48000;4;1500000
...
Our question is : Given the miles travelled by a plane and its age, give a price.
Using linear regression (gradient descent) we will find a vector theta. This vector has two values, theta[0] and theta[1]. To find an approximate price we will multiply the miles by theta[0] and the age by theta[1] to obtain a result, which is an approximate price.
For instance, our algorithm could find theta = [2;-10 000], and if we have a plane 5 years old with 78 000 miles, we can then approximate the price by computing 78 000 * 2 + 5 * -10 000, which gives 106 000 dollars.
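The same arithmetic written as code, using the theta values and the plane from the example above (the variable names are just for illustration):

```python
theta = [2, -10_000]     # theta[0] weighs the miles, theta[1] weighs the age
plane = [78_000, 5]      # 78 000 miles, 5 years old

# Prediction is the sum of each feature multiplied by its theta value
price = sum(t * feature for t, feature in zip(theta, plane))
print(price)  # 106000
```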
The hard part is to find the good values for theta. To do that you need some maths.
You have a cost function that tells you how good your theta is. This cost function tests your theta values against your data (which already contains the price of each plane given its miles and age).
So your goal is to minimize the cost function by adjusting your theta values.
The cost function to minimize is this one :
where
Using the batch gradient descent algorithm each iteration adjusts the theta values using this formula :
then you test the theta value with the previous function J(theta) and you'll see that the cost (ie. the diff between the predicted value and the real) will decrease at each iteration.
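The formulas themselves did not survive here, so the sketch below assumes the usual least-squares form: J(theta) = 1/(2m) * sum((prediction - price)^2) over the m data rows, with each batch update subtracting alpha times the average of error times feature. The data is made up and rescaled to small numbers so that a simple learning rate works; with raw miles and prices you would need normalization or a far smaller alpha.

```python
def predict(theta, features):
    """Linear model without intercept: theta[0]*x0 + theta[1]*x1."""
    return sum(t * f for t, f in zip(theta, features))

def cost(theta, X, y):
    """Assumed cost: J(theta) = 1/(2m) * sum((prediction - target)^2)."""
    m = len(X)
    return sum((predict(theta, xi) - yi) ** 2 for xi, yi in zip(X, y)) / (2 * m)

def gradient_step(theta, X, y, alpha):
    """One batch update: theta_j -= alpha * (1/m) * sum(error_i * x_ij)."""
    m = len(X)
    errors = [predict(theta, xi) - yi for xi, yi in zip(X, y)]
    return [
        t - alpha * sum(e * xi[j] for e, xi in zip(errors, X)) / m
        for j, t in enumerate(theta)
    ]

# Made-up, rescaled data generated from theta = [2, -1]
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
y = [0.0, 3.0, 2.0, 5.0]

theta = [0.0, 0.0]
costs = [cost(theta, X, y)]
for _ in range(1000):
    theta = gradient_step(theta, X, y, alpha=0.1)
    costs.append(cost(theta, X, y))

print([round(t, 4) for t in theta], costs[0], costs[-1])
```

Every row is used in every update, which is what makes this the batch version of gradient descent.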
As you can see this simple ML algorithm is math. The computer will be useful to compute the previous formulas.
1. Probability and mathematical statistics
This is a fundamental requirement for machine learning, so you need to know it well. When I say probability, it's more than what you studied in high school and almost everything you probably didn't pay attention to during your undergrad. You need to know about random variables, their distributions, probabilistic convergence, and estimation theory. That covers a major part of what you need to know here.
Two of my favourite resources are:-
1. Joseph Blitzstein - Harvard Stat 110 lectures
2. Larry Wasserman's book - All of statistics
2. Linear algebra
Linear algebra will pop up every now and then in ML. PCA, SVD, LU decomposition, QR decomposition, symmetric matrices, orthogonalization, projections and matrix operations are needed time and again. The good thing is that there are countless resources available on linear algebra.
My all time favourite is Gilbert Strang's MIT lectures on linear algebra.
3. Optimisation
Though only a few things from optimisation are needed most of the time, a strong foundational knowledge will go a long way. You need to know Lagrange multipliers, gradient descent, and the primal-dual formulation. The best resource on this is Boyd and Vandenberghe's course on Convex optimisation from Stanford.
4. Calculus
I wanted to put this at the top, but I'm putting it last just to emphasise the fact that only a fundamental knowledge of calculus is needed. Know about 3-D geometry, integration, and differentiation and you'll survive. It's the easiest to start with amongst the topics I've mentioned here. MIT has good lectures on calculus.
I think with these 4 tools you'll most likely find ML easy to understand. Other than these you may find real analysis and functional analysis relevant too, but they are just formal generalisations of the topics mentioned before.
An introductory Linear Algebra course will generally include the following:
- Vectors
- Vector Spaces
- Matrices
- Inner Product Spaces
- Orthogonality
- Projection
- Linear transformations
- eigenvectors, eigenvalues
- change of bases
- Various decompositions: LU, Polar, SVD.
I also had some geometric algebra, but haven't found that useful so far.
Probability and statistics:
- probabilities
- combinations
- permutations
- distributions
- Understanding of hypothesis testing
- Descriptive statistics: Means, modes, standard deviations, variances etc.
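Several of the items above can be tried out directly with the Python standard library. The numbers below are a made-up sample; `math.comb` and `math.perm` need Python 3.8 or newer.

```python
import math
import statistics

sample = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(sample)            # arithmetic mean
mode = statistics.mode(sample)            # most frequent value
variance = statistics.pvariance(sample)   # population variance
std_dev = statistics.pstdev(sample)       # population standard deviation

combinations = math.comb(5, 2)   # ways to choose 2 items out of 5, order ignored
permutations = math.perm(5, 2)   # ways to arrange 2 items out of 5, order matters

print(mean, mode, variance, std_dev, combinations, permutations)
```

Working out the same values by hand for a small sample like this is a quick check of whether the definitions have sunk in.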
If you can get through:
https://www.khanacademy.org/math...
And
https://www.khanacademy.org/math...
You are good to go.
The machine learning field needs the following mathematics background to understand more things.
- Calculus and in my view the following reference is very good,
- The linear algebra and matrix calculation and the following reference is very relevant,
- The statistics and probability background, and the following books are very good,
- All of Statistics: A Concise Course in Statistical Inference (Springer Texts in Statistics): Larry Wasserman: 9780387402727: Amazon.com: Books
- Amazon.com: A First Course in Probability (9th Edition) (9780321794772): Sheldon Ross: Books
- The knowledge of optimization and the following textbook is very good,
When in doubt, MIT OpenCourseWare is always a good source -- I believe they even offer one or two machine learning courses at the graduate level.
Good general reference/tutorial texts:
- Information Theory, Inference and Learning Algorithms -- MacKay
- Introduction to Probability Models -- Ross
- AI: A Modern Approach -- Russell & Norvig
- Algorithm Design -- Kleinberg & Tardos
Christopher Bishop - Pattern Recognition & Machine Learning. First time I picked this book up it was pretty daunting, but once you get a bit of the maths under your belt, I found that it presents clearer explanations than other texts. I found it really clearly laid out and it seems to progress pretty well. It also covers a lot of stuff.
Linear Algebra:
Gilbert Strang's videos on linear algebra are excellent, and so are the Khan Academy ones. The Gilbert Strang book doesn't seem to get particularly great reviews. On the basis of reviews, I picked up a copy of Howard Anton's Elementary Linear Algebra, which seems to be very highly regarded. I would recommend it. I also have David Poole: A Modern Introduction...which feels a bit more......modern than Anton and I have tended to use it more. Doesn't seem to be a particularly well known book on t'internet, but I find it very clear (more so than Anton).
If you want to practise, then there's Schaum's Outline of Theory and Problems of Linear Algebra. (Good for practising but insufficient as a standalone text on the subject)
If you have the luxury of having some time before starting on Machine Learning, I would suggest really focusing on linear algebra in a very hands on way (working through structured examples) and getting a good understanding of orthogonality, vector spaces, eigenvectors, transformations. From my experience, trying to learn the maths at the same time as learning Machine Learning was overwhelming and I would have got a lot more out of ML lectures if I had already got a grasp of the maths.
Mathematics for ML is no different from what you learn in high school or in under-grad studies. If you have that mathematics base, most of the time it is sufficient to understand what's going on in those creepy equations you see in books and research papers. However, sometimes more than that is required and you may have to take some advanced courses in statistics, calculus, linear algebra etc. You may also like to read in general more about How do I learn machine learning?
[1] http://math.mit.edu/linearalgebra/
Teach yourself Machine Learning the hard way ! and follow up Teach yourself Machine Learning the hard way ! (Part 2)
It lists many pre-requisites that you need to understand and also some of the advanced stuff in part2.
I hope this helps.
With regards to mathematics for machine learning, I reckon all of the following skills are important:
(1) Some Basic Mathematical Skills (Linear Algebra, Probability, Optimization)
(2) Knowing how those mathematical skills are exploited for machine learning algorithms
(3) Developing a way to understand mathematics, so that any advanced maths for modern machine learning can be well comprehended.
While one would generally recommend all sorts of linear algebra and probability books for machine learning, I feel those are not always worth the time, at least for machine learning. I would recommend the following texts to read through (perhaps in order), which should cater to the three points above.
(a) Pattern Recognition and Machine Learning by Christopher Bishop (Will cater to 1 and 2 above)
(b) Deep Learning book by Goodfellow, Bengio and Courville (Will again strengthen 1 and build on 2)
(c) Understanding Machine Learning by Shai Shalev Shwartz and Shai Ben David (Will advance your skills in 1, strengthen 2, and give an insight to 3)
(d) Ankur Moitra’s rather short but useful book, Algorithmic Aspects of Machine Learning (Will mainly cater to 3)
(e) Optimization for Machine Learning by Sra, Nowozin and Wright & Off the convex path by Sanjeev Arora and collaborators (Will cater to 3 and advance 1 and 2)
I truly believe that anyone who properly understands the above material will develop all the maths basics needed for machine learning, and in a very connected form too! Hope this helps!
I won't say that you “learn” math. I would rather say that you train math.
Imagine you want to train boxing and your coach is teaching you directs, low kicks and high kicks. No matter how many times he shows you how to kick, you can't do it perfectly. You do know that it takes patience, hard work and effort to finally learn how to punch, and you need to keep trying and training. After many tries you can finally say that you can actually punch.
What's the point?
Math is the same. Consider direct punches your formulas, low kicks your theories and high kicks your solutions to problems. No matter how many formulas or theories you know, no matter how many times you've seen solutions, you just can't do it perfectly. Why? Because you need to train those formulas, train those theories and knock out those problems with a damn good high kick. And how do you do that?
- Do as many problems as you can on a daily basis. It is not going to happen overnight, it takes time to train those kicks.
Wanna learn it fast? Better start now!
Some areas also require a good deal of linear algebra and optimization knowledge, especially when discussing SVM, PCA and friends.
Since you are planning to do a Ph.D. and move the science further, you might want to narrow your focus to a particular area for your research while working with your prospective adviser.
- Linear Algebra
- Vector Calculus
- Statistics and information theory
- Discrete Math
- Convex Optimization
- Probabilistic Graphical models
I believe there is a book : http://www.amazon.com/All-Mathem... which can help you get a good head start.
I will try to keep this as concise as possible.
Edit: Somebody merged the original question to this question, so the premise becomes irrelevant.
To become a full stack AI/ML engineer, it is imperative that you have a complete grasp of the mathematical foundations of ML so that you can build upon concepts easily. The basic mathematical skills required are Linear Algebra, Matrix Algebra, Probability and some basic Calculus.
Linear Algebra
The best source to study Linear Algebra is Prof. Gilbert Strang’s Linear Algebra book/course. Video Lectures | Linear Algebra | Mathematics | MIT OpenCourseWare (MIT OCW). There are 34 lectures and believe me, they are completely worth it as after completing this, linear algebra should not pose any more problems for you. Solve some exercises/exams if you want to achieve mastery (recommended).
Matrix Algebra
Matrix algebra is an essential component of deep learning. I personally recommend this (Matrix Cookbook by Kaare Brandt Petersen & Michael Syskind Pedersen): http://www2.imm.dtu.dk/pubdb/vie... (PDF). There are 66 pages of pure matrix operations and this is the absolute “go-to” in case you are stuck trying to understand certain matrix manipulations that a researcher might have done.
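As a small illustration of the kind of identity the Matrix Cookbook collects, here is a rough sketch that numerically checks the standard result d/dx (x^T A x) = (A + A^T) x using central finite differences. The matrix A, the vector x and all helper names are made up for this example, and it deliberately uses plain Python lists rather than a library.

```python
# Numerically check the identity d/dx (x^T A x) = (A + A^T) x.
# A, x and every helper name here are illustrative assumptions.

def quad_form(A, x):
    """Return the scalar x^T A x for a square matrix A given as a list of rows."""
    n = len(x)
    return sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))

def analytic_grad(A, x):
    """Gradient predicted by the identity: (A + A^T) x."""
    n = len(x)
    return [sum((A[i][j] + A[j][i]) * x[j] for j in range(n)) for i in range(n)]

def numeric_grad(f, x, h=1e-6):
    """Central finite-difference approximation of the gradient of f at x."""
    grad = []
    for i in range(len(x)):
        x_plus = list(x)
        x_plus[i] += h
        x_minus = list(x)
        x_minus[i] -= h
        grad.append((f(x_plus) - f(x_minus)) / (2 * h))
    return grad

A = [[2.0, 1.0], [0.0, 3.0]]   # deliberately not symmetric
x = [1.0, -2.0]

g_exact = analytic_grad(A, x)
g_approx = numeric_grad(lambda v: quad_form(A, v), x)
max_err = max(abs(a - b) for a, b in zip(g_exact, g_approx))

print("analytic:", g_exact)
print("numeric: ", g_approx)
print("max abs difference:", max_err)
```

Checking a derivative you looked up (or derived by hand) against a finite-difference estimate like this is a cheap habit that catches most sign and transpose mistakes.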
Probability & Statistics
Understanding probability is a very important aspect of understanding ML. Some of the key probability concepts that you must be aware of include Bayes’ Theorem, distributions, MLE, regression, inference and so on. The best resource for this is Think Stats (Exploratory Data Analysis in Python) by Allen Downey: http://greenteapress.com/thinkst... (PDF). This absolute gem of a book is 264 pages long and covers all the aspects of probability and statistics that you need to understand with relevant Python code.
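To make Bayes' Theorem concrete, here is a minimal sketch of the classic diagnostic-test calculation. The numbers (1% prevalence, 95% sensitivity, 5% false positive rate) and the function name are hypothetical, chosen only to illustrate the formula P(H|E) = P(E|H) P(H) / P(E).

```python
# Bayes' theorem on a made-up diagnostic test.
# All probabilities below are hypothetical illustration values.

def posterior(prior, likelihood, false_positive_rate):
    """P(H | E) given P(H), P(E | H) and P(E | not H)."""
    # Law of total probability: P(E) = P(E|H) P(H) + P(E|not H) P(not H)
    evidence = likelihood * prior + false_positive_rate * (1.0 - prior)
    return likelihood * prior / evidence

p_disease = 0.01            # prior P(H)
p_pos_given_disease = 0.95  # likelihood P(E | H)
p_pos_given_healthy = 0.05  # P(E | not H)

p = posterior(p_disease, p_pos_given_disease, p_pos_given_healthy)
print(f"P(disease | positive test) = {p:.3f}")
```

Even with a fairly accurate test, the posterior stays well below one half here because the prior is so small. That is the kind of intuition these probability concepts are meant to build.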
Optimization
The go-to book for Convex Optimization is Convex Optimization by Stephen Boyd and Lieven Vandenberghe: https://web.stanford.edu/~boyd/c... (PDF). This is a 730 page book and you need not read it all in one go. Choose the concept which you need to learn depending on your requirements and interest and read that part. It is complete and extremely well written. This book is free as part of the CVX 101 MOOC on EdX.
This 263 page book on metaheuristics, Essentials of Metaheuristics by Sean Luke (http://cs.gmu.edu/~sean/book/met... (PDF)) talks about gradient based optimization, policy optimization etc. and it is well written. One can choose to go through this also if interested.
Data science concepts are covered in the above topics. Other topics can be learnt by googling for sources easily as and when you encounter them. But complete understanding of the above should suffice for 95% of all scenarios.
Achieving mastery of the above topics will surely make you a mathematically strong AI/ML engineer. Now that you have built the foundation, start dipping your feet into research papers. They are absolutely essential as these clearly show the standards of AI researchers/engineers. Firstly, find out the famous papers of AI like RNN, LSTM, SVM etc. and go through the technical content.
Can you understand the jargon?
Can you understand the mathematics?
Can you implement the mathematics in code now without the help of libraries that do all the work for you?
These are the key questions to be answered. Once you can answer “Yes/Mostly Yes” to these 3 questions, you are good to go.
After trying to read these papers dealing with the most popular concepts, try to read the not-so-famous papers. arXiv is a great site with hundreds of preprints being published every day by top researchers, and reading the papers from here is like drinking straight out of the fire-hose. Try to choose a paper which looks fairly well written and whose abstract seems interesting. Then, read that paper and try to answer those 3 questions again. The same can be done with papers of top AI conferences like NIPS, AAAI, AAMAS, IJCAI, ICML etc. You may not be able to fully implement the papers due to data constraints and other issues, but if you are able to understand even 60% of the mathematical reasoning, then I can safely say you have completed your training.
Do not concentrate on learning more and more “packages”. Concentrate on the concept. While implementing, you will automatically see that you require “this” package and then you will automatically learn to use it. Learning the various commands of random packages won’t help. If you start implementing and writing code to solve problems or simulate results from a paper, you will automatically learn about packages and use them appropriately; they’ll be the least of your concerns. This is the correct way to maintain “balance” between math and coding. You can also participate in competitions (e.g. Kaggle or conference competitions) to improve speed, development and processing skills if you feel the need to do so.
Alternatively, you can choose to pursue a doctoral degree (like me :P ) in AI/ML to gain a complete in-depth understanding of everything discussed here and more.
(All the links in this answer are working as of 6th July 2017)
- Analysis http://www.amazon.com/Introducti...
- Algebra http://www.amazon.com/Introducti...
- Probability http://www.amazon.com/All-Statis...
They will make your later reading much more pleasant. You will be able to devise your own proofs.
Terence Tao has posted several pieces of advice on learning mathematics on his blog:
- Solving mathematical problems http://terrytao.wordpress.com/ca...
- There’s more to mathematics than grades and exams and methods http://terrytao.wordpress.com/ca...
- There’s more to mathematics than rigour and proofs http://terrytao.wordpress.com/ca...
I started writing a GitHub awesome page for this. It may help: it covers topics from basic machine learning maths to advanced and quantum machine learning.
krishnakumarsekar/awesome-machine-learning-deep-learning-mathematics
Thanks and Regards
Krishna
krishnakumarsekar/awesome-quantum-machine-learning
A2A.
To have a basic mathematical background, you need to have some knowledge of the following mathematical concepts:
- Probability and statistics
- Linear algebra
- Optimization
- Multivariable calculus
- Functional analysis (not essential)
- First-order logic (not essential)
You can find some reasonable material on most of these by searching for "<topic> lecture notes" on Google. Usually, you'll find good lecture notes compiled by some professor teaching that course. The first few results should give you a good set to choose from.
For instance, here’s a list of some lecture notes that I just found:
Probability & Statistics : http://www2.aueb.gr/users/demos/...
Linear algebra : https://www.math.ku.edu/~lerner/...
Optimization : http://www.ifp.illinois.edu/~ang...
Calculus: https://www.math.wisc.edu/~angen...
Matrix Calculus : http://www.atmos.washington.edu/...
You should skim through these, without going into too much detail. You can come back to studying the topics as and when required while learning ML.
If you want to be a real Data Scientist, not one of the fake ones with the skills of an analyst and no mathematical intuition or point of view, you need a very strong mathematical grounding.
So to learn Mathematics for ML this should be the order :-
- Start with probability (basic, conditional, marginal, etc.)
- Mathematical Series and Convergence , Numerical methods for Analysis
- Matrix and Linear Algebra
- Bayesian Statistics
- Vectors ( Most Important)
- Calculus
- Markov Process and Chains
- Basics of Optimization ( Linear/ Quadratic)
- Advanced Matrix Algebra and Vector Calculus (gradient, divergence, curl, etc.)
This much mathematics will enable you to understand the core ideas behind ML and probabilistic algorithms.
You should pause now and start implementing a few algorithms from scratch in Python:
1. K-NN is a great starting point: learn it, and code it from scratch.
2. Logistic Regression with Gradient Descent.
By now you can see the parameters and numbers moving in matrix form and understand the mathematics of prediction. If you feel this is enough, hold your breath: there is more exciting stuff to come. This much will make you a beginner on the way to being a “Real Data Scientist”.
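For the second exercise, a from-scratch version might look roughly like the sketch below. It is only one possible way to do it: batch gradient descent on the mean log-loss, written with plain Python lists. The toy data set, the learning rate, the number of epochs and the function names are all assumptions made for this illustration.

```python
import math

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(X, y, lr=0.3, epochs=3000):
    """Batch gradient descent on the mean log-loss; returns (weights, bias)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j] / n
            grad_b += err / n
        w = [wj - lr * gj for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b

def predict(X, w, b):
    """Classify each row as 1 if the predicted probability is at least 0.5."""
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Hypothetical one-feature toy data: small values are class 0, large are class 1.
X = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y = [0, 0, 0, 1, 1, 1]

w, b = train_logistic(X, y)
preds = predict(X, w, b)
print("weights:", w, "bias:", b)
print("predictions:", preds)
```

Printing w and b every few hundred epochs is a good way to actually watch the parameters move.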
Next Start with :-
- Stochastic Models and Time Series Analysis
- Differential Equations
- Dynamic Programming and Optimization Techniques
- Fourier analysis and wavelets
- Random Fields
- Basic Knowledge of PDEs
- Techniques to solve PDEs using Monte-Carlo , Polynomial Expansions.
These mathematical techniques will help you visualize the model’s working and how to model and process raw data to create unique models whose functionality can be tuned. Parameters can be optimized for the problems and fine tuned with these techniques.
For a Next Level Up:- ( Statistics of Higher Dimensions)
- PDEs numerical solution with numerical input/ random input. ( fascinating subject to work on )
- Stochastic Differential Equations and Solutions
- PCA etc
- Dirichlet Processes, Markov Decision Process.
- Uncertainty Quantification - Polynomial Chaos, Projections on vector space
I think these are the subjects one must learn to be a good machine learning engineer in the 21st century. With a knowledge base like this, one can connect the dots very rapidly and build systems and models of high accuracy.
( I am not a big fan of Neural nets,..so forgot to mention here)
Of course, it helps to have a notion of what it means to define or model the concept described by the verb "to learn". That, my friends, is the realm of philosophy and pedagogy, but to apply it requires an understanding of the notion of a model, and we are back to my main point: Take some model theory. It's not likely to hurt you for more than a semester, and well...
NO PAIN NO GAIN!!
In linear algebra, a solid understanding of eigenvalues and eigenvectors is important for topics such as principal component analysis, factor analysis and other dimensionality reduction tasks.
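To see the link between eigenvectors and PCA, here is a minimal sketch that finds the first principal component as the dominant eigenvector of the sample covariance matrix, using power iteration. The toy data, the iteration count and the helper names are assumptions for illustration; in practice a library eigensolver or an SVD would normally be used instead.

```python
import math

def covariance_matrix(data):
    """Sample covariance matrix (n - 1 denominator) of rows in `data`."""
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    centered = [[row[j] - means[j] for j in range(d)] for row in data]
    return [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(M, v):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]

def top_eigenpair(M, iters=200):
    """Power iteration: dominant eigenvalue and unit eigenvector of M.

    Assumes the start vector is not orthogonal to the dominant eigenvector.
    """
    v = [1.0] * len(M)
    for _ in range(iters):
        w = mat_vec(M, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(M, v)))  # Rayleigh quotient
    return eigval, v

# Hypothetical 2-D points lying close to the line y = x.
data = [[2.0, 1.9], [0.0, 0.1], [-1.0, -1.1], [1.0, 0.9], [-2.0, -1.8]]

C = covariance_matrix(data)
eigval, v = top_eigenpair(C)
total_variance = sum(C[i][i] for i in range(len(C)))
explained = eigval / total_variance
# How well does (eigval, v) satisfy C v = eigval * v?
residual = max(abs(a - eigval * b) for a, b in zip(mat_vec(C, v), v))

print("first principal direction:", v)
print("share of variance explained:", explained)
```

The eigenvector gives the direction of greatest variance, and the eigenvalue divided by the trace of the covariance matrix gives the share of total variance that direction explains.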
For the first, I suggest Gilbert Strang's "Linear Algebra and Its Applications", while for the second, "Probability, Random Variables And Random Signal Principles" by Peebles is a good choice.
EDIT: A previous answer suggests a convex optimization text, which I also recommend. A good one is "Convex Optimization" by Stephen Boyd and Lieven Vandenberghe, which is also available for free on the authors' website.
- ML class is running right now - https://www.coursera.org/course/ml
- Linear algebra: http://www.khanacademy.org/math/...
- Place where to find more useful courses: http://www.topfreeclasses.com
For understanding Machine Learning you need following Mathematics prerequisites :
1. Probability and Statistics : Machine Learning has deep roots in Statistics. In fact the modern Machine learning is essentially Statistical learning i.e using stats to find patterns in data and inferring using them. So Stats and Probability are bare minimum for ML.
2. Linear Algebra : This is required because data is represented as matrices in Machine Learning, and essentially all ML algorithms can be seen as matrix manipulation in the end, so a basic understanding of Linear Algebra is required.
3. Optimization : Many people argue that Machine learning is a fancy name for optimization. While this is true to certain extent there is more to ML than optimization. But a large part of it is indeed optimization. In the end mostly all ML algos come down to some optimization task.
4. Calculus : This is a very useful tool for ML. Most ML algos rely on Differential Calculus to find solutions (Gradient Descent, Newton's method, quasi-Newton methods etc.).
IMO if you master these topics then you can learn pretty much anything in ML, because all algorithms are essentially applications of these tools.
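As a rough illustration of how calculus drives these solvers, the sketch below minimizes the one-dimensional convex function f(x) = exp(x) - 2x (minimum at x = ln 2) with both gradient descent and Newton's method. The function, starting point, step size and tolerance are arbitrary choices made for this example.

```python
import math

def grad(x):
    """First derivative of f(x) = exp(x) - 2x."""
    return math.exp(x) - 2.0

def hess(x):
    """Second derivative of f(x) = exp(x) - 2x."""
    return math.exp(x)

def gradient_descent(x, lr=0.1, tol=1e-10, max_iter=10_000):
    """Step against the gradient until |f'(x)| < tol; returns (x, steps)."""
    for step in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, step
        x -= lr * g
    return x, max_iter

def newton(x, tol=1e-10, max_iter=100):
    """Newton's method: scale the gradient step by the inverse curvature."""
    for step in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            return x, step
        x -= g / hess(x)
    return x, max_iter

x_gd, steps_gd = gradient_descent(0.0)
x_nt, steps_nt = newton(0.0)
print("gradient descent:", x_gd, "after", steps_gd, "steps")
print("Newton's method: ", x_nt, "after", steps_nt, "steps")
print("true minimiser:  ", math.log(2.0))
```

Newton's method uses second-derivative information and needs far fewer steps on this problem, at the price of computing (and, in higher dimensions, inverting) the Hessian. Quasi-Newton methods sit in between by approximating that curvature.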
A bit of stochastic processes (Markov processes, etc.)
Linear Algebra: Data analysis and machine learning build a lot on these concepts.
Algorithms: not as important, but still useful when it comes to optimizing your solution. Some graph theory helps too.
Basic Linear Programming and recognizing convexity and relaxation.
I guess once you have a fair idea of most of these concepts, then it's pretty simple to pick up the intuition behind any algorithm, where it would work, how to improve it, etc.
Goal 1: To understand what ML is, how to apply different algorithms to a task, how to interpret the output, common pitfalls, etc.: You will need a grasp of linear and matrix algebra, probability and optimization. You don't need to take a deep dive into each one, but study basic things like eigenvectors, conditional probability, distributions and Bayes' theorem. Additionally, learn the concepts of overfitting and cross-validation.
Resources
Video Lectures : Machine learning on coursera ( not only the one by AndrewNg, there are few others as well)
Books : Machine Learning by Tom Mitchell. It includes the necessary linear algebra and probability too.
Goal 2: Why the current algorithms are designed in a particular way, and how they fundamentally differ from each other: If you are more interested in the theoretical aspects, like how the kernels of a support vector machine are defined, how deep neural networks are designed, or how to tweak existing algorithms to make a new one, then you might want to extend your mathematics to functional analysis, topology and advanced optimization.
Resources : Video: Advanced Machine learning, Caltech (Prof. Abu-Mostafa's lectures)
Books : Mining of Massive Datasets by Ullman; any good books on advanced linear algebra and topology, but they won't connect it to machine learning.
Learning mathematics is about doing. Remember the 80/20 Rule : You must study theory 20% of the time and practice/implement what you learn 80% of the time.
Here is a list of books you could use. You can find accompanying online courses for many of them.
1. Strang's Linear Algebra and its Applications
2. Apostol Calculus - Both the volumes
3. Golub's Matrix Computations
4. Sheldon Ross' Probability
5. Elements of Statistical Learning by Hastie et al
6. Bishop's Pattern Recognition and Machine Learning
7. David Barber's Bayesian Reasoning and Machine Learning
8. Kevin Murphy's Machine learning: a Probabilistic Perspective
9. Wasserman's All of Statistics and Non-parametric Statistics
From Hacker News :
1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.
2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.
3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.
4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.
5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.
6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.
7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.
8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.
9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.
10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.
11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.
12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.
Please do try implementing as many things as you can. Pick up a project. Talk to your peers and professors and people, see if you can help them with what you've learned. Do.
Some algorithms are really sweet: they are available on Wikipedia with formulae, implementation and application.
Some dodge you until you watch two or three YouTube videos (Victor Lavrenko, Bert Huang, Udacity or MIT lectures).
Some are really mischievous: you have to do a lot of research, and they test your patience and perseverance more than your mathematics!
And there are lots of books you can read.
How to learn a particular algorithm?
- First, from the business point of view, learn why to use this algorithm and not one of its counterparts, e.g. why Fuzzy K-Means instead of K-Means.
- Secondly from an analyst point of view. Learn how to use the algorithm to solve some use cases. What is it meant to do.
- Last will be the mathematics. The how of the algorithm. And more research to enhance the algorithm and patent it.
P. S. It is normal to not understand in the first go.
P. P. S. And very normal to get totally confused in the second and third.
Hi,
I work for a Data Science and AI company called InData Labs, and one of our tech experts has recently prepared a short guide to learning neural networks. I hope it is helpful for you:
A short guide to neural networks. Master them and become famous.
Mathematics is an important part of learning machine learning. What are the necessary topics, and what are useful resources for the mathematics of machine learning?
Here I am sharing the relative weight of the important mathematics topics for machine learning, to clear up any confusion. See the list below and prepare accordingly.
35% - Linear Algebra
25% - Probability Theory and Statistics
15% - Multivariate Calculus
15% - Algorithms and Complex Optimizations
10% - Others
Now I will go into each area in more depth, so you have a clear idea of where to start with machine learning or artificial intelligence.
- Linear Algebra: Topics such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms are needed for understanding the optimization methods used for machine learning. The amazing thing about Linear Algebra is that there are so many online resources.
- Probability Theory and Statistics:Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
- Multivariate Calculus: topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagrangian.
- Algorithms and Complex Optimizations: Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc), Dynamic Programming, Randomized & Sublinear Algorithms, Graphs, Gradient/Stochastic Descents and Primal-Dual methods are needed.
- Others: This comprises other Math topics not covered in the four major areas described above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.
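Of the topics listed above, entropy and information gain are easy to make concrete. The following is a minimal sketch (the labels, the candidate splits and the function names are invented for illustration) of Shannon entropy in bits and the information gain of a split, as used for example when growing decision trees.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of its `groups`."""
    n = len(labels)
    weighted = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - weighted

# Hypothetical labels and two candidate ways of splitting them.
labels = ["yes", "yes", "no", "no"]
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

h = entropy(labels)
gain_perfect = information_gain(labels, perfect_split)
gain_useless = information_gain(labels, useless_split)
print("entropy of labels:", h)
print("gain of perfect split:", gain_perfect)
print("gain of useless split:", gain_useless)
```

A split that separates the classes completely recovers all of the entropy, while one that leaves each group as mixed as the original gains nothing.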
Now you are probably looking for the best resources to learn from and to practise your weak points, right? Don’t worry, learners, I would also like to suggest a few good resources.
- For Books : Programming Collective Intelligence by Toby Segaran, Pattern Recognition and Machine Learning, Artificial Intelligence: A Modern Approach (3rd edition) by Russell and Norvig, and other books.
- Best resources for online video tutorials : Coursera, Kachhua.com, Udemy, chalkstreet, etc.
Thank you. Keep Learning.
I made a podcast episode on the math you need for machine learning, and the resources for learning (if you like audio): Machine Learning Guide #8
It covers most of the math you need to get started with machine learning.
There are many reasons why the mathematics of Machine Learning is important and I will highlight some of them below:
- Selecting the right algorithm which includes giving considerations to accuracy, training time, model complexity, number of parameters and number of features.
- Choosing parameter settings and validation strategies.
- Identifying underfitting and overfitting by understanding the Bias-Variance tradeoff.
- Estimating the right confidence interval and uncertainty.
- Linear algebra is a cornerstone because everything in machine learning is a vector or a matrix. Dot products, distance, matrix factorization, eigenvalues etc. come up all the time. I would recommend Gilbert Strang’s linear algebra course:
- a youtube playlist
- the book: Introduction to linear algebra
- course page at MIT OCW
- Multivariate Calculus: Some of the necessary topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Distribution. Differentiation matters because of gradient descent, and gradient descent is almost everywhere. Some courses I recommend:
- Introduction to Mathematical Thinking - Stanford University | Coursera
- Convex Optimization
- Massively Multivariable Open Online Calculus Course from the Ohio State University - the course is a first taste of multivariable calculus, but viewed through the lens of linear algebra.
- Probability Theory and Statistics: Machine Learning and Statistics aren't very different fields. Actually, someone recently defined Machine Learning as 'doing statistics on a Mac'. Some of the fundamental Statistical and Probability Theory needed for ML are Combinatorics, Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
- Khan Academy's Linear Algebra, Probability & Statistics, Multivariable Calculus and Optimization.
- Larry Wasserman's book - All of statistics: A Concise Course in Statistical Inference.
- Udacity's Introduction to Statistics.
- Algorithms and Complex Optimizations: This is important for understanding the computational efficiency and scalability of our Machine Learning algorithms and for exploiting sparsity in our datasets. Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc), Dynamic Programming, Randomized & Sublinear Algorithms, Graphs, Gradient/Stochastic Descents and Primal-Dual methods are needed.
- Boyd and Vandenberghe's course on Convex optimization from Stanford.
Given all that, ML is not all about maths and, to be frank, when starting out you will hardly spend 5% of your effort doing maths.
I wrote a detailed medium post on this. You can read it here Math for Deep Learning is not Merlin’s Enchantment – Vaibhav Aparimit – Medium
First off, I really like your question. You seem to implicitly understand that math is an essential skill required to grasp the underpinnings of machine learning.
If your question was about deep learning, I would say linear algebra covers 95% of cases. For machine learning you would need to know probability (especially Bayes' rule and conditional probability), differential calculus and linear algebra (matrix multiplication, eigenvectors, determinants, Hessians).
Hope this helps .
- Linear algebra
- Calculus
- Matrix calculus
- Probability and statistics
- Optimization - linear programming, convex optimization, non-linear optimization
Some other topics that are useful in specific sub-areas of machine learning are:
- Basic graph theory
- Basic algorithms
- First-order logic
- Linear Algebra
- Probability theory and statistics
- Multivariate calculus
- Algorithms and Complex optimizations
- Others- Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.
To learn them go through
- Coursera
- Udacity
- Udemy
- Tutorialpoint.com
EE364a: Convex Optimization I
If you are looking to refresh/clarify linear algebra concepts after going through the above course, Khan Academy could be useful. It also has videos on other topics that might be of interest for machine learning. If you are looking for concepts like PCA etc., you might not find them there.
Another useful resource that focuses on concepts is video lectures: go to Machine Learning - videolectures.net and search only for tutorials.
There is no one-stop shop, as the concepts can go deeper and might require special treatment. All of this is theory, which can clarify concepts. However, if you are a novice and want a deep and intuitive feel for the concepts, then pick one simple problem and implement the solution.
Understanding Machine Learning with R - uFaber.com