How do I learn mathematics for machine learning?

https://www.quora.com/How-do-I-learn-mathematics-for-machine-learning
 
55 Answers
William Chen
The math most directly useful for machine learning is:

If you're interested in an accessible introduction to matrix algebra, Coursera is running a course on it right now: Coding the Matrix: Linear Algebra through Computer Science Applications

The applied math most directly useful for machine learning is:
Abhinav Sharma
When going through my Machine Learning course last semester, I felt I had the most catching up to do in Linear Algebra. I find that key ideas from LinAlg are harder to retain over time than those from Probability. I found myself working mostly with probability distributions, Bayes' rule, MLEs and MAPs, while the algebra side was mostly optimization in higher dimensions, which in practice meant Matrix calculus.
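To make the MLE/MAP distinction concrete, here is a minimal sketch for a coin-flip (Bernoulli) model. The Beta(a, b) prior, the counts and the function names are illustrative assumptions, not something taken from the course mentioned above.

```python
def bernoulli_mle(heads: int, tails: int) -> float:
    """Maximum-likelihood estimate of P(heads): the observed frequency."""
    return heads / (heads + tails)


def bernoulli_map(heads: int, tails: int, a: float, b: float) -> float:
    """MAP estimate under a Beta(a, b) prior (the mode of the Beta posterior).

    Valid when a + heads > 1 and b + tails > 1.
    """
    return (heads + a - 1) / (heads + tails + a + b - 2)


# Hypothetical data: 7 heads and 3 tails, with a hypothetical Beta(5, 5) prior.
mle = bernoulli_mle(7, 3)
map_est = bernoulli_map(7, 3, a=5, b=5)
print(mle, map_est)
```

The prior pulls the MAP estimate towards 0.5; with a flat Beta(1, 1) prior the two estimates coincide.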

I discovered that the Matrix Cookbook was popular with most students for working with Matrix Calculus as it seems to have a never-ending list of  matrix derivatives:

http://www2.imm.dtu.dk/pubdb/vie...

As far as brushing up on the rest of your Linear Algebra knowledge is concerned, I highly recommend Strang's lectures/book:

http://ocw.mit.edu/courses/mathe...

Highly relevant topics include knowing about rank and inversion, SVD, and also make sure you're very comfortable with eigenvalues and eigenvectors, amongst other things.
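For a feel of what eigenvalues and eigenvectors are, a small sketch of power iteration may help. It uses plain Python lists, and the 2x2 matrix is a made-up example; in practice you would likely reach for a linear algebra library instead.

```python
import math


def mat_vec(a, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a_ij * v_j for a_ij, v_j in zip(row, v)) for row in a]


def power_iteration(a, iterations=200):
    """Approximate the dominant eigenvalue and eigenvector of a square matrix."""
    v = [1.0] * len(a)
    for _ in range(iterations):
        w = mat_vec(a, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient; v has unit length, so no division is needed.
    eigenvalue = sum(x * y for x, y in zip(v, mat_vec(a, v)))
    return eigenvalue, v


# Hypothetical symmetric matrix with eigenvalues (5 +/- sqrt(5)) / 2.
A = [[2.0, 1.0], [1.0, 3.0]]
eigenvalue, eigenvector = power_iteration(A)
print(eigenvalue, eigenvector)
```

Multiplying A by the returned vector should give back the same vector scaled by the eigenvalue, which is exactly the defining property.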

Finally, with Analysis, I don't think ML requires a formal introduction to Analysis at all. It's important to know higher-dimensional calculus well, especially the parts related to optimization, such as Lagrange multipliers and the primal-dual form, and, in general, the calculus of matrices, and you should be good to go.
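As a small worked example of the Lagrange multiplier technique (the objective and constraint are made up for illustration), consider maximizing f(x, y) = xy subject to x + y = 1:

```latex
\mathcal{L}(x, y, \lambda) = xy - \lambda (x + y - 1)

\frac{\partial \mathcal{L}}{\partial x} = y - \lambda = 0, \quad
\frac{\partial \mathcal{L}}{\partial y} = x - \lambda = 0, \quad
\frac{\partial \mathcal{L}}{\partial \lambda} = -(x + y - 1) = 0

\Rightarrow \; x = y = \lambda = \tfrac{1}{2}, \qquad
f\left(\tfrac{1}{2}, \tfrac{1}{2}\right) = \tfrac{1}{4}
```

Setting all partial derivatives of the Lagrangian to zero turns the constrained problem into a system of equations, which is the step that shows up repeatedly in ML derivations.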

Overall, I think the way to handle Linear Algebra and Calculus is to work your way through an ML book/course and stop to look at the relevant math when necessary, whereas you need a strong foundation in Probability right from the beginning. Most textbooks on ML tend to talk a lot about probability while skimming over the mathematical details of LinAlg and Calculus.
Calvin John

Let me first caveat what I’m about to say with this: go to graduate school.

To show you just how super-serious I am about this, I’m even going to separate this caveat from the rest of the answer with one of the ultra-cool line breaks.


Alright, at this point, I’m assuming that you are still solely considering graduate school preparation without an undergraduate education. Let’s go.

My background consists of an undergraduate BS in mathematics, a minor in physics, and a few years of research experience that has spanned from charged particle detectors (physics/EE) to autonomous vehicle system design for collision detection and evasion. Long story short: I’m far more qualified to answer your question when robotics is emphasized, so that’s what I’m going to do.

Robotics is Multi-Disciplinary

Robotics is a highly multi-disciplinary field. In fact, I’d argue that it could well be the academic field which encompasses the largest quantity of distinct domains into its core structure. When we’re talking about robotics, we’re really talking about

  • Computer science
  • Mathematics
  • Computer engineering
  • Electrical engineering
  • Control engineering
  • Systems engineering
  • Mechanical engineering
  • Physics (mechanics, more specifically)

What’s even more impressive about the above list than its size is the depth of each field. Aside from control and systems engineering, which are a bit more specialized and less fundamental than the others, each of the above domains is extremely broad, indicating that if you were to break down robotics concepts into a networked graph, it would resemble something like this:

[1]

Needless to say, roboticists ultimately specialize in a much narrower range so that expertise in a topic can be attained. But that doesn’t change the fact that to pursue robotics, high breadth and versatility in engineering and math is a tool whose utility can’t be overstated.


Specific Areas of Research

Now, regardless of whether you want to pursue a masters or a Ph.D., you will ultimately have to carve out a niche for yourself. As I mentioned above, mastery of all robotics is a hopelessly daunting task; it’s impossible. Therefore, it’s important that you expose yourself to the different areas of robotics, and gradually home in on your desired path according to the topics in which you’re interested and at which you’re talented.

Here’s my breakdown of robotics research, in increasing order of mathematical abstraction and decreasing order of hands-on engineering and building:

  1. Sensors. About as applied and hands-on as you can get, the domain of sensors works on expanding the current technical constraints that robotics hardware faces. It’s because of these guys that the iPhone magically gets smaller and smaller every year, while also increasing its technological capacities. An example of the importance of this domain which is even more specific to robotics is radar-evading drones. Remember when Osama Bin Laden got taken out because we flew a helicopter into Pakistan that magically evades radars? Thanks, sensors.
  2. Nano-robotics. Focusing on developing robotic systems on the micro-level, nano-robotics explores how robotic agents can be built and implemented on a scale sufficiently small that they can be directly inserted into your body. Sound scary? It shouldn’t. Nano-robotics has a plethora of game-changing medical applications, some of which include legitimately curing cancer and preventing aging.
  3. Machine vision. While the ability to process and interpret visual information comes very intuitively to humans, translating our abilities to an algorithmic environment in this manner has proven to be an intimidating process. In fact, I’d argue that the largest obstacle facing self-driving cars is machine vision. Just take a look at the Tesla driver who died because his car, running on Autopilot, failed to distinguish between the bright sky and an oncoming white truck. [2]
  4. Robotic learning. When machine learning is applied in a robotic context, it basically becomes robotic learning. Robotic learning is the overlap between robotics and machine learning; it approaches the problem of developing tools for adaptation and learning in robotic systems. Very cool field, with a lot of promising application, and very well suited for someone interested in machine learning and robotics.
  5. Robotic control. This is the area in which I’m currently nested. Control represents a mathematical approach to modeling the behavior and evolution of a dynamical system in relation to inputs, which can be used to affect the system’s output. The goal here is to mathematically demonstrate that a certain approach for input selection guarantees that the system’s output will quickly converge to a stabilized desired range, as illustrated in this kick-a** picture. [3]

Because you have stated that robotics and machine learning are your interests, I’m going to assume your interests align with the #3–5 end of the spectrum. But even when your interests are honed in on these two areas, there is still a massive range of topics and skill sets spanned by these two very broad domains.


Developing Skills for Robotic Learning

Again, I’m far from an expert in robotic learning and machine learning, but I’ll do my best to show some helpful tips for pursuing this domain. The fundamental fields from which machine learning constantly draws, as I understand it, are the following:

  • Probability
  • Statistics
  • Algorithms
  • Optimization
  • Systems

The last one is a bit more of a stretch in comparison to the others, but I’ve heard that a high portion of machine learning can actually be approached from a systems perspective, and that its inception actually arose from system theory modeling.

For probability and statistics, both intuition and rigorous technicality will be important. I had a horrible textbook which provided very little conceptual basis for the theorems, and mostly included a bunch of isolated problems which were crudely connected in a very disjointed way. I recommend Introduction to Probability by Grinstead and Snell, [4] which provides a lot of clear, well-articulated conceptual explanations which enhance both intuition and precise reasoning on the subject. It’s also free and available online, which ya’ know, is always a big plus.

Becoming comfortable with algorithms is a task which can more easily be achieved in a college setting, but one which is also very feasible to execute independently. Regarding a textbook to guide you through key concepts to algorithm theory, I recommend to look no further than the classic Introduction to Algorithms by Cormen, Leiserson, Rivest and Stein. [5]

Additionally, I would look to two more sources to continually expand algorithmic skills: Project Euler and Topcoder. Project Euler encompasses a diverse range of mathematical problems for algorithmic development which will strengthen your mathematical algorithmic thinking and your “out of the box” creativity. Topcoder provides challenges which will improve your technical programming skills, and diversify and expand your problem-solving breadth.

Of course, once you have a solid background in the above topics, you’ll want to receive a comprehensive introduction to robot learning, for which I’ve been told that Robot Learning by Connell and Mahadevan is a solid choice. [6]

Although robotic learning and robotic control are distinct domains, robotic learning is intrinsically tied in to concepts from control theory. In fact, one of the most challenging problems facing the robotic learning community is that it lacks the rigorous analysis and descriptions that the control and systems theories possess.

For example, a self-driving car that implements a series of clever robotic learning algorithms will never be implemented without tools from control systems. Why? Because without tools from control and systems theory, you will never get close to demonstrating rigorous, mathematically demanding qualities such as robustness, safety guarantees, stability, etc., without which, the government wouldn’t let your self-driving car see the light of day.

Robotic Control

I think that optimization, control, and systems are all presented and integrated very concisely in Design of Optimal Control Systems by Bini. [7] This book covers more than the minimum knowledge of these topics that is needed for machine learning. But a deep understanding of at least some of the ideas shown in this book will allow you to draw insights between these domains that most others will likely not be capable of seeing.

Note that I recommend the above for someone interested in both machine learning and robotic control. If you’re primarily interested in robotic control, then your mathematical skills need to be more sophisticated than the vast majority of other engineers. This is likely the only engineering discipline in which highly abstract mathematical fields play a fundamental role. They include

  1. Real analysis
  2. Systems of Differential Equations
  3. Dynamical Systems (similar to 2., but distinct from it)
  4. Advanced Linear Algebra
  5. Advanced Optimization
  6. Basic Topology
  7. Set Theory (more than the basics, but not quite “advanced” set theory)

Clearly, your mathematical skills have to be beyond the more applied end of the spectrum in which things like formalities, proofs, theorems, and rigor are almost never relevant.

For a comprehensive introduction to real analysis and topology that isn’t esoteric (difficult to find), I recommend Basic Analysis by Lebl. [8] While the book isn’t intended for studying topology specifically, it covers nearly all of the fundamentals which are relevant to control. Note that real analysis is the most important item in the above list.

Advanced linear algebra is the most difficult field for which to find an accessible, engaging textbook, IMO. The majority of the texts are far too focused on minute, irrelevant details and burdensome proofs that yield little insight into the deeper concepts. More importantly, most textbooks totally fail to connect the ideas to deeper concepts which are both cool and incredibly useful. After a lot of searching, I found hope in an unexpected place: online lecture notes. [9] If you master these notes, and their difficult problems, to the point where you can comfortably walk through the main concepts with a high school student, then you’ll be five steps ahead of me.

As for dynamical systems, I’d say that Dynamical Systems by Sternberg does the trick. [10] Until you get to the more theoretical content like stability and invariance, you really want to focus more on the concepts; the details aren’t particularly important, surprisingly. You really just need to know what kind of assumptions you have to make about the system you’re modeling.

Once you’re comfortable with most of the above, you can get your hands dirty with some actual control theory. For this, I recommend Mathematical Control Theory by Sontag. [11]


:

I have a hunch that’s not what you want to hear, since you didn’t ask for advice regarding this matter. So I’m sorry if this caveat irks you in any way, but it’s the best advice I can give, and I think it’s important for you to hear.

I’m a firm believer in pragmatic optimism, and while it’s optimistic to believe that admittance into graduate school, especially in a technical field, is feasible without an undergraduate degree, it is far from pragmatic. Without an undergraduate degree, you are immediately excluded from consideration for all departments at the majority of universities.

I can’t find any specific statistics on this matter, so you’ll have to choose whether or not to take my word for it. But trust me when I say that I can currently think of one graduate school that doesn’t necessitate an undergraduate degree as a strict requirement.

Even putting the strict requirements aside, for deeply embedded multidisciplinary fields like robotics and machine learning, an undergraduate education is crucial. Although I do think that the ability to interact with professors; learn with faculty and peers in person; and receive a curriculum designed by experts on which you are tested in a competitive environment are all vital assets for initiating the engineering experience in any field, they are especially true for robotics.

Another important distinction regarding your question: are you planning for a masters or a Ph.D.?

[1] Pawel Pralat: Graph Theory

[2] Tesla driver killed while using autopilot was watching Harry Potter, witness says

[3] Vehicle stability control systems: An overview of the integrated ...

[4] https://www.dartmouth.edu/~chanc...

[5] Introduction to Algorithms, 3rd Edition (MIT Press): Thomas H. Cormen, Charles E. Leiserson, Ronald L. Rivest, Clifford Stein: 9780262033848: Amazon.com: Books

[6] Robot Learning | J. H. Connell | Springer

[7] http://retis.sssup.it/~bini/math...

[8] http://www.jirka.org/ra/realanal...

[9] https://www.math.uh.edu/~climenh...

[10] Dynamical Systems (Dover Books on Mathematics): Shlomo Sternberg: 9780486477053: Amazon.com: Books

[11] http://www.mit.edu/~esontag/FTPD...

Nikita Zhiltsov
A couple of years ago, based on his experience, Bradford Cross gave a comprehensive list of the best resources on machine learning and the prerequisites in his blog ("Measuring measures"). Unfortunately, it appears to be down right now.

UPD:
Here is the blog post on the Web Archive mirror: http://web.archive.org/web/20101...

Bradford's lists at Amazon:
  • Analysis [1]
  • Linear Algebra [2]
  • Probability [3]
  • Statistics [4, 5]
  • Optimization [6]
  • Machine learning [7]
  • Feature Selection [8]

I hope Mr. Cross will be able to join the discussion.

[1] http://www.amazon.com/Analysis/l...

[2] http://www.amazon.com/Matrix-Fu/...

[3] http://www.amazon.com/Probabilit...

[4] http://www.amazon.com/Statistics...

[5] http://www.amazon.com/Nonparamet...

[6] http://www.amazon.com/Heuristic-...

[7] http://www.amazon.com/Machine-Le...

[8] http://www.amazon.com/Feature-Se...

UPD 2:
Here is the list of must-read books for theoretical machine learning [1], which is attributed to prof. Michael Jordan (UC Berkeley). The sources are [2] and [3].

[1] https://www.goodreads.com/review...

[2] Learning About Statistical Learning

[3] AMA: Michael I Jordan • /r/MachineLearning
Osman Baskaya
There is a book named "Mathematics for Computer Science". It is also a course at MIT. This is the MIT OCW link for that course: http://ocw.mit.edu/courses/elect....

The course materials are old, by the way. The good news is that you can easily find the book (a compilation of all the materials) by searching. If I am not wrong, the latest revised version of this book is dated 6th May, 2012.

You need linear algebra as well. For this, I recommend Gilbert Strang's "Linear Algebra and Its Applications". It may be a little bit tough, but it is a great book.

If you want to dive into the probabilistic approach, you can enroll in the Probabilistic Graphical Models course: https://www.coursera.org/course/pgm. I have heard that it is a very good course. The textbook for that course looks very useful: http://www.amazon.com/Probabilis...
Chomba Bupe

The current machine learning (ML) algorithms are based upon mapping functions.

$F: X \to Y$

The function $F$ can be anything, such as a support vector machine (SVM), a restricted Boltzmann machine (RBM), a deep neural network (DNN) or anything else that you can hand-engineer yourself. In application areas, $X$ represents the input space while $Y$ represents the output space.

In speech recognition, $X$ might be a set of spectrograms while $Y$ is a set of identities representing the speakers. In image recognition, $X$ is the raw image pixel space while $Y$ is the categorization consisting of the different classes into which $x_i \in X$ can fall.

Each ML model has parameters $w$ that affect the behavior of $F$ and that we can normally adjust in order to change the behavior of that function. We can thus write the mapping more conveniently as:

$\hat{y}_i = f(x_i, w)$

where $\hat{y}_i \in Y$.

We will focus on supervised ML models, where we have a dataset $T$ of training input-output pairs in the form:

$T = [(x_1, y_1), (x_2, y_2), \ldots, (x_N, y_N)]$

The goal of supervised machine learning is to find the best parameter values $\hat{w}$ that make the function $F$ map the input-output pairs with the least error. So in supervised ML we have three main issues:

  1. Define a fitness measure that tells us how well the ML model is performing on the training set $T$.
  2. Generalization: We can run the same fitness measure on the test set after training is complete in order to measure how well the model generalizes to novel inputs. This is a very important concept in modern ML.
  3. A learning algorithm to update the weights, $w \to \hat{w}$.

This is where the maths comes in. To understand the underlying maths concepts, you need to understand what ML is trying to solve in the first place. The aim here is to find solutions to the 3 issues mentioned above, and maths can help us with that.

1: A fitness measure:

This is normally done by an objective function, also known as the loss/cost function:

$L(\hat{y}_i, y_i)$

where $\hat{y}_i$ = actual output and $y_i$ = desired output.

In empirical risk minimization [1] (ERM) the goal is to minimize the overall loss as defined by the risk $R$:

$R_{emp}(w) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i, w), y_i)$

ERM states that the learning algorithm should choose the hypothesis function $\hat{f}$ such that the empirical risk is minimized. In simple mathematical terms, we need to solve:

$\hat{w} = \arg\min_w R_{emp}(w)$

where $\hat{f} = f(x, \hat{w})$.
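As a rough illustration of the empirical risk, the sketch below averages a loss over a tiny training set. The squared loss, the linear model f(x, w) = w[0] * x + w[1] and the data are all assumptions chosen for the example; the definition itself works with any loss and any model.

```python
def squared_loss(y_hat, y):
    """L(y_hat, y) = (y_hat - y)^2, one common choice of loss."""
    return (y_hat - y) ** 2


def linear_model(x, w):
    """f(x, w) = w[0] * x + w[1], a hypothetical model for illustration."""
    return w[0] * x + w[1]


def empirical_risk(w, data, f=linear_model, loss=squared_loss):
    """R_emp(w) = (1/N) * sum_i L(f(x_i, w), y_i)."""
    return sum(loss(f(x, w), y) for x, y in data) / len(data)


# Hypothetical training set T generated by y = 2x + 1.
T = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
print(empirical_risk((2.0, 1.0), T))  # parameters that fit T exactly
print(empirical_risk((1.0, 1.0), T))  # a worse guess gives a higher risk
```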

 

2: Generalization:

The above naive ERM can result in the function $\hat{f}$ just memorizing the training examples, which can cause what is called overfitting, that is, fitting the function $F$ to each and every noisy/outlier data point. That is not ideal, so instead we normally use structural risk minimization [2] (SRM), whereby we add a regularization term $C(w)$ to the risk. Thus we get the regularized risk:

$R_{stru}(w) = \frac{1}{N} \sum_{i=1}^{N} L(f(x_i, w), y_i) + \lambda C(w)$

$R_{stru}(w) = R_{emp}(w) + \lambda C(w)$

Then in SRM we need to solve:

$\hat{w} = \arg\min_w R_{stru}(w)$

Regularization simplifies the weight parameters so that they don't model too much of the outliers or noise. That is done by penalizing large weight values in $w$, which are a cause of most overfitting issues. Thus the $L_0$ norm can be used in order to favor a very sparse set of weights whereby most weight values are zero. You can also use $L_1$ or $L_2$ regularization instead, as the $L_0$ norm is hard to optimize. Other weird regularization methods have since popped up, such as dropout, which is used in learning algorithms for DNNs: neurons are randomly dropped out and restored during training so that the overall network becomes robust to noise. Dropout can be loosely seen as an ensemble method.
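A minimal sketch of the regularized risk, assuming a squared loss, a linear model and either an L1 or L2 penalty as C(w). The data, the value of lambda and the helper names are made up for illustration.

```python
def empirical_risk(w, data):
    """Mean squared error of the hypothetical model f(x, w) = w[0] * x + w[1]."""
    return sum((w[0] * x + w[1] - y) ** 2 for x, y in data) / len(data)


def l1_penalty(w):
    """C(w) as the sum of absolute weights (tends to encourage sparsity)."""
    return sum(abs(w_j) for w_j in w)


def l2_penalty(w):
    """C(w) as the sum of squared weights (penalizes large weights)."""
    return sum(w_j ** 2 for w_j in w)


def structural_risk(w, data, lam, penalty=l2_penalty):
    """R_stru(w) = R_emp(w) + lambda * C(w)."""
    return empirical_risk(w, data) + lam * penalty(w)


# Hypothetical training set generated by y = 2x + 1.
T = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
w = (2.0, 1.0)
print(structural_risk(w, T, lam=0.1))                      # L2 penalty
print(structural_risk(w, T, lam=0.1, penalty=l1_penalty))  # L1 penalty
```

Even weights that fit the training set perfectly now have a non-zero risk, so the optimizer is nudged towards smaller weights.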

3: A learning algorithm:

Learning in current ML can be viewed as a way to update the weights in order to find the optimal parameters. ERM and SRM both rely on the existence of a learning algorithm for weight adjustment. We need an algorithm to find the weights that solve

$\hat{w} = \arg\min_w R_{emp}(w)$

or

$\hat{w} = \arg\min_w R_{stru}(w)$

We need a way to update the model such that

$w \to \hat{w}$

In current ML systems we just look to the old idea of gradient descent (GD) from numerical optimization. In GD we simply move down the steepest slope on the error (risk) surface defined by the risk $R$. That means we can just use the update rule defined by

$w_{t+1} = w_t - \alpha \frac{\partial R}{\partial w_t}$

where $t$ = step count and $\alpha$ = learning rate.

Here we assume a convex surface defined by $R$. In practice, especially for DNNs, the surface is highly non-convex, but almost any local minimum is good enough, and we can add momentum to the update rule so that it can escape from local-minimum traps more easily. Also, the sheer number of parameters makes it harder for the DNN model to get trapped in a local minimum, as there are many possible escape routes through the many other dimensions.
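The update rule can be sketched in a few lines. The convex risk below, its gradient and the learning rate are all hypothetical choices made so that the answer is known in advance.

```python
def gradient_descent(grad, w0, alpha=0.1, steps=100):
    """Iterate w_{t+1} = w_t - alpha * grad(w_t) for a fixed number of steps."""
    w = list(w0)
    for _ in range(steps):
        g = grad(w)
        w = [w_j - alpha * g_j for w_j, g_j in zip(w, g)]
    return w


def risk_gradient(w):
    """Gradient of the hypothetical risk R(w) = (w0 - 3)^2 + (w1 + 1)^2."""
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]


# The risk above is convex with its minimum at (3, -1).
w_hat = gradient_descent(risk_gradient, w0=[0.0, 0.0])
print(w_hat)
```

With a learning rate that is too large the iterates would overshoot and diverge, which is one reason the choice of alpha matters in practice.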

In DNNs the gradient computations can become cumbersome even for a modern machine, as the number of gradient steps needed to hit $\hat{w}$ is normally large. Thus we need fast ways to accelerate gradient computations for layered architectures. The backpropagation (backprop) algorithm, to be specific, is a way of computing gradients extremely efficiently in any differentiable computational graph. Backprop uses the chain rule, starting from the output layer, which is directly connected to the loss function and whose derivatives are hence easier to evaluate, and then moving towards the layers far away from the output layer (towards the input layer) while chaining the derivatives. It is called backprop because errors are passed from the back layers towards the front layers, thereby saving a lot of repeated computation.
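To see the chain rule at work, here is a deliberately tiny sketch: a network with one tanh hidden unit and one linear output, with the backprop gradients checked against finite differences. The architecture, the loss and the numbers are assumptions picked to keep the example readable.

```python
import math


def forward(x, w1, w2):
    """One hidden tanh unit: h = tanh(w1 * x), y_hat = w2 * h."""
    h = math.tanh(w1 * x)
    return h, w2 * h


def loss(x, y, w1, w2):
    """Squared-error loss 0.5 * (y_hat - y)^2 for a single example."""
    _, y_hat = forward(x, w1, w2)
    return 0.5 * (y_hat - y) ** 2


def backprop(x, y, w1, w2):
    """Gradients of the loss w.r.t. (w1, w2), chaining from the output back."""
    h, y_hat = forward(x, w1, w2)
    d_y_hat = y_hat - y            # dL/dy_hat, computed at the output first
    d_w2 = d_y_hat * h             # dL/dw2
    d_h = d_y_hat * w2             # dL/dh, reused for the earlier layer
    d_w1 = d_h * (1 - h * h) * x   # tanh'(z) = 1 - tanh(z)^2
    return d_w1, d_w2


def numerical_gradient(x, y, w1, w2, eps=1e-6):
    """Central finite differences, used only to check the analytic gradients."""
    d_w1 = (loss(x, y, w1 + eps, w2) - loss(x, y, w1 - eps, w2)) / (2 * eps)
    d_w2 = (loss(x, y, w1, w2 + eps) - loss(x, y, w1, w2 - eps)) / (2 * eps)
    return d_w1, d_w2


analytic = backprop(0.5, 1.0, 0.3, -0.7)
numeric = numerical_gradient(0.5, 1.0, 0.3, -0.7)
print(analytic, numeric)
```

Note how `d_y_hat` is computed once and reused for both weights; in a deep network that reuse is where the savings come from.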

GD requires that all training pairs are considered before taking a single small update step, which is not scalable. Thus in practice we have the so-called stochastic gradient descent (SGD), which takes a step after just one example. This is so efficient that it is normally the standard learning algorithm for DNNs, together with backprop. There are batch variants of SGD which you can consider as being in between SGD and GD: the batch gradient descent approach uses a small random set of training examples, known as the batch, to approximate the gradient via the backprop algorithm. Thus SGD can be seen as the batch variant with just 1 example in the batch.
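The relationship between GD, SGD and the batch variant can be sketched with a single function whose batch size is a parameter. The linear model, the noise-free data, the learning rate and the fixed seed are all assumptions for the sake of a reproducible example.

```python
import random


def minibatch_sgd(data, batch_size, alpha=0.1, epochs=300, seed=0):
    """Fit y ~ w * x + b with squared loss.

    batch_size=1 gives plain SGD; batch_size=len(data) gives full-batch GD.
    """
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    data = list(data)
    for _ in range(epochs):
        rng.shuffle(data)  # visit the examples in a new random order
        for start in range(0, len(data), batch_size):
            batch = data[start:start + batch_size]
            # Average gradient of 0.5 * (w * x + b - y)^2 over the batch.
            grad_w = sum((w * x + b - y) * x for x, y in batch) / len(batch)
            grad_b = sum((w * x + b - y) for x, y in batch) / len(batch)
            w -= alpha * grad_w
            b -= alpha * grad_b
    return w, b


# Hypothetical noise-free data from y = 2x + 1 on the interval [-1, 1].
data = [(k / 10, 2 * (k / 10) + 1) for k in range(-10, 11)]
for batch_size in (1, 4, len(data)):
    print(batch_size, minibatch_sgd(data, batch_size))
```

All three settings should recover roughly w = 2 and b = 1 here; they differ in how many updates they make per pass over the data.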


So to learn the maths theory behind ML, start from the underlying goals of ML which we have looked at in this discussion. Of course this was just the tip of the iceberg, but the best way to see most ML models is that they are function approximators and we wish to recover those approximations from input-output training pairs alone, which we call end-to-end learning.

It also helps to visualize ML as just optimization theory. We have a loss function and all we need is an algorithm that helps us find the right settings such that the loss is minimized. In practice SGD+backprop works very well for training modern ML models.

You also need to try and implement some of these algorithms yourself from scratch. Try to implement backprop and SGD for a multi-layer neural network (NN), not a deep one for now, then try it on the MNIST dataset. You can only learn via practice. Before implementing, make sure you go through backprop and derive it for multi-layer NNs and convolutional neural networks (convNets).

Don't be in too much of a hurry, though; concepts take time to make sense. To help yourself assimilate the material a bit more easily, solve some problems and also try to explain the systems to others via platforms like Quora. That way you will gain more and more confidence in your understanding of the maths behind ML algorithms.

Hope this helps.

Footnotes

[1] Empirical risk minimization - Wikipedia

[2] Structural risk minimization - Wikipedia

Florian Courtial

Some people say that mathematics is useless for a software engineer; machine learning proves them wrong.

Mathematics is the prerequisite for machine learning because machine learning is math. The computer is only there to do the calculations.

You'll mainly need to learn calculus, matrix computation, linear and nonlinear algebra, statistics and graph theory.

Let's take a basic ML algorithm, the linear regression.

The goal is to use some data to find a function which takes parameters and gives an output. Data are used to find the function and test it. In the future, we will use the function with some parameters and we will obtain an approximate output.

Let's say our data are about planes, as input we have the number of miles travelled by the plane and its age. As output we have the price of the plane. I don't normalize data to keep things simple.

A sample of our data could be :

miles;age;price

120000;12;120000

48000;4;1500000

...

Our question is : Given the miles travelled by a plane and its age, give a price.

Using linear regression (gradient descent) we will find a vector theta. This vector has two values, theta[0] and theta[1]. To find an approximate price we will multiply the miles by theta[0] and the age by theta[1] to obtain a result, which is an approximate price.

For instance, our algorithm could find theta = [2; -10 000], and if we have a plane that is 5 years old with 78 000 miles, we can then approximate the price as 78 000 * 2 + 5 * -10 000, so 106 000 dollars.
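In code, that prediction is just a dot product between theta and the feature vector. The function name is made up; the numbers are the ones from the example above.

```python
def predict(theta, features):
    """Approximate price as the dot product of theta and the features."""
    return sum(t * x for t, x in zip(theta, features))


theta = [2, -10_000]   # weight per mile, weight per year of age
plane = [78_000, 5]    # miles travelled, age in years
price = predict(theta, plane)
print(price)  # 106000
```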

The hard part is to find the good values for theta. To do that you need some maths.

You have a cost function that tells you how good your theta is. This cost function tests your theta values against your data (which already contain a price for each plane given its miles and age).

So your goal is to minimize the cost function by adjusting your theta values.

The cost function to minimize is this one :

where

Using the batch gradient descent algorithm each iteration adjusts the theta values using this formula :

Then you test the theta values with the previous function J(theta), and you'll see that the cost (i.e. the difference between the predicted value and the real one) decreases at each iteration.
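The two formulas referred to above did not survive in this copy of the answer. As a hedged reconstruction, assuming the standard least-squares setup for linear regression with m training examples, a hypothesis h_theta with no intercept term (theta has only two values here) and a learning rate alpha, they would typically be written as follows. All of this notation is an assumption, not recovered from the original.

```latex
J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2,
\qquad
h_\theta(x) = \theta_0 x_0 + \theta_1 x_1

\theta_j := \theta_j - \alpha \, \frac{1}{m} \sum_{i=1}^{m}
  \left( h_\theta(x^{(i)}) - y^{(i)} \right) x_j^{(i)}
```

Here x_0 would be the miles travelled and x_1 the age of the plane, and every theta_j is updated simultaneously in each iteration.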

As you can see, this simple ML algorithm is math. The computer is useful for computing the previous formulas.
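A small sketch of batch gradient descent for linear regression, assuming the usual least-squares cost J(theta) = 1/(2m) * sum((theta . x - y)^2). The cost, the tiny data set and the learning rate are assumptions for illustration, and the features are already on a small scale; with raw values such as 120 000 miles the same learning rate would diverge, which is why data is normally normalized first.

```python
def cost(theta, xs, ys):
    """Assumed least-squares cost: J(theta) = 1/(2m) * sum((theta . x - y)^2)."""
    m = len(xs)
    return sum((sum(t * x for t, x in zip(theta, row)) - y) ** 2
               for row, y in zip(xs, ys)) / (2 * m)


def batch_gradient_descent(xs, ys, alpha=0.1, iterations=500):
    """Update every theta_j from the errors on the whole data set each step."""
    m, n = len(xs), len(xs[0])
    theta = [0.0] * n
    history = [cost(theta, xs, ys)]
    for _ in range(iterations):
        errors = [sum(t * x for t, x in zip(theta, row)) - y
                  for row, y in zip(xs, ys)]
        theta = [t - alpha / m * sum(e * row[j] for e, row in zip(errors, xs))
                 for j, t in enumerate(theta)]
        history.append(cost(theta, xs, ys))
    return theta, history


# Hypothetical, already-scaled data generated from theta = [2, -1].
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
ys = [2.0, -1.0, 1.0, 3.0]
theta, history = batch_gradient_descent(xs, ys)
print(theta, history[0], history[-1])
```

The `history` list records the cost after each iteration, so you can check that it keeps decreasing as described.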

Pankesh Bamotra
Mathematics is too vast a subject to be considered for this question. The breadth and depth of mathematical awareness you require for machine learning  totally depends on what you are learning in the subject. Keeping this in mind, let's deal with what you need to know in "mathematics" for  machine learning.

1. Probability and mathematical statistics
This is a fundamental requirement for machine learning, so you need to know it well. When I say probability, I mean more than what you studied in high school, and almost everything you probably did not pay attention to during your undergrad. You need to know about random variables, their distributions, probabilistic convergence, and estimation theory. That covers a major part of what you need to know here.
Two of my favourite resources are:-
1. Joseph Blitzstein - Harvard Stat 110 lectures
2. Larry Wasserman's book - All of statistics

2. Linear algebra
Linear algebra will pop up every now and then in ML. PCA, SVD, LU decomposition, QR decomposition, symmetric matrices, orthogonalization, projections, and matrix operations are needed many times. The good thing is that there are countless resources available on linear algebra.
My all time favourite is Gilbert Strang's MIT lectures on linear algebra.
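Orthogonalization and projection, two of the topics listed above, fit in one short sketch: the Gram-Schmidt process, which is also the idea behind QR decomposition. The input vectors are made up, and the code assumes they are linearly independent.

```python
import math


def dot(u, v):
    """Inner product of two vectors given as lists."""
    return sum(a * b for a, b in zip(u, v))


def gram_schmidt(vectors):
    """Return an orthonormal basis for the span of linearly independent vectors."""
    basis = []
    for v in vectors:
        w = list(v)
        # Subtract the projection of w onto each basis vector found so far.
        for q in basis:
            coeff = dot(w, q)
            w = [w_i - coeff * q_i for w_i, q_i in zip(w, q)]
        norm = math.sqrt(dot(w, w))
        basis.append([w_i / norm for w_i in w])
    return basis


# Hypothetical input: two independent vectors in the plane.
q1, q2 = gram_schmidt([[3.0, 4.0], [1.0, 0.0]])
print(q1, q2)
```

The resulting vectors have unit length and a zero inner product with each other.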

3. Optimisation
 Though only a few things from optimisation are needed most of the time, a strong foundational knowledge will go a long way. You need to know Lagrange multipliers, gradient descent, and the primal-dual formulation. The best resource on this is Boyd and Vandenberghe's course on convex optimisation from Stanford.

4. Calculus
I wanted to put this at the top, but I'm putting it last just to emphasise that only a fundamental knowledge of calculus is needed. Know about 3-D geometry, integration, and differentiation and you'll survive. It's the easiest to start with amongst the topics I've mentioned here. MIT has good lectures on calculus.

I think with these 4 tools you'll most likely find ML easy to understand. Other than these you may find real analysis and functional analysis relevant too, but they are just formal generalisations of the topics mentioned before.
Arik Beremzon
From a beginner.

An introductory Linear Algebra course will generally include the following:

  • Vectors
  • Vector Spaces
  • Matrices
  • Inner Product Spaces
  • Orthogonality
  • Projection
  • Linear transformations
  • eigenvectors, eigenvalues
  • change of bases
  • Various decompositions: LU, Polar, SVD.

I also had some geometric algebra, but haven't found that useful so far.

Probability and statistics:

  • probabilities
  • combinations
  • permutations
  • distributions
  • Understanding of hypothesis testing
  • Descriptive statistics: Means, modes, standard deviations, variances etc.
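Several items from the list above can be tried directly with Python's standard library. The sample data is made up for illustration.

```python
import statistics
from math import comb, perm

# Hypothetical sample.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(statistics.mean(data))       # arithmetic mean
print(statistics.mode(data))       # most frequent value
print(statistics.pvariance(data))  # population variance
print(statistics.pstdev(data))     # population standard deviation

# Combinations ignore order; permutations do not.
print(comb(5, 2))  # ways to choose 2 items out of 5
print(perm(5, 2))  # ways to arrange 2 items out of 5
```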

If you can get through:

https://www.khanacademy.org/math...

And

https://www.khanacademy.org/math...

You are good to go.
Justin Rising
There's a recent CMU course called Computer Science Theory for the Information Age which includes a lot of the math for machine learning.  There's also a draft textbook there which is well worth grabbing a copy of.

The machine learning field needs the following mathematics background to understand more things.

      

Scott Triglia
If you are truly looking for a one-stop reference, the best that I can suggest is Chris Bishop's Pattern Recognition and Machine Learning (http://www.amazon.com/Pattern-Re...). Although it is quite difficult to start with, it will cover the majority of your interests until you are well versed enough in the subject to be able to read publications and more specific texts.

When in doubt, MIT OpenCourseWare is always a good source -- I believe they even offer one or two machine learning courses at the graduate level.

Good general reference/tutorial texts:
  • Information Theory, Inference and Learning Algorithms -- MacKay
  • Introduction to Probability Models -- Ross
  • AI: A Modern Approach -- Russell & Norvig
  • Algorithm Design -- Kleinberg & Tardos
Fluff Miller
ML theory:
Christopher Bishop - Pattern Recognition & Machine Learning. The first time I picked this book up it was pretty daunting, but once you get a bit of the maths under your belt, it presents clearer explanations than other texts. I found it really clearly laid out and it seems to progress pretty well. It also covers a lot of stuff.
Linear Algebra:
Gilbert Strang's videos on linear algebra are excellent, and so are the Khan Academy ones. The Gilbert Strang book doesn't seem to get particularly great reviews. On the basis of reviews, I picked up a copy of Howard Anton's Elementary Linear Algebra, which seems to be very highly regarded. I would recommend it. I also have David Poole's Linear Algebra: A Modern Introduction, which feels a bit more... modern than Anton, and I have tended to use it more. It doesn't seem to be a particularly well-known book on t'internet, but I find it very clear (more so than Anton).
If you want to practise, then there's Schaum's Outline of Theory and Problems of Linear Algebra. (Good for practising, but insufficient as a standalone text on the subject.)

If you have the luxury of having some time before starting on Machine Learning, I would suggest really focusing on linear algebra in a very hands on way (working through structured examples) and getting a good understanding of orthogonality, vector spaces, eigenvectors, transformations.  From my experience, trying to learn the maths at the same time as learning Machine Learning was overwhelming and I would have got a lot more out of ML lectures if I had already got a grasp of the maths.
Daniel McLaury
You'll want to know calculus up to vector calculus, a first course in linear algebra, and a good course in calculus-based statistics that actually explains what the concepts mean (as opposed to "if you're trying to do this, you should press the chi-squared-test button" like you see in a lot of classes.)  A discrete math course would be nice just for background on notation, although you don't actually need to know any nontrivial discrete math.
Shehroz Khan

Mathematics for ML is no different from what you learn in high school or in undergraduate studies. If you have that mathematical base, most of the time it is sufficient to understand what's going on in those creepy equations you see in books and research papers. However, sometimes more than that is required and you may have to take some advanced courses in statistics, calculus, linear algebra etc. You may also like to read more generally about: How do I learn machine learning?

Yuval Feinstein
Please see How do I learn mathematics for machine learning?, which has some good answers. I believe the Witten et al. book is one of the most accessible introductions. I guess a basic book on statistics and probability and another one on linear algebra (for example Strang, 4th edition [1]) will take you most of the way there.
[1] http://math.mit.edu/linearalgebra/
Lucian Sasu
See the list of books from Learning about Machine Learning, 2nd Ed., or Michael Jordan's list: Mike Jordan at Berkeley sent me his list on what people should learn for ML. The... Of course, correlating with these lists does not necessarily imply causation :), but these are good starting points.
Darshan Hegde
Here is my view on how to learn enough math online for free.

Teach yourself Machine Learning the hard way ! and follow up Teach yourself Machine Learning the hard way ! (Part 2)

It lists many prerequisites that you need to understand, and Part 2 covers some of the more advanced material.

I hope this helps.
Sukrit Shankar

With regards to mathematics for machine learning, I reckon all of the following skills are important:

(1) Some Basic Mathematical Skills (Linear Algebra, Probability, Optimization)

(2) Knowing how those mathematical skills are exploited for machine learning algorithms

(3) Developing a way to understand mathematics, so that any advanced maths for modern machine learning can be well comprehended.

While one would generally recommend all sorts of linear algebra and probability books for machine learning, I feel those are not always worth the time, at least for machine learning. I would recommend reading through the following texts (perhaps in order), which should cater to the three points mentioned above.

(a) Pattern Recognition and Machine Learning by Christopher Bishop (Will cater to 1 and 2 above)

(b) Deep Learning book by Goodfellow, Bengio and Courville (Will again strengthen 1 and build on 2)

(c) Understanding Machine Learning by Shai Shalev-Shwartz and Shai Ben-David (Will advance your skills in 1, strengthen 2, and give an insight into 3)

(d) Ankur Moitra’s rather short but useful book on Algorithmic Aspects of Machine Learning (Will mainly cater to 3)

(e) Optimization for Machine Learning by Sra, Nowozin and Wright & Off the convex path by Sanjeev Arora and collaborators (Will cater to 3 and advance 1 and 2)

I truly believe that anyone who properly understands the above material will develop all the maths basics needed for machine learning, and in a very connected form too! Hope this helps!

Halil Lacevic

I won't say that you “learn” math. I would rather say that you train math.

Imagine you want to train boxing and your coach is teaching you direct punches, low kicks and high kicks. No matter how many times he shows you how to kick, you can't do it perfectly. You know that it takes patience, hard work and effort to finally learn how to punch, and that you need to keep trying and training. After many tries you can finally say that you can actually punch.

What's the point?

Math is the same. Consider direct punches your formulas, low kicks your theories and high kicks your solutions to problems. No matter how many formulas or theories you know, no matter how many times you've seen solutions, you just can't do it perfectly. Why? Because you need to train those formulas, train those theories and knock out those problems with damn good high kicks. And how do you do that?

  • Do as many problems as you can on a daily basis. It is not going to happen overnight, it takes time to train those kicks.

Wanna learn it fast? Better start now!

William Emmanuel Yu
Brush up on your statistics and probability. This is definitely critical particularly for supervised learning methods.

Some also require a good deal of number theory knowledge especially when discussing SVM, PCA and friends.

Since you are planning to pursue a Ph.D. and move the science further, you might want to narrow your focus to a particular research area while working with your prospective adviser.
This is not an exhaustive list of topics. Best read in this order:

  • Linear Algebra
  • Vector Calculus
  • Statistics and information theory
  • Discrete Math
  • Convex Optimization
  • Probabilistic Graphical models

I believe there is a book : http://www.amazon.com/All-Mathem... which can help you get a good head start.

I will try to keep this as concise as possible.

Edit: Somebody merged the original question to this question, so the premise becomes irrelevant.

To become a full stack AI/ML engineer, it is imperative that you have a complete grasp of the mathematical foundations of ML so that you can build upon concepts easily. The basic mathematical skills required are Linear Algebra, Matrix Algebra, Probability and some basic Calculus.

Linear Algebra

The best source to study Linear Algebra is Prof. Gilbert Strang’s Linear Algebra book/course. Video Lectures | Linear Algebra | Mathematics | MIT OpenCourseWare (MIT OCW). There are 34 lectures and believe me, they are completely worth it as after completing this, linear algebra should not pose any more problems for you. Solve some exercises/exams if you want to achieve mastery (recommended).

Matrix Algebra

Matrix algebra is an essential component of deep learning. I personally recommend this (Matrix Cookbook by Kaare Brandt Petersen & Michael Syskind Pedersen): http://www2.imm.dtu.dk/pubdb/vie... (PDF). There are 66 pages of pure matrix operations and this is the absolute “go-to” in case you are stuck trying to understand certain matrix manipulations that a researcher might have done.

Probability & Statistics

Understanding probability is a very important aspect of understanding ML. Some of the key probability concepts that you must be aware of include Bayes’ Theorem, distributions, MLE, regression, inference and so on. The best resource for this is Think Stats (Exploratory Data Analysis in Python) by Allen Downey: http://greenteapress.com/thinkst... (PDF). This absolute gem of a book is 264 pages long and covers all the aspects of probability and statistics that you need to understand with relevant Python code.

Optimization

The go-to book for Convex Optimization is Convex Optimization by Stephen Boyd and Lieven Vandenberghe: https://web.stanford.edu/~boyd/c... (PDF). This is a 730 page book and you need not read it all in one go. Choose the concept which you need to learn depending on your requirements and interest and read that part. It is complete and extremely well written. This book is free as part of the CVX 101 MOOC on EdX.

This 263 page book on metaheuristics, Essentials of Metaheuristics by Sean Luke (http://cs.gmu.edu/~sean/book/met... (PDF)) talks about gradient based optimization, policy optimization etc. and it is well written. One can choose to go through this also if interested.

Data science concepts are covered in the above topics. Other topics can be learnt by googling for sources easily as and when you encounter them. But complete understanding of the above should suffice for 95% of all scenarios.


Achieving mastery of the above topics will surely make you a mathematically strong AI/ML engineer. Now that you have built the foundation, start dipping your feet into research papers. They are absolutely essential as these clearly show the standards of AI researchers/engineers. Firstly, find out the famous papers of AI like RNN, LSTM, SVM etc. and go through the technical content.

Can you understand the jargon?

Can you understand the mathematics?

Can you now implement the mathematics in code without relying on libraries that do all the work for you?

These are the key questions to be answered. Once you can answer “Yes/Mostly Yes” to these 3 questions, you are good to go.

After trying to read these papers dealing with the most popular concepts, try to read the not-so-famous papers. arXiv is a great site with hundreds of preprints being published every day by top researchers, and reading the papers from here is like drinking straight out of the fire-hose. Try to choose a paper which looks fairly well written and whose abstract seems interesting. Then, read that paper and try to answer those 3 questions again. The same can be done with papers from top AI conferences like NIPS, AAAI, AAMAS, IJCAI, ICML etc. You may not be able to fully implement the papers due to data constraints and other issues, but if you are able to understand even 60% of the mathematical reasoning, then I can safely say you have completed your training.

Do not concentrate on learning more and more “packages”. Concentrate on the concept. While implementing, you will see that you require “this” package and then you will naturally learn to use it. Learning the various commands of random packages won’t help. If you start implementing and writing code to solve problems or reproduce results from a paper, you will learn about packages along the way and use them appropriately; they’ll be the least of your concerns. This is the correct way to maintain “balance” between math and coding. You can also participate in competitions (e.g. Kaggle or conference competitions) to improve speed, development and processing skills if you feel the need to do so.

Alternatively, you can choose to pursue a doctoral degree (like me :P ) in AI/ML to gain a complete in-depth understanding of everything discussed here and more.

(All the links in this answer are working as of 6th July 2017)

Ivo Danihelka
The needed mathematics includes:

They will make your later reading much more pleasant. You will be able to devise your own proofs.

Terence Tao has posted plenty of advice on learning math on his blog:
Krishna Kumar Sekar

I started writing a GitHub awesome page for this; it may help. It covers topics from basic machine learning maths to advanced and quantum machine learning.

krishnakumarsekar/awesome-machine-learning-deep-learning-mathematics

Thanks and Regards

Krishna

krishnakumarsekar/awesome-quantum-machine-learning

Prasoon Goyal

A2A.

To have a basic mathematical background, you need to have some knowledge of the following mathematical concepts:
- Probability and statistics
- Linear algebra
- Optimization
- Multivariable calculus
- Functional analysis (not essential)
- First-order logic (not essential)
You can find some reasonable material on most of these by searching for "<topic> lecture notes" on Google. Usually, you'll find good lecture notes compiled by some professor teaching that course. The first few results should give you a good set to choose from.

For instance, here’s a list of some lecture notes that I just found:

Probability & Statistics : http://www2.aueb.gr/users/demos/...

Linear algebra : https://www.math.ku.edu/~lerner/...

Optimization : http://www.ifp.illinois.edu/~ang...

Calculus: https://www.math.wisc.edu/~angen...

Matrix Calculus : http://www.atmos.washington.edu/...

You should skim through these, without going into too much detail. You can come back to studying the topics as and when required while learning ML.

Ansup Babu

If you want to be a real data scientist, not a fake one with an analyst's skills and no mathematical intuition or point of view, you need a very strong mathematical grounding.

So to learn Mathematics for ML this should be the order :-

  1. Start with probability (basic, conditional, marginal, etc.)
  2. Mathematical series and convergence, numerical methods for analysis
  3. Matrix and linear algebra
  4. Bayesian statistics
  5. Vectors (most important)
  6. Calculus
  7. Markov processes and chains
  8. Basics of optimization (linear/quadratic)
  9. Advanced matrix algebra and calculus (gradient, divergence, curl, etc.)

This much mathematics will let you understand the core ideas behind ML and probabilistic algorithms.

You should pause now and start implementing a few algorithms from scratch in Python:

1. K-NN is a great starting point: learn it, and code it from scratch.

2. Logistic Regression with Gradient Descent.
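For the second exercise, a from-scratch version might look roughly like the sketch below. It is a minimal illustration under my own assumptions: one input feature, made-up data (hours studied versus pass/fail), a fixed learning rate, and a centred feature so that plain gradient descent behaves well.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit p(y=1|x) = sigmoid(w*x + b) by batch gradient descent on the log-loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # d(log-loss)/d(logit) for one sample
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b


hours = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical feature values
passed = [0, 0, 0, 1, 1, 1]              # hypothetical labels

# Centre the feature so the bias and the weight do not fight each other.
mean_hours = sum(hours) / len(hours)
centered = [h - mean_hours for h in hours]

w, b = train_logistic(centered, passed)
predictions = [1 if sigmoid(w * x + b) >= 0.5 else 0 for x in centered]
print(w, b, predictions)
```

Printing `w` and `b` every few hundred epochs is a simple way to watch the parameters move as the loss goes down.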

By now you can see the parameters and numbers moving around in matrix form and understand the mathematics of prediction. If you feel this is enough, hold your breath: there is more exciting stuff to come. This much will make you a beginner “Real Data Scientist”.

Next Start with :-

  1. Stochastic Models and Time Series Analysis
  2. Differential Equations
  3. Dynamic Programming and Optimization Techniques
  4. Fourier analysis and wavelets
  5. Random Fields
  6. Basic Knowledge of PDEs
  7. Techniques to solve PDEs using Monte Carlo methods and polynomial expansions.

These mathematical techniques will help you visualize how a model works, and how to model and process raw data to create unique models whose behaviour can be tuned. With them, parameters can be optimized and fine-tuned for the problem at hand.

For a Next Level Up:- ( Statistics of Higher Dimensions)

  1. PDEs numerical solution with numerical input/ random input. ( fascinating subject to work on )
  2. Stochastic Differential Equations and Solutions
  3. PCA etc
  4. Dirichlet Processes, Markov Decision Process.
  5. Uncertainty Quantification - Polynomial Chaos, Projections on vector space

I think these are the subjects one must learn to be a good machine learning engineer in the 21st century. With a knowledge base like this, one can connect the dots very rapidly and build highly accurate systems and models.

(I am not a big fan of neural nets, so I left them out here.)

Eugene Insall Jr
Linear Algebra is important in many ways, but you really need to learn some logic. I don't mean the babiest of baby things that people say is easy because they can understand and track and foretell the end of a mystery novel in a TV series like Foyle's War or CSI or Sherlock Holmes. I don't mean the intro to logic course in many philosophy departments. I don't mean the Boolean circuits course you may have taken as a freshman in computer engineering or the simple truth table arguments you did and eventually turned into graph theory problems in a second semester or second year computer science course called "Discrete Math". I don't mean the simple arguments you went through in your modern algebra course as a senior in a mathematics department. But all of those can be useful, and are precursors or example generators for a beginning course in Model Theory. Then you can begin to truly appreciate the NOTION of a THINKING MACHINE, and what it means to model such a monstrosity. Then you can begin to understand how to develop formal languages for the solution of specific problems. Then you can start understanding why it's really strange to model THINKING as a neural network, although that is not a completely useless way to do it. (Basically, neural networks seem to me to be "pattern recognizers", roughly, basically using fixed-point iteration in metric spaces to home in on a pattern or set of patterns of behaviors of inputting agents. Please note that I said "roughly". This is not intended to be a tutorial on neural networks.)

Of course, it helps to have a notion of what it means to define or model the concept described by the verb "to learn". That, my friends, is the realm of philosophy and pedagogy, but to apply it requires an understanding of the notion of a model, and we are back to my main point: Take some model theory. It's not likely to hurt you for more than a semester, and well...

NO PAIN NO GAIN!!
Dan Dunay
In addition to Martin Thoma's great answer, I'd study up on the "Theory of Computation". Textbooks abound, but they are expensive. Search on the web. Wikipedia has an overview, but it won't make sense until you've studied a bit. Still, it may show you what you've missed.
Carlin Eng
Bayes Theorem is a fundamental concept of probability that underpins many extremely important algorithms, from the very basic (e.g., Naive Bayes) to the quite complicated (e.g., Latent Dirichlet Allocation).
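A small worked example helps here. The numbers below are hypothetical (a condition with 1% prevalence, a test with 95% sensitivity and a 5% false-positive rate); the point is only to show the theorem, P(A|B) = P(B|A) P(A) / P(B), in action.

```python
from fractions import Fraction

# Hypothetical numbers, chosen only for illustration.
p_disease = Fraction(1, 100)             # prior P(disease)
p_pos_given_disease = Fraction(95, 100)  # sensitivity P(+ | disease)
p_pos_given_healthy = Fraction(5, 100)   # false-positive rate P(+ | healthy)

# Law of total probability gives the evidence P(+).
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem gives the posterior P(disease | +).
posterior = p_pos_given_disease * p_disease / p_pos
print(posterior, float(posterior))
```

Even with a fairly accurate test, the posterior stays well below one half because the prior is so small, which is exactly the kind of intuition Naive Bayes and its relatives build on.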

In linear algebra, a solid understanding of eigenvalues and eigenvectors is important for topics such as principal component analysis, factor analysis and other dimensionality reduction tasks.
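To see why eigenvectors matter for PCA, here is a rough sketch in plain Python: it builds the sample covariance matrix of a few made-up 2-D points and finds its dominant eigenvector by power iteration. The helper names are my own, and real code would normally use a linear algebra library instead.

```python
import math


def covariance_matrix(points):
    """Sample covariance matrix (divides by n - 1) of a list of points."""
    n = len(points)
    dim = len(points[0])
    means = [sum(p[d] for p in points) / n for d in range(dim)]
    centered = [[p[d] - means[d] for d in range(dim)] for p in points]
    return [[sum(row[i] * row[j] for row in centered) / (n - 1)
             for j in range(dim)] for i in range(dim)]


def matvec(matrix, v):
    return [sum(m * x for m, x in zip(row, v)) for row in matrix]


def power_iteration(matrix, steps=100):
    """Approximate the largest eigenvalue and its unit eigenvector."""
    v = [1.0] + [0.0] * (len(matrix) - 1)
    for _ in range(steps):
        w = matvec(matrix, v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(a * b for a, b in zip(v, matvec(matrix, v)))  # Rayleigh quotient
    return eigenvalue, v


points = [(2.0, 1.0), (1.0, 2.0), (4.0, 3.0), (3.0, 4.0)]  # illustrative data
cov = covariance_matrix(points)
eigenvalue, direction = power_iteration(cov)
print(eigenvalue, direction)
```

The returned direction is the first principal component, the axis along which the data varies most; the eigenvalue is the variance along it.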
Rafael Gustavo da Cunha Pereira Pinto
I would suggest reading as many linear algebra books as possible, followed by some probability and statistics texts.

For the first, I suggest Gilbert Strang's "Linear Algebra and Its Applications", while for the second, "Probability, Random Variables And Random Signal Principles" by Peebles is a good choice.

EDIT: A previous answer suggests a convex optimization text, which I also recommend. A good one is "Convex Optimization" by Stephen Boyd, which is also available for free on the author's website.
I took both Andrew Ng's Machine Learning class and Sebastian Thrun's AI class. I liked the Machine Learning one more: the AI class touches more topics, but does so in a haphazard way. The ML class is narrower, but more practical and focused. It helps to keep a link to Khan's linear algebra videos handy.

Shashank Gupta

To understand Machine Learning you need the following mathematics prerequisites:

1. Probability and Statistics : Machine Learning has deep roots in Statistics. In fact the modern Machine learning is essentially Statistical learning i.e using stats to find patterns in data and inferring using them. So Stats and Probability are bare minimum for ML.

2. Linear Algebra : This is required because data is represented as matrix in Machine Learning and essentially all ML algorithms can be seen as Matrix manipulation in the end so basic understanding of Linear Algebra is required.

3. Optimization : Many people argue that Machine learning is a fancy name for optimization. While this is true to certain extent there is more to ML than optimization. But a large part of it is indeed optimization. In the end mostly all ML algos come down to some optimization task.

4. Calculus : This is a very useful tool for ML. Most ML algos rely on Differential Calculus to find solutions (Gradient Descent, Newton's method, quasi Newton's method etc.).
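To make the calculus point concrete, here is a minimal sketch comparing gradient descent with Newton's method on a one-dimensional convex function. The function f(x) = exp(x) - 2x, the step size and the iteration counts are my own choices for illustration; its minimum is at x = ln 2.

```python
import math


def grad(x):
    """First derivative of f(x) = exp(x) - 2x."""
    return math.exp(x) - 2.0


def hess(x):
    """Second derivative of f(x) = exp(x) - 2x."""
    return math.exp(x)


def gradient_descent(x, lr=0.1, steps=200):
    for _ in range(steps):
        x -= lr * grad(x)          # step against the slope
    return x


def newton(x, steps=10):
    for _ in range(steps):
        x -= grad(x) / hess(x)     # rescale the step by the curvature
    return x


x_gd = gradient_descent(0.0)
x_newton = newton(0.0)
print(x_gd, x_newton, math.log(2.0))
```

Both reach the same minimiser, but Newton's method gets there in far fewer iterations because it uses second-derivative information, at the cost of having to compute it.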

IMO if you master these topics then you can learn pretty much anything in ML, because all algorithms are essentially applications of these tools.

Probability: Conditional probability, random variables, pdfs etc. Whatever you'd learn in your undergrad probability course and a bit more.
Some stochastic processes (like Markov processes, etc.)
Linear Algebra: Data analysis and machine learning build a lot on these concepts.
Algorithms: not as important, but still useful when it comes to optimizing your solution. Some graph theory.
Basic linear programming, and recognizing convexity and relaxation.

I guess once you have a fair idea of most of these concepts, it's pretty simple to pick up the intuition behind any algorithm, where it would work, how to improve it, etc.
Nayan Gupta
It depends on the goals you want to achieve while learning ML.

Goal 1: To understand what ML is, how to apply different algorithms to a task, how to interpret the output, common pitfalls, etc.: You will need to have a grasp of linear and matrix algebra, probability and optimization. You don't need to take a deep dive into each one, but study basic things like eigenvectors, conditional probability, distributions and Bayes' theorem. Additionally, learn the concepts of overfitting and cross-validation.

Resources
Video lectures: Machine learning on Coursera (not only the one by Andrew Ng; there are a few others as well)

Books: Machine Learning by Tom Mitchell. It includes the necessary linear algebra and probability too.

Goal 2: To understand why the current algorithms are designed in a particular way and how they fundamentally differ from each other: If you are more interested in the theoretical aspects, like how the kernels of a support vector machine are defined, how deep learning neural networks are designed, or how to tweak the existing algorithms to make a new one, then you might want to extend your mathematics to functional analysis, topology and advanced optimization.

Resources: Video: Advanced Machine Learning, Caltech (Prof. Abu-Mostafa's lectures)
Books: Mining of Massive Datasets by Ullman. Any good books on advanced linear algebra or topology, but they won't connect it to machine learning.
Harsh Prasad

Learning mathematics is about doing. Remember the 80/20 Rule : You must study theory 20% of the time and practice/implement what you learn 80% of the time.

Here is a list of books you could use. You can find accompanying online courses for many of them.

1. Strang's Linear Algebra and its Applications

2. Apostol Calculus - Both the volumes

3. Golub's Matrix Computations

4. Sheldon Ross' Probability

5. Elements of Statistical Learning by Hastie et al

6. Bishop's Pattern Recognition and Machine Learning

7. David Barber's Bayesian Reasoning and Machine Learning

8. Kevin Murphy's Machine learning: a Probabilistic Perspective

9. Wasserman's All of Statistics and Non-parametric Statistics

From Hacker News :

1.) Casella, G. and Berger, R.L. (2001). "Statistical Inference" Duxbury Press.

2.) Ferguson, T. (1996). "A Course in Large Sample Theory" Chapman & Hall/CRC.

3.) Lehmann, E. (2004). "Elements of Large-Sample Theory" Springer.

4.) Gelman, A. et al. (2003). "Bayesian Data Analysis" Chapman & Hall/CRC.

5.) Robert, C. and Casella, G. (2005). "Monte Carlo Statistical Methods" Springer.

6.) Grimmett, G. and Stirzaker, D. (2001). "Probability and Random Processes" Oxford.

7.) Pollard, D. (2001). "A User's Guide to Measure Theoretic Probability" Cambridge.

8.) Bertsimas, D. and Tsitsiklis, J. (1997). "Introduction to Linear Optimization" Athena.

9.) Boyd, S. and Vandenberghe, L. (2004). "Convex Optimization" Cambridge.

10.) Golub, G., and Van Loan, C. (1996). "Matrix Computations" Johns Hopkins.

11.) Cover, T. and Thomas, J. "Elements of Information Theory" Wiley.

12.) Kreyszig, E. (1989). "Introductory Functional Analysis with Applications" Wiley.


Please do try implementing as many things as you can. Pick up a project. Talk to your peers and professors and people, see if you can help them with what you've learned. Do.

Meenakshi Deenadayalan

Some algorithms are really sweet they are available in Wikipedia with formulae, implementation and application.

Some dodge you till you watch two or three YouTube videos (Victor Lavrenko, Bert Huang, Udacity or MIT lectures)

Some are really mischievous, you got to do a lot of research, they test your patience and perseverance more than your mathematics!

And there are lots of books you can read.

How to learn a particular algorithm?

  1. First, from the business point of view: learn why to use one algorithm and not one of its counterparts. For example, why Fuzzy K-Means instead of K-Means.
  2. Second, from an analyst's point of view: learn how to use the algorithm to solve some use cases. What is it meant to do?
  3. Last comes the mathematics: the how of the algorithm. And more research to enhance the algorithm and patent it.

P. S. It is normal to not understand in the first go.

P. P. S. And very normal to get totally confused in the second and third.

Michael Arthur Bucko
For the foundations of data crunching, from Microsoft, 2014, Page on microsoft.com. Thank you, Microsoft, for sharing.
Valeryia Shchutskaya

Hi, I work for a Data Science and AI company called InData Labs, and one of our tech experts has recently prepared a short guide to learning neural networks. I hope it is helpful for you:

A short guide to neural networks. Master them and become famous.

Mathematics is an important part of learning machine learning. What are the necessary topics and useful resources?

Here I am sharing the relative weight of the important mathematics topics for machine learning, to clear up any confusion. See the list below and prepare accordingly.

35% - Linear Algebra

25% - Probability Theory and Statistics

15% - Multivariate Calculus

15% - Algorithms and Complex Optimizations

10% - Others

Now I will go into each topic in more depth, so that you have full clarity before starting machine learning or artificial intelligence.

  1. Linear Algebra: Topics such as Principal Component Analysis (PCA), Singular Value Decomposition (SVD), Eigendecomposition of a matrix, LU Decomposition, QR Decomposition/Factorization, Symmetric Matrices, Orthogonalization & Orthonormalization, Matrix Operations, Projections, Eigenvalues & Eigenvectors, Vector Spaces and Norms are needed for understanding the optimization methods used for machine learning. The amazing thing about Linear Algebra is that there are so many online resources.
  2. Probability Theory and Statistics: Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
  3. Multivariate Calculus: Topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Hessian, Jacobian, Laplacian and Lagrangian Distribution.
  4. Algorithms and Complex Optimizations: Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc), Dynamic Programming, Randomized & Sublinear Algorithms, Graphs, Gradient/Stochastic Descents and Primal-Dual methods is needed.
  5. Others: This comprises other Math topics not covered in the four major areas described above. They include Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.
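Of the topics in the list above, entropy and information gain are quick to try out in code. The sketch below uses made-up labels and my own helper names; it computes Shannon entropy in bits and the information gain of a split, the quantity decision trees typically use to pick a feature.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of the labels minus the size-weighted entropy of the groups."""
    n = len(labels)
    remainder = sum(len(g) / n * entropy(g) for g in groups)
    return entropy(labels) - remainder


labels = ["yes", "yes", "no", "no"]        # illustrative labels
perfect_split = [["yes", "yes"], ["no", "no"]]
useless_split = [["yes", "no"], ["yes", "no"]]

print(entropy(labels))
print(information_gain(labels, perfect_split))
print(information_gain(labels, useless_split))
```

A split that separates the classes completely recovers all of the entropy, while a split that leaves each group as mixed as the original gains nothing.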

Now you are probably looking for the best learning and practice resources for your weak points, right? Don’t worry, learners: I would also like to suggest a few good resources.

Thank you. Keep Learning.

Yin Zhu
Optimization. Especially convex optimization. E.g. gradient-based methods for non-linear optimization (the L-BFGS method and conjugate gradient), quadratic programming, etc.
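As a taste of these methods, here is a minimal sketch of the linear conjugate gradient method in plain Python. Note the assumptions: it minimises the quadratic f(x) = ½ xᵀAx − bᵀx for a symmetric positive-definite A (equivalently, it solves Ax = b), which is the simplest setting and not the non-linear variant; the small 2×2 system is just an illustration.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(a, v):
    return [dot(row, v) for row in a]


def conjugate_gradient(a, b, tol=1e-12):
    """Solve A x = b for symmetric positive-definite A, starting from x = 0."""
    n = len(b)
    x = [0.0] * n
    r = b[:]                 # residual b - A x, which equals b when x = 0
    p = r[:]                 # first search direction is the residual
    rs_old = dot(r, r)
    for _ in range(n):       # at most n steps in exact arithmetic
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)            # exact line search along p
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        if rs_new < tol:                       # squared residual small enough
            break
        beta = rs_new / rs_old
        p = [ri + beta * pi for ri, pi in zip(r, p)]  # new A-conjugate direction
        rs_old = rs_new
    return x


a = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
solution = conjugate_gradient(a, b)
print(solution)
```

For an n-dimensional quadratic the method needs at most n iterations, which is why it is a common building block inside larger optimisers.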
Tyler Renelle

I made a podcast episode on the math you need for machine learning, and the resources for learning (if you like audio): Machine Learning Guide #8

Leonardo Federico
I would recommend taking this course: Course Introduction - Amazon Machine Learning

It covers most of the math you need to get started with machine learning.
Ankur Raj

There are many reasons why the mathematics of Machine Learning is important and I will highlight some of them below:

  • Selecting the right algorithm which includes giving considerations to accuracy, training time, model complexity, number of parameters and number of features.
  • Choosing parameter settings and validation strategies.
  • Identifying underfitting and overfitting by understanding the Bias-Variance tradeoff.
  • Estimating the right confidence interval and uncertainty.
  1. Linear algebra is a cornerstone because everything in machine learning is a vector or a matrix. Dot products, distance, matrix factorization, eigenvalues etc. come up all the time. I would recommend Gilbert Strang’s linear algebra course.
  2. Multivariate Calculus: Some of the necessary topics include Differential and Integral Calculus, Partial Derivatives, Vector-Valued Functions, Directional Gradient, Distribution. Differentiation matters because of gradient descent. Again, gradient descent is almost everywhere. Some courses I recommend:
  3. Probability Theory and Statistics: Machine Learning and Statistics aren't very different fields. Actually, someone recently defined Machine Learning as 'doing statistics on a Mac'. Some of the fundamental Statistical and Probability Theory needed for ML are Combinatorics, Probability Rules & Axioms, Bayes' Theorem, Random Variables, Variance and Expectation, Conditional and Joint Distributions, Standard Distributions (Bernoulli, Binomial, Multinomial, Uniform and Gaussian), Moment Generating Functions, Maximum Likelihood Estimation (MLE), Prior and Posterior, Maximum a Posteriori Estimation (MAP) and Sampling Methods.
  4. Algorithms and Complex Optimizations: This is important for understanding the computational efficiency and scalability of our Machine Learning Algorithm and for exploiting sparsity in our datasets. Knowledge of data structures (Binary Trees, Hashing, Heap, Stack etc), Dynamic Programming, Randomized & Sublinear Algorithms, Graphs, Gradient/Stochastic Descents and Primal-Dual methods is needed.
    1. Boyd and Vandenberghe's course on Convex Optimization from Stanford.

Given all that, ML is not all about maths and, to be frank, when starting out you will spend hardly 5% of your effort doing maths.

Vaibhav Aparimit

I wrote a detailed Medium post on this. You can read it here: Math for Deep Learning is not Merlin’s Enchantment – Vaibhav Aparimit – Medium

First off, I really like your question. You seem to implicitly understand that math is an essential skill required to grasp the underpinnings of machine learning.

If your question were about deep learning, I would say linear algebra covers 95% of cases. For machine learning you would need to know probability (especially Bayes' rule and conditional probability), differential calculus and linear algebra (matrix multiplication, eigenvectors, determinants, Hessians).

Hope this helps .

Colorado Reed
You may find Metacademy helpful when trying to understand the prereqs for various concepts in machine learning: Concepts - Metacademy
Prashant Sharma
You must have a sound understanding of at least the following (there might be others which are not there in this list):
  • Linear algebra
  • Calculus
  • Matrix calculus
  • Probability and statistics
  • Optimization - linear programming, convex optimization, non-linear optimization

Some other topics that are useful in specific sub-areas of machine learning are:
  • Basic graph theory
  • Basic algorithms
  • First-order logic
Nikhil Singhal
  1. Linear Algebra
  2. Probability theory and statistics
  3. Multivariate calculus
  4. Algorithms and Complex optimizations
  5. Others: Real and Complex Analysis (Sets and Sequences, Topology, Metric Spaces, Single-Valued and Continuous Functions, Limits, Cauchy Kernel, Fourier Transforms), Information Theory (Entropy, Information Gain), Function Spaces and Manifolds.

To learn them go through

Alex Gilgur
Numerical Methods; Matrix and Tensor Algebra; Probability and Statistics; Operations Research; occasionally Calculus.
Charles H Martin
Do this class online:

EE364a: Convex Optimization I
As Alex mentions above, Andrew Ng's course on Machine Learning is the best I have seen so far. He gives an intuitive feel for the concepts, so it's easier to follow than looking at plain mathematical formulae.
If you are looking to refresh or clarify linear algebra concepts after going through the above course, Khan Academy could be useful. It also has videos on other topics that might be of interest for machine learning. If you are looking for concepts like PCA, though, you might not find them there.
Another useful resource that focuses on concepts is video lectures: go to Machine Learning - videolectures.net and search only for tutorials.
 
There is no one-stop shop, as the concepts can go deeper and might require special treatment. All of this is theory, which can clarify concepts. However, if you are a novice and want a deep and intuitive feel for the concepts, pick one simple problem and implement the solution.
Dan Zhang
Linear Algebra, Statistics, Discrete Math, Set Theory, etc.
mlthirst.wordpress.com. I hope it quenches your thirst.
Sarvesh Dhage
Hello friends, I just came across a very interactive course on Understanding Machine Learning. This is a completely free video course. You just need to enroll using your ID and password. I am sharing the link with you. Do enroll:
Understanding Machine Learning with R - uFaber.com

posted on 2017-11-07 10:49  暖风的风
