3 Favorite Machine Learning Papers
Read papers, seriously? Let me tell you: my inbox has design docs marked unread since last quarter, I have a recurring invite to the Tuesday ML discussion group that I never find time to attend, and I have mailed myself 17 “interesting” articles that I have yet to click through. So why should I read some random papers instead of figuring out where my MapReduce that copies the logs is failing?
Well, because 1) these papers are broadly applicable and have a fair chance of influencing your product, 2) they are not loaded with random Greek letters and useless theorems, and 3) it’s unlikely you’ll come up with all of this by yourself.
Paper #1 “A Few Useful Things To Know About Machine Learning” Pedro Domingos, 2012.
If you have been working on machine learning for a little while and are familiar with the basics, this paper delivers great value. It brings together a lot of insights that you may be beginning to sense but are not yet sure of. For example, you may notice that there are as many ML algorithms as there are ML PhDs in the world, but for your problem it doesn’t matter much which variation you pick: when they work, they all land within a few percent of each other, and when they don’t, they all fail together. This paper articulates these kinds of observations properly (“More data beats a cleverer algorithm”).
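To make the “they all land within a few percent of each other” point concrete, here is a quick sketch of my own (not from the paper, and the dataset and models are my arbitrary picks): train a few off-the-shelf scikit-learn classifiers on the same data and compare their accuracies.

```python
# Toy sanity check: several reasonable classifiers, same data, similar scores.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

models = {
    "logistic regression": LogisticRegression(max_iter=5000),
    "random forest": RandomForestClassifier(random_state=0),
    "rbf svm": SVC(),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(f"{name}: {model.score(X_test, y_test):.3f}")
```

On a well-behaved dataset like this, the printed accuracies typically differ by only a few points, which is exactly the paper’s point: spend your time on data and features, not on shopping for the cleverest algorithm.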
Paper #2 “Machine Learning: The High-Interest Credit Card of Technical Debt” Googlers, 2014.
This is more of a machine learning system design paper. It’s a hard paper because a lot of what it talks about is hard to appreciate unless you have made the same mistakes yourself. Still, it’s highly recommended, because once you are forewarned it’s easier to spot and accept your mistakes. You’ll very likely be coming back to this paper multiple times.
Paper #3 “Efficient Estimation of Word Representations in Vector Space” Mikolov et al., 2013.
The bulk of applied machine learning is feature engineering. This paper is interesting because it describes one of the ultimate feature engineering tricks: converting sparse features into dense features. This tutorial based on the paper is also a great place to start. The consequences are far-reaching, including better generalization of your sparse features and getting more out of your precious labeled data by augmenting it with unlabeled data.
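Here is a minimal sketch of the sparse-to-dense idea using gensim’s word2vec implementation (assuming gensim >= 4.0; the tiny corpus below is made up, so the learned vectors are only illustrative):

```python
# Sparse one-hot words become dense, reusable vectors.
from gensim.models import Word2Vec

corpus = [
    ["the", "cat", "sat", "on", "the", "mat"],
    ["the", "dog", "sat", "on", "the", "rug"],
    ["a", "cat", "chased", "a", "dog"],
]

model = Word2Vec(sentences=corpus, vector_size=16, window=2,
                 min_count=1, sg=1, epochs=50, seed=0)

# Instead of a one-hot vector the size of the vocabulary, each word is now
# a dense 16-dimensional vector that downstream models can consume.
print(model.wv["cat"].shape)          # (16,)
print(model.wv.most_similar("cat"))   # nearest neighbors by cosine similarity
```

The same dense vectors can then be plugged into whatever model consumes your sparse features today, which is where the generalization win shows up.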
Embedding is closely related to matrix factorization, which in turn is related to collaborative filtering, as this paper explains. (Note: this is additional information, good to know after mastering the embedding technique.)
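A rough sketch of that connection, with a made-up ratings matrix and rank of my choosing: factorizing a user-by-item matrix into two low-rank pieces gives you dense user and item vectors, much like word embeddings.

```python
# Truncated SVD of a tiny ratings matrix yields dense user/item "embeddings".
import numpy as np

ratings = np.array([
    [5, 4, 0, 1],
    [4, 5, 1, 0],
    [1, 0, 5, 4],
    [0, 1, 4, 5],
], dtype=float)

U, s, Vt = np.linalg.svd(ratings, full_matrices=False)
k = 2                              # keep only the top-k factors
user_vecs = U[:, :k] * s[:k]       # dense vector per user
item_vecs = Vt[:k, :].T            # dense vector per item

# The low-rank reconstruction approximates the original ratings and fills in
# the missing entries, which is the collaborative-filtering use of the idea.
print(np.round(user_vecs @ item_vecs.T, 1))
```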
Okay, now you have three more interesting links to bookmark for future reading :-)