Machine Learning
Linear Regression
-
Hypothesis: \(h_\theta(x)=\sum_{j=0}^{m}\theta_jx_j={\theta}^{T}x\), where \(x_0=1\) and \(m\) is the number of features
-
Cost Function: \(J(\theta)=\frac{1}{2}\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})^2=\frac{1}{2}(X{\theta}-Y)^T(X{\theta}-Y)\), where \(n\) is the number of training examples
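Differentiating the matrix form gives the gradient that both methods below rely on (a standard matrix-calculus step):
$$\nabla_{\theta}J(\theta)=X^{T}(X\theta-Y)$$
Setting this gradient to zero yields the closed-form solution; stepping along its negative direction yields gradient descent.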
-
Two methods for minimizing \(J(\theta)\):
(1) Closed-form solution: \({\theta}=(X^{T}X)^{-1}X^{T}Y\)
(2) Gradient Descent: repeat \({\theta}:={\theta}-\alpha\frac{\partial}{\partial\theta}J(\theta)={\theta}-\alpha\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}={\theta}-{\alpha}{X^T}(X{\theta}-Y)\)
Normalize the features to accelerate gradient descent: \(x:=(x-\mu)/(\max-\min)\) or \(x:=(x-\min)/(\max-\min)\)
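A minimal sketch of these two scaling rules in NumPy (the sample values are made up for illustration):

import numpy as np

x = np.array([2000.0, 2001.0, 2002.0, 2003.0])  # sample feature values

# mean normalization: x := (x - mu) / (max - min)
x_mean = (x - x.mean()) / (x.max() - x.min())

# min-max scaling: x := (x - min) / (max - min)
x_minmax = (x - x.min()) / (x.max() - x.min())

print(x_mean)    # centered around 0
print(x_minmax)  # mapped into [0, 1]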
-
Python code for a worked example applying both methods (fit the yearly data for 2000-2013, then predict the value for 2014):
(1) Closed-form solution:

import numpy as np

# Yearly data: years are the feature, values are the target
years = np.array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
                  2008, 2009, 2010, 2011, 2012, 2013], dtype=float)
values = np.array([2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
                   5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900])

# Design matrix with a leading column of ones (x_0 = 1)
X = np.column_stack([np.ones_like(years), years])

# theta = (X^T X)^{-1} X^T Y
theta = np.linalg.inv(X.T @ X) @ X.T @ values
print(theta)
print(theta[0] + theta[1] * 2014)  # prediction for 2014
(2) Gradient descent:

import numpy as np

years = np.array([2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007,
                  2008, 2009, 2010, 2011, 2012, 2013], dtype=float)
values = np.array([2.000, 2.500, 2.900, 3.147, 4.515, 4.903, 5.365,
                   5.704, 6.853, 7.971, 8.561, 10.000, 11.280, 12.900])

# Normalize the feature as recommended above: x := (x - mu) / (max - min)
mu = years.mean()
span = years.max() - years.min()
x = (years - mu) / span
X = np.column_stack([np.ones_like(x), x])  # prepend x_0 = 1
y = values

alpha = 0.01
theta = np.zeros(2)
while True:
    step = alpha * X.T @ (X @ theta - y)  # alpha * X^T (X theta - Y)
    theta -= step
    if np.abs(step).max() <= 1e-6:        # stop once the update stalls
        break

print(theta)
# Apply the same normalization to the query point when predicting
print(theta[0] + theta[1] * (2014 - mu) / span)
Logistic Regression
Binary Classification
Hypothesis:
Define the sigmoid function \(\sigma(z)=\frac{1}{1+e^{-z}}\)
\(h_{\theta}(x)=\sigma({\theta}^{T}x)=\frac{1}{1+e^{-{\theta}^{T}x}}\)
\(h_{\theta}(x)\) can be interpreted as the probability that \(y=1\) given \(x\), that is, \(p(y=1\mid x;\theta)\)
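A minimal sketch of this hypothesis in NumPy (the parameter and feature values are made up for illustration):

import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^{-z})
    return 1.0 / (1.0 + np.exp(-z))

def h(theta, x):
    # h_theta(x) = sigma(theta^T x), read as p(y = 1 | x; theta)
    return sigmoid(theta @ x)

theta = np.array([-1.0, 2.0])  # illustrative parameters
x = np.array([1.0, 0.8])       # x[0] = 1 is the intercept feature
print(h(theta, x))             # ~0.65: predicted probability that y = 1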
Cost Function (the negative log-likelihood, whose gradient gives the update below): \(J(\theta)=-\sum_{i=1}^{n}\left[y^{(i)}\log h_{\theta}(x^{(i)})+(1-y^{(i)})\log(1-h_{\theta}(x^{(i)}))\right]\)
Gradient descent to minimize \(J(\theta)\)
repeat: $${\theta}:={\theta}-{\alpha}\sum_{i=1}^{n}(h_{\theta}(x^{(i)})-y^{(i)})x^{(i)}$$
Note the update has the same form as for linear regression; only the hypothesis \(h_{\theta}\) changes.
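Putting the hypothesis, cost, and update together, a minimal batch-gradient-descent sketch for logistic regression (the toy dataset, learning rate, and iteration count are made up for illustration):

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy 1-D binary labels with some class overlap
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5])
y = np.array([0, 0, 1, 0, 1, 1])
X = np.column_stack([np.ones_like(x), x])  # prepend x_0 = 1

alpha = 0.1
theta = np.zeros(2)
for _ in range(5000):
    h = sigmoid(X @ theta)          # h_theta(x^(i)) for every example
    theta -= alpha * X.T @ (h - y)  # theta := theta - alpha * sum (h - y) x

h = sigmoid(X @ theta)
J = -(y @ np.log(h) + (1 - y) @ np.log(1 - h))  # cross-entropy cost
print(theta, J)
print(h)  # fitted probabilities p(y = 1 | x; theta)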