Machine Learning No.2: Linear Regression with Multiple Variables
1. Notation:
n = number of features
m = number of training examples
x(i) = input (features) of ith training example
xj(i) = value of feature j in ith training example
2. Hypothesis:
hθ(x) = θ0 + θ1*x1 + θ2*x2 + ... + θn*xn = θ^T * x (with x0 = 1)
3. Cost function:
J(θ) = (1/(2m)) * Σ_{i=1}^{m} (hθ(x(i)) - y(i))^2
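A minimal Octave sketch of this cost, assuming X is the m x (n+1) design matrix with a leading column of ones, y the m x 1 target vector, and theta the (n+1) x 1 parameter vector (names illustrative):

function J = computeCost(X, y, theta)
  m = length(y);                           % number of training examples
  h = X * theta;                           % vectorized hypothesis: m x 1 predictions
  J = (1 / (2 * m)) * sum((h - y) .^ 2);   % squared-error cost J(θ)
end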
4. Gradient descent:
Repeat {
θj := θj - α * ∂/∂θj J(θ)
(simultaneously update θj for j = 0, ..., n)
}
Substituting the cost function, the update becomes
Repeat {
θj := θj - α * (1/m) * Σ_{i=1}^{m} (hθ(x(i)) - y(i)) * xj(i)
(simultaneously update θj for j = 0, ..., n)
}
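A vectorized Octave sketch of this loop, reusing the computeCost sketch above (alpha and num_iters are assumed inputs):

function [theta, J_history] = gradientDescent(X, y, theta, alpha, num_iters)
  m = length(y);
  J_history = zeros(num_iters, 1);
  for iter = 1:num_iters
    % simultaneous update of every θj: θ := θ - (α/m) * X' * (Xθ - y)
    theta = theta - (alpha / m) * (X' * (X * theta - y));
    J_history(iter) = computeCost(X, y, theta);   % record J(θ) each iteration
  end
end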
5. Mean normalization
Replace xi with xi - µi to make features have approximately zero mean (do not apply to x0 = 1).
ex: x1 := (x1 - µ1) / s1, where µ1 is the mean of x1 and s1 is its range (max - min) or standard deviation.
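A possible Octave sketch, using the standard deviation for s (the range max - min also works); X here excludes the x0 = 1 column:

function [X_norm, mu, sigma] = featureNormalize(X)
  mu = mean(X);                  % column-wise means µj
  sigma = std(X);                % column-wise standard deviations sj
  X_norm = (X - mu) ./ sigma;    % broadcast: zero mean, comparable scale per feature
end

The same mu and sigma computed on the training set must also be applied to any new example before predicting.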
6. Convergence and learning rate α
Declare convergence if J(θ) decreases by less than 10^-3 in one iteration.
If α is too small: slow convergence.
If α is too large: J(θ) may not decrease on every iteration and may not converge at all.
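In practice one can plot J(θ) against the iteration number and check that the curve keeps decreasing and flattens out; J_history here is from the gradientDescent sketch above:

plot(1:numel(J_history), J_history, '-');
xlabel('Number of iterations');
ylabel('Cost J');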
7. Normal equation
θ = (X'X)^(-1) * X'y
Octave: pinv(X'*X)*X'*y
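A small usage sketch, assuming the raw features sit in an m x n matrix X_raw (illustrative name):

m = length(y);
X = [ones(m, 1), X_raw];          % prepend the x0 = 1 intercept column
theta = pinv(X' * X) * X' * y;    % θ = (X'X)^(-1) X'y: no α, no iterations, no feature scaling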
8. Comparison between gradient descent and the normal equation
Gradient descent: need to choose α
needs many iterations
works well even when n is large
Normal equation: no need to choose α
no need to iterate
needs to compute pinv(X'*X), which costs roughly O(n^3)
slow if n is very large
9. Some problems
What if X'*X is non-invertible?
Redundant features (linearly dependent)
E.g. x1 = size in feet^2
x2 = size in m^2 (then x2 is a constant multiple of x1, so the columns are linearly dependent)
Too many features (e.g. m <= n)
Fix: delete some features, or use regularization
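A toy Octave illustration of the redundant-feature case (all values made up):

x1 = [1000; 1500; 2000];          % sizes in feet^2
x2 = x1 * 0.0929;                 % same sizes in m^2, a constant multiple of x1
X  = [ones(3, 1), x1, x2];
y  = [200; 300; 400];
rank(X' * X)                      % prints 2 rather than 3: X'*X is singular
theta = pinv(X' * X) * X' * y;    % pinv still returns a (minimum-norm) solution

Dropping x2, or adding regularization, restores invertibility.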