Deep Learning Week 3 Notes
1. Perceptron
\(f(x)=\begin{cases}1 & \text{if }\sum_i w_i x_i+b\ge 0\\ 0 & \text{otherwise}\end{cases}\)
\(\large \textbf{Perceptron Algorithm:}\)
- \(\text{Start with }w^0=0\)
- \(\text{While }\exists\, n_k \text{ s.t. } y_{n_k}(w^k\cdot x_{n_k})\leq 0,\text{ update }w^{k+1} = w^k+y_{n_k}x_{n_k}\)
\(\text{Code}\):
import torch

def train_perceptron(x, y, nb_epochs_max):
    # x: N x D samples, y: N labels in {-1, 1}
    w = torch.zeros(x.size(1))
    for e in range(nb_epochs_max):
        nb_changes = 0
        # Loop over the N samples of the batch
        for i in range(x.size(0)):
            # Misclassified (or on the decision boundary): apply the perceptron update
            if x[i].dot(w) * y[i] <= 0:
                w = w + y[i] * x[i]
                nb_changes = nb_changes + 1
        # Stop as soon as a full pass makes no update
        if nb_changes == 0:
            break
    return w
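\(\text{For example, a minimal usage sketch on hypothetical toy data (two Gaussian blobs with labels in }\{-1,1\}\text{; the data and the epoch budget are arbitrary assumptions):}\)

# Hypothetical toy data: two separable Gaussian blobs, labels in {-1, 1}
torch.manual_seed(0)
n = 100
x = torch.cat((torch.randn(n, 2) + 2.0, torch.randn(n, 2) - 2.0), 0)
y = torch.cat((torch.ones(n), -torch.ones(n)), 0)

w = train_perceptron(x, y, nb_epochs_max = 100)
train_errors = ((x @ w) * y <= 0).float().mean()  # 0.0 if the blobs are separated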
\(\text{We can prove convergence under two assumptions:}\)
- \(\text{The }x_n\text{ are in the sphere of radius }R:\ \forall n,\ \|x_n\|\le R\)
- \(\text{The two populations can be separated with a margin }\gamma>0:\ \exists\, w^*,\ \|w^*\|=1,\ \forall n,\ y_n(w^*\cdot x_n)\ge\gamma\)
\(\large\text{The larger the margin, the more quickly the algorithm classifies all the samples correctly.}\)
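\(\text{Quantitatively, the classical perceptron convergence theorem (Novikoff) states that under these two assumptions the number of updates }k\text{ satisfies:}\)
\(k\le\frac{R^2}{\gamma^2}\)
\(\text{so a larger margin }\gamma\text{ (or a smaller radius }R\text{) means fewer mistakes before all samples are classified correctly.}\)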
\(\textbf{SVMs }\text{achieve this by minimizing:}\)
\(\lambda\|w\|^2+\frac{1}{N}\sum_n\max\big(0,\,1-y_n(w\cdot x_n+b)\big)\)
\(\textbf{which is convex and has a global optimum.}\)
\(\large\textbf{Hinge Loss: }\ \ell(\alpha,y)=\max(0,\,1-y\alpha)\)
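\(\text{A minimal sketch of evaluating this objective in PyTorch (the function name }\texttt{svm\_objective}\text{ and the regularization weight }\texttt{lam}\text{ are assumptions for illustration; labels are in }\{-1,1\}\text{):}\)

import torch

def svm_objective(w, b, x, y, lam = 1e-1):
    # Hinge loss per sample: max(0, 1 - y_n (w . x_n + b))
    hinge = (1 - y * (x @ w + b)).clamp(min = 0)
    # L2 penalty on w plus the average hinge loss
    return lam * w.pow(2).sum() + hinge.mean()

\(\text{Since both the quadratic penalty and the hinge term are convex in }(w,b)\text{, gradient-based minimization of this quantity reaches the global optimum.}\)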
3. Probabilistic View of Linear Classifiers
\(\text{Consider the following class populations:}\)
\(\forall y\in\{0,1\},\ x\in\mathbb{R}^D,\quad \mu_{X\mid Y=y}(x)=\frac{1}{\sqrt{(2\pi)^D|\Sigma_y|}}\exp\left(-\frac{1}{2}(x-m_y)^\top\Sigma_y^{-1}(x-m_y)\right)\)
\(\text{i.e. Gaussian densities with means }m_y\text{ and covariance matrices }\Sigma_y.\)
\(\text{We have, by Bayes' rule:}\)
\(P(Y=1\mid X=x)=\frac{\mu_{X\mid Y=1}(x)\,P(Y=1)}{\mu_{X\mid Y=0}(x)\,P(Y=0)+\mu_{X\mid Y=1}(x)\,P(Y=1)}=\sigma\big(a(x)\big)\)
\(\text{where}\)
\(a(x)=\log\frac{\mu_{X\mid Y=1}(x)\,P(Y=1)}{\mu_{X\mid Y=0}(x)\,P(Y=0)},\qquad\sigma(a)=\frac{1}{1+e^{-a}}\)
\(\text{So with our Gaussians }\mu_{X\mid Y=y}\text{ of the same }\Sigma,\ a(x)\text{ is linear in }x\text{ and we get:}\)
\(P(Y=1\mid X=x)=\sigma(w\cdot x+b),\quad w=\Sigma^{-1}(m_1-m_0),\quad b=-\frac{1}{2}\big(m_1^\top\Sigma^{-1}m_1-m_0^\top\Sigma^{-1}m_0\big)+\log\frac{P(Y=1)}{P(Y=0)}\)
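\(\text{A minimal sketch of this result with parameters estimated from data (the helper }\texttt{lda\_parameters}\text{ and the plug-in estimates are assumptions for illustration; labels are in }\{0,1\}\text{):}\)

import torch

def lda_parameters(x, y):
    # Hypothetical helper: estimate w and b from samples x (N x D) and labels y in {0, 1},
    # assuming both classes share the same covariance Sigma
    m0, m1 = x[y == 0].mean(0), x[y == 1].mean(0)
    xc = torch.cat((x[y == 0] - m0, x[y == 1] - m1), 0)
    sigma = xc.t() @ xc / xc.size(0)          # shared covariance estimate
    sigma_inv = torch.linalg.inv(sigma)
    p1 = (y == 1).float().mean()
    w = sigma_inv @ (m1 - m0)
    b = -0.5 * (m1 @ sigma_inv @ m1 - m0 @ sigma_inv @ m0) + torch.log(p1 / (1 - p1))
    return w, b

# P(Y = 1 | X = x) for a batch of samples:
# posterior = torch.sigmoid(x @ w + b)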
4. Multi-layer Perceptrons
\(\textbf{Universal Approximation Theorem}\)
\(\text{A network with a single hidden layer can approximate any continuous function on a compact domain arbitrarily well, provided the hidden layer is large enough.}\)
\(\text{A better approximation requires a larger hidden layer, which means that we can make the }\textbf{training error }\text{as low as we want. It states nothing about the }\textbf{test error}.\)
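\(\text{A minimal sketch of such a network in PyTorch (the input dimension, hidden width }\texttt{nh}\text{, and output dimension are arbitrary assumptions):}\)

import torch
from torch import nn

# One hidden layer of width nh; widening nh allows a closer fit to the training data
nh = 128
model = nn.Sequential(
    nn.Linear(2, nh),   # input dimension 2, chosen only for illustration
    nn.ReLU(),
    nn.Linear(nh, 1),
)

x = torch.randn(16, 2)
output = model(x)       # shape (16, 1)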
5. Gradient Descent
\(\text{Consider the logistic regression loss, with labels }y_n\in\{-1,1\}:\)
\(\mathcal{L}(w,b)=-\sum_n\log\,\sigma\big(y_n(w\cdot x_n+b)\big)\)
\(\text{Therefore:}\)
\(\nabla_w\mathcal{L}=-\sum_n y_n\,\sigma\big(-y_n(w\cdot x_n+b)\big)\,x_n,\qquad\frac{\partial\mathcal{L}}{\partial b}=-\sum_n y_n\,\sigma\big(-y_n(w\cdot x_n+b)\big)\)
\(\textbf{Note that: }\ \sigma(-a)=1-\sigma(a)\ \text{ and }\ \frac{d}{da}\log\sigma(a)=\sigma(-a)\)
\(\textbf{Code}:\)
def gradient(x, y, w, b):
    # u[n] = y_n * sigma(-y_n (w . x_n + b)): per-sample factor, gives the bias gradient
    u = y * ( - y * (x @ w + b)).sigmoid()
    # v[n] = u[n] * x_n: per-sample contribution to the gradient w.r.t. w
    v = x * u.view(-1, 1)  # broadcasting over the feature dimension
    # Gradients of the loss w.r.t. w and b
    return - v.sum(0), - u.sum(0)
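\(\text{A minimal usage sketch of batch gradient descent with this gradient (the step size }\texttt{eta}\text{ and the number of steps are arbitrary assumptions; }x, y\text{ as before, with labels in }\{-1,1\}\text{):}\)

# Gradient descent on the logistic regression loss
eta, nb_steps = 1e-1, 1000
w, b = torch.zeros(x.size(1)), torch.zeros(1)
for k in range(nb_steps):
    dw, db = gradient(x, y, w, b)
    w = w - eta * dw
    b = b - eta * db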
6. Forward and Backpropagation
\(\text{Forward pass, for layers }l=1,\dots,L:\)
\(x^{(0)}=x,\qquad s^{(l)}=w^{(l)}x^{(l-1)}+b^{(l)},\qquad x^{(l)}=\sigma\big(s^{(l)}\big)\)
\(\text{Backward pass:}\)
\(\frac{\partial\ell}{\partial s^{(l)}}=\frac{\partial\ell}{\partial x^{(l)}}\odot\sigma'\big(s^{(l)}\big),\qquad\frac{\partial\ell}{\partial x^{(l-1)}}=\big(w^{(l)}\big)^\top\frac{\partial\ell}{\partial s^{(l)}}\)
\(\frac{\partial\ell}{\partial w^{(l)}}=\frac{\partial\ell}{\partial s^{(l)}}\big(x^{(l-1)}\big)^\top,\qquad\frac{\partial\ell}{\partial b^{(l)}}=\frac{\partial\ell}{\partial s^{(l)}}\)
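\(\text{A minimal sketch of these two passes for one hidden layer with }\tanh\text{ activations and a quadratic loss (the layer structure and the choice of loss are assumptions for illustration):}\)

import torch

def dsigma(s):
    # Derivative of tanh
    return 1 - torch.tanh(s).pow(2)

def forward_backward(w1, b1, w2, b2, x, t):
    # Forward pass
    s1 = w1 @ x + b1
    x1 = torch.tanh(s1)
    s2 = w2 @ x1 + b2
    x2 = torch.tanh(s2)
    # Quadratic loss l = ||x2 - t||^2 and its gradient w.r.t. the output
    dl_dx2 = 2 * (x2 - t)
    # Backward pass
    dl_ds2 = dl_dx2 * dsigma(s2)
    dl_dx1 = w2.t() @ dl_ds2
    dl_ds1 = dl_dx1 * dsigma(s1)
    dl_dw2 = dl_ds2.view(-1, 1) @ x1.view(1, -1)
    dl_db2 = dl_ds2
    dl_dw1 = dl_ds1.view(-1, 1) @ x.view(1, -1)
    dl_db1 = dl_ds1
    return dl_dw1, dl_db1, dl_dw2, dl_db2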