Hung-yi Lee's Machine Learning Course, Lecture 2: Classification

1.Classification

classification:   x->function->class n

how to do classification?

training data for classification:

  $(x^1,\hat{y}^1),\ (x^2,\hat{y}^2),\ (x^3,\hat{y}^3),\ (x^4,\hat{y}^4),\ \dots$

ideal alternatives:

*function (model):

  x->g(x)->g(x)>0------->class 1

     ->g(x)<0-------->class 2

*loss function

      $L(f)=\sum_n\delta\left(f(x^n)\neq\hat{y}^n\right)$      the number of times f gets an incorrect result on the training data

*find the best function

 example: perceptron, SVM
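The function and loss above can be sketched in a few lines of Python. This is only an illustration: the toy data, the scoring function g(x) = x - 2.5, and the helper names `make_classifier` and `zero_one_loss` are hypothetical and do not come from the lecture.

```python
def make_classifier(g):
    """Wrap a scoring function g into f: class 1 if g(x) > 0, otherwise class 2."""
    def f(x):
        # g(x) == 0 is not covered by the notes; sending it to class 2 is an arbitrary choice.
        return 1 if g(x) > 0 else 2
    return f


def zero_one_loss(f, xs, labels):
    """Count the training examples on which f(x) differs from the label."""
    return sum(1 for x, y in zip(xs, labels) if f(x) != y)


# Hypothetical 1-D training data and a hand-picked scoring function.
xs = [1.0, 2.0, 3.0, 4.0]
labels = [2, 2, 1, 2]
f = make_classifier(lambda x: x - 2.5)
loss = zero_one_loss(f, xs, labels)
print(loss)
```

Because this loss is a count, it is not differentiable, which is why methods such as the perceptron or SVM are needed to optimize it.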

2.Gaussian distribution

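*Gaussian distribution function      $f_{\mu,\Sigma}(x)=\frac{1}{(2\pi)^{D/2}}\frac{1}{|\Sigma|^{1/2}}\exp\left(-\frac{1}{2}(x-\mu)^T\Sigma^{-1}(x-\mu)\right)$, where D is the dimension of x

input: vector x     output: probability density at x (loosely, the probability of sampling x)

The shape of the function is determined by the mean vector $\mu$ and the covariance matrix $\Sigma$.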
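As a rough sketch of the density above, the following evaluates a Gaussian for the special case D = 2, so that the inverse and determinant of the covariance matrix can be written out by hand. The function name `gaussian_pdf_2d` and the example values are assumptions made for illustration.

```python
import math


def gaussian_pdf_2d(x, mu, sigma):
    """Density of a 2-D Gaussian at x; sigma is a 2x2 covariance matrix as nested lists."""
    (a, b), (c, d) = sigma
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = [[d / det, -b / det], [-c / det, a / det]]
    dx = [x[0] - mu[0], x[1] - mu[1]]
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu).
    q = sum(dx[i] * inv[i][j] * dx[j] for i in range(2) for j in range(2))
    # Normalizer 1 / ((2*pi)^(D/2) * |Sigma|^(1/2)) with D = 2.
    norm = 1.0 / ((2 * math.pi) * math.sqrt(det))
    return norm * math.exp(-0.5 * q)


# Standard 2-D Gaussian evaluated at its mean and at a point further away.
identity = [[1.0, 0.0], [0.0, 1.0]]
peak = gaussian_pdf_2d([0.0, 0.0], [0.0, 0.0], identity)
tail = gaussian_pdf_2d([2.0, 2.0], [0.0, 0.0], identity)
print(peak, tail)
```

The density is largest at the mean and falls off with distance, at a rate controlled by the covariance matrix.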

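*maximum likelihood

A Gaussian with any mean $\mu$ and covariance matrix $\Sigma$ can generate these points, but with different likelihoods.

likelihood of a Gaussian with mean $\mu$ and covariance matrix $\Sigma$ = the probability that the Gaussian samples $x^1,x^2,x^3,\dots,x^N$

likelihood function       $L(\mu,\Sigma)=f_{\mu,\Sigma}(x^1)\,f_{\mu,\Sigma}(x^2)\,f_{\mu,\Sigma}(x^3)\cdots f_{\mu,\Sigma}(x^N)$

find the best parameters    $\mu^*,\Sigma^*=\arg\max_{\mu,\Sigma}L(\mu,\Sigma)$     $\mu^*=\frac{1}{N}\sum_{n=1}^{N}x^n$     $\Sigma^*=\frac{1}{N}\sum_{n=1}^{N}(x^n-\mu^*)(x^n-\mu^*)^T$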
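The closed-form maximum likelihood solution can be computed directly. The sketch below assumes the points are given as lists of floats; the name `fit_gaussian` and the three sample points are hypothetical.

```python
def fit_gaussian(points):
    """Maximum likelihood mean and covariance (divides by the sample count, not count - 1)."""
    n = len(points)
    d = len(points[0])
    mu = [sum(p[k] for p in points) / n for k in range(d)]
    sigma = [
        [sum((p[i] - mu[i]) * (p[j] - mu[j]) for p in points) / n for j in range(d)]
        for i in range(d)
    ]
    return mu, sigma


# Hypothetical 2-D sample points.
points = [[1.0, 2.0], [3.0, 4.0], [5.0, 9.0]]
mu, sigma = fit_gaussian(points)
print(mu)
print(sigma)
```

Note that the maximum likelihood covariance divides by the number of points, which differs from the unbiased sample covariance that divides by one less.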

*classification with Gaussian distribution

Bayes' rule     $P(c_1|x)=\frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1)+P(x|c_2)P(c_2)}$

 $P(x|c_1)=f_{\mu_{c_1},\Sigma_{c_1}}(x)$             $P(x|c_2)=f_{\mu_{c_2},\Sigma_{c_2}}(x)$
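A minimal sketch of this generative classifier follows. To keep it short it assumes 1-D inputs, so each class has a scalar mean and variance instead of a mean vector and covariance matrix. All parameter values and the names `gaussian_pdf_1d` and `posterior_c1` are hypothetical.

```python
import math


def gaussian_pdf_1d(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-0.5 * (x - mu) ** 2 / var) / math.sqrt(2 * math.pi * var)


def posterior_c1(x, mu1, var1, prior1, mu2, var2, prior2):
    """P(c1 | x) by Bayes' rule with Gaussian class-conditional densities."""
    joint1 = gaussian_pdf_1d(x, mu1, var1) * prior1
    joint2 = gaussian_pdf_1d(x, mu2, var2) * prior2
    return joint1 / (joint1 + joint2)


# Hypothetical classes: c1 centered at 0, c2 centered at 2, equal priors.
p_mid = posterior_c1(1.0, 0.0, 1.0, 0.5, 2.0, 1.0, 0.5)
p_near_c1 = posterior_c1(0.0, 0.0, 1.0, 0.5, 2.0, 1.0, 0.5)
print(p_mid, p_near_c1)
```

A point is assigned to class 1 when the posterior exceeds 0.5; halfway between the two means the posterior is exactly 0.5 in this symmetric setup.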

*Modifying model

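Use different means $\mu_{c_1},\mu_{c_2}$ but share one covariance matrix $\Sigma$ between the two classes. This gives fewer parameters: the number of parameters in $\Sigma$ is proportional to the square of the dimension of x.

Modifying           $\Sigma=\frac{m}{m+n}\Sigma_{c_1}+\frac{n}{m+n}\Sigma_{c_2}$, where m and n are the numbers of training examples in class 1 and class 2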
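The weighted average of the two covariance matrices can be sketched as follows, with m and n taken to be the number of training examples in class 1 and class 2. The matrices, counts and the name `shared_covariance` are hypothetical.

```python
def shared_covariance(sigma1, m, sigma2, n):
    """Average two covariance matrices, weighted by class sizes m and n."""
    w1 = m / (m + n)
    w2 = n / (m + n)
    return [
        [w1 * a + w2 * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(sigma1, sigma2)
    ]


# Hypothetical per-class covariance matrices and class sizes.
sigma_c1 = [[2.0, 0.0], [0.0, 2.0]]
sigma_c2 = [[4.0, 0.0], [0.0, 8.0]]
sigma_shared = shared_covariance(sigma_c1, 3, sigma_c2, 1)
print(sigma_shared)
```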

*model flaw

If all dimensions of x are assumed to be independent, the model is a Naive Bayes classifier; the model is biased when that assumption does not hold.

*posterior probability:

$P(c_1|x)=\frac{P(x|c_1)P(c_1)}{P(x|c_1)P(c_1)+P(x|c_2)P(c_2)}=\frac{1}{1+\frac{P(x|c_2)P(c_2)}{P(x|c_1)P(c_1)}}=\frac{1}{1+\exp(-z)}=\sigma(z)$     (the sigmoid function)

$z=\ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)}$
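The identity between the posterior and the sigmoid of z can be checked numerically. The two joint probabilities below are made-up values used only to exercise the formula.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical values of P(x|c1)P(c1) and P(x|c2)P(c2) for some fixed x.
joint_c1 = 0.03
joint_c2 = 0.01

posterior = joint_c1 / (joint_c1 + joint_c2)  # Bayes' rule
z = math.log(joint_c1 / joint_c2)             # log-odds
print(posterior, sigmoid(z))
```

Both expressions give the same number, which is why the posterior can be modeled directly as a sigmoid of z.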

*mathematical derivation

$z=w\cdot x+b$      (z is linear in x when both classes share the same covariance matrix $\Sigma$)
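For reference, the steps behind this result can be written out as follows. The derivation assumes Gaussian class-conditional densities with a shared covariance matrix $\Sigma$ and priors estimated from the class counts m and n, so that $P(c_1)/P(c_2)=m/n$; the shorthand $\mu_1,\mu_2$ stands for $\mu_{c_1},\mu_{c_2}$.

```latex
\begin{aligned}
z &= \ln\frac{P(x|c_1)}{P(x|c_2)} + \ln\frac{P(c_1)}{P(c_2)} \\
  &= -\tfrac{1}{2}(x-\mu_1)^T\Sigma^{-1}(x-\mu_1)
     +\tfrac{1}{2}(x-\mu_2)^T\Sigma^{-1}(x-\mu_2) + \ln\frac{m}{n} \\
  &= (\mu_1-\mu_2)^T\Sigma^{-1}x
     -\tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1
     +\tfrac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 + \ln\frac{m}{n} \\
w^T &= (\mu_1-\mu_2)^T\Sigma^{-1}, \qquad
b = -\tfrac{1}{2}\mu_1^T\Sigma^{-1}\mu_1
    +\tfrac{1}{2}\mu_2^T\Sigma^{-1}\mu_2 + \ln\frac{m}{n}
\end{aligned}
```

The normalizing constants cancel because both classes use the same $\Sigma$, and the quadratic terms $x^T\Sigma^{-1}x$ cancel for the same reason, leaving an expression that is linear in x.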

3.Logistic Regression

$P_{w,b}(c_1|x)=\sigma(z)$    $z=\ln\frac{P(x|c_1)P(c_1)}{P(x|c_2)P(c_2)}=w\cdot x+b$    $\sigma(z)=\frac{1}{1+\exp(-z)}$

*step 1 function set:     $f_{w,b}(x)=P_{w,b}(c_1|x)$

*step 2 loss function of Logistic Regression

training data    x        x^1   x^2   x^3   x^4   ...   x^N

                 class    c1    c2    c1    c1    ...   c2

                 ŷ        1     0     1     1     ...   0        (ŷ = 1 for class c1, ŷ = 0 for class c2)

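Assume the data is generated based on $f_{w,b}(x)=P_{w,b}(c_1|x)$

$L(w,b)=f_{w,b}(x^1)\left(1-f_{w,b}(x^2)\right)f_{w,b}(x^3)f_{w,b}(x^4)\cdots\left(1-f_{w,b}(x^N)\right)$

$L(w,b)=\prod_n f_{w,b}(x^n)^{\hat{y}^n}\left(1-f_{w,b}(x^n)\right)^{1-\hat{y}^n}$    $w^*,b^*=\arg\max_{w,b}L(w,b)=\arg\min_{w,b}\left(-\ln L(w,b)\right)$

$-\ln L(w,b)=-\ln f_{w,b}(x^1)-\ln\left(1-f_{w,b}(x^2)\right)-\ln f_{w,b}(x^3)-\ln f_{w,b}(x^4)-\cdots-\ln\left(1-f_{w,b}(x^N)\right)$

              $=\sum_n-\left[\hat{y}^n\ln f_{w,b}(x^n)+(1-\hat{y}^n)\ln\left(1-f_{w,b}(x^n)\right)\right]$     each term is the cross entropy between two Bernoulli distributions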
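The negative log-likelihood above can be computed as a sum of per-example cross entropies. This is a sketch under the assumption that labels are encoded as 1 for class c1 and 0 for class c2; the toy data and the function names are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    """f_{w,b}(x) = sigmoid(w . x + b)."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def cross_entropy_loss(w, b, xs, ys):
    """-ln L(w, b): sum of Bernoulli cross entropies over the training set."""
    total = 0.0
    for x, y in zip(xs, ys):
        f = predict(w, b, x)
        total += -(y * math.log(f) + (1 - y) * math.log(1 - f))
    return total


# Hypothetical training set; with w = 0 and b = 0 every prediction is 0.5.
xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [0, 0, 1, 1]
loss_at_zero = cross_entropy_loss([0.0], 0.0, xs, ys)
print(loss_at_zero)
```

With all-zero parameters each example contributes ln 2, so the loss equals the number of examples times ln 2, a convenient sanity check.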

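*step 3 find the best function

$\frac{\partial\ln f_{w,b}(x^n)}{\partial w_i}=\left(1-\sigma(z^n)\right)x_i^n$, where $z^n=w\cdot x^n+b$

$\frac{\partial\ln\left(1-f_{w,b}(x^n)\right)}{\partial w_i}=-\sigma(z^n)\,x_i^n$

$\frac{\partial\left(-\ln L(w,b)\right)}{\partial w_i}=\sum_n-\left(\hat{y}^n-f_{w,b}(x^n)\right)x_i^n$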
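The gradient above leads to a simple gradient descent loop. The sketch below is one possible implementation rather than the lecture's own code: the learning rate, step count, toy data and function names are assumptions, and the bias gradient is taken to have the same form with $x_i^n$ replaced by 1.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def gradient(w, b, xs, ys):
    """Gradient of -ln L(w, b) with respect to each w_i and to b."""
    grad_w = [0.0] * len(w)
    grad_b = 0.0
    for x, y in zip(xs, ys):
        err = -(y - predict(w, b, x))  # -(y_hat^n - f(x^n))
        for i, xi in enumerate(x):
            grad_w[i] += err * xi
        grad_b += err  # same form with x_i^n replaced by 1
    return grad_w, grad_b


def train(xs, ys, lr=0.1, steps=1000):
    """Plain gradient descent starting from w = 0, b = 0."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(steps):
        grad_w, grad_b = gradient(w, b, xs, ys)
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


# Hypothetical 1-D training set: small x belongs to class 0, large x to class 1.
xs = [[0.0], [1.0], [3.0], [4.0]]
ys = [0, 0, 1, 1]
w, b = train(xs, ys)
predictions = [predict(w, b, x) for x in xs]
print(w, b, predictions)
```

The update is proportional to the difference between the label and the current prediction, so examples that are already predicted well contribute little to the step.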

4.Multi-class classification

*softmax

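c1: w1,b1     $z_1=w_1\cdot x+b_1$       ->  $y_1=e^{z_1}/\sum_j e^{z_j}$

c2: w2,b2     $z_2=w_2\cdot x+b_2$       ->  $y_2=e^{z_2}/\sum_j e^{z_j}$

c3: w3,b3     $z_3=w_3\cdot x+b_3$       ->  $y_3=e^{z_3}/\sum_j e^{z_j}$

softmax     $y_i=e^{z_i}/\sum_j e^{z_j}$

probability of softmax:  $0<y_i<1$   $\sum_i y_i=1$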
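A small softmax sketch is given below. Subtracting the largest z before exponentiating is a common numerical-stability trick that does not change the result; it is an addition here, not something stated in the notes. The input values are illustrative.

```python
import math


def softmax(zs):
    """Map scores z_i to probabilities e^{z_i} / sum_j e^{z_j}."""
    m = max(zs)  # shift by the maximum to avoid overflow; the ratios are unchanged
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


# Illustrative scores for three classes.
ys = softmax([3.0, 1.0, -3.0])
print(ys)
```

The exponential enlarges the gap between scores, so the largest z receives most of the probability mass.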

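x   -> z1 -> softmax -> y1

    -> z2 -> softmax -> y2

    -> z3 -> softmax -> y3

loss function (cross entropy between the output y and the target $\hat{y}$):    $-\sum_i\hat{y}_i\ln y_i$

target    class 1: $\hat{y}=[1\ 0\ 0]^T$     class 2: $\hat{y}=[0\ 1\ 0]^T$     class 3: $\hat{y}=[0\ 0\ 1]^T$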
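The multi-class cross entropy against a one-hot target can be sketched as follows. The probability vector is a made-up softmax output, and the function name `cross_entropy` is hypothetical.

```python
import math


def cross_entropy(target, y):
    """-sum_i target_i * ln(y_i); terms with target_i == 0 contribute nothing."""
    return -sum(t * math.log(p) for t, p in zip(target, y) if t > 0)


# Hypothetical softmax output and a one-hot target for class 1.
y = [0.7, 0.2, 0.1]
target = [1, 0, 0]
loss = cross_entropy(target, y)
print(loss)
```

With a one-hot target the loss reduces to the negative log of the probability assigned to the correct class.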

*a logistic regression unit can be used to transform features

*cascading logistic regression models

x1 -> z1 -> sigmoid -> x1'

                                   -> z3 -> sigmoid -> y

x2 -> z2 -> sigmoid -> x2'

      feature transformation               classification          (each logistic regression unit acts as a neuron)
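To make the cascade concrete, here is a small sketch in which two logistic regression units transform the input and a third classifies the transformed features. It assumes both inputs feed both first-stage units, and every weight is hand-picked for illustration (not learned and not from the lecture) so that the cascade separates an XOR-style dataset that a single logistic regression cannot.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def unit(weights, bias, inputs):
    """One logistic regression unit: sigmoid(w . inputs + b)."""
    return sigmoid(sum(w * v for w, v in zip(weights, inputs)) + bias)


def cascade(x1, x2):
    # Feature transformation: two units produce the new features x1', x2'.
    x1_new = unit([20.0, 20.0], -10.0, [x1, x2])   # hand-picked weights
    x2_new = unit([-20.0, -20.0], 30.0, [x1, x2])  # hand-picked weights
    # Classification: a third unit works on the transformed features.
    return unit([20.0, 20.0], -30.0, [x1_new, x2_new])


outputs = {(a, b): cascade(a, b) for a in (0.0, 1.0) for b in (0.0, 1.0)}
for point, y in sorted(outputs.items()):
    print(point, round(y, 4))
```

In the transformed space the two classes become linearly separable, which is what lets the final unit draw a single boundary.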

 

 

posted on 2020-11-05 20:02 by 真正的小明被占用了