
Supervised Learning

  • The process of supervised learning

  Given a dataset with labels, we learn the hypothesis/prediction function h: x → y through a learning algorithm. When the target variable we try to predict is continuous, we call it a regression problem. When the target variable we try to predict is discrete, we call it a classification problem.

 

 

  •  Linear regression

e.g., predicting house prices given living area x1 and number of bedrooms x2:

hθ(x) = θ0 + θ1x1 + θ2x2

The θi's are the parameters/weights. For simplicity, introducing the convention x0 = 1 (the intercept term), we can write this in vector form as

h(x) = Σi θixi = θTx

To fit the parameters θ, we define the cost function:

J(θ) = (1/2)·Σi (hθ(x(i)) − y(i))²
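A minimal NumPy sketch of the hypothesis hθ(x) = θTx and the cost J(θ) may make this concrete; the feature matrix X, the targets y, and the function names here are illustrative assumptions, with values echoing the classic housing example:

```python
import numpy as np

# Toy dataset: each row of X is [x0 = 1 (intercept), living area, #bedrooms].
X = np.array([[1.0, 2104.0, 3.0],
              [1.0, 1600.0, 3.0],
              [1.0, 2400.0, 3.0],
              [1.0, 1416.0, 2.0],
              [1.0, 3000.0, 4.0]])
y = np.array([400.0, 330.0, 369.0, 232.0, 540.0])  # prices (example values)

def h(theta, X):
    """Hypothesis hθ(x) = θᵀx, evaluated for every row of X."""
    return X @ theta

def J(theta, X, y):
    """Least-squares cost J(θ) = (1/2)·Σi (hθ(x(i)) − y(i))²."""
    residuals = h(theta, X) - y
    return 0.5 * residuals @ residuals
```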

    • LMS algorithm (Least Mean Squares)
      • Consider the gradient descent algorithm (which finds a local minimum): start with some initial θ and repeatedly perform the following update until convergence (a runnable sketch follows this item):

        θj := θj − α·∂J(θ)/∂θj

        where, for a single training example, ∂J(θ)/∂θj = (hθ(x) − y)·xj
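A minimal sketch of batch gradient descent with this update, reusing the hypothetical X, y, and h from the snippet above; the learning rate and iteration count are arbitrary example values (with unscaled features like these, α must be tiny and convergence is slow; in practice one would scale the features first):

```python
def gradient_descent(X, y, alpha=5e-8, n_iters=100_000):
    """Batch gradient descent: θ := θ − α·Xᵀ(Xθ − y), i.e. for each j,
    θj := θj − α·Σi (hθ(x(i)) − y(i))·xj(i), summed over all examples."""
    theta = np.zeros(X.shape[1])
    for _ in range(n_iters):
        grad = X.T @ (h(theta, X) - y)  # gradient of J(θ)
        theta = theta - alpha * grad
    return theta

theta_gd = gradient_descent(X, y)
```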

    • The normal equations
      • Alternatively, J(θ) can be minimized in closed form: setting its gradient to zero gives the normal equations XTXθ = XTy, so θ = (XTX)−1XTy (see the sketch below).
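A one-function sketch of this closed form, again reusing the hypothetical X and y from above; np.linalg.solve is used instead of an explicit matrix inverse for numerical stability:

```python
def normal_equation(X, y):
    """Solve the normal equations XᵀXθ = Xᵀy for θ."""
    return np.linalg.solve(X.T @ X, X.T @ y)

theta_star = normal_equation(X, y)  # exact least-squares minimizer
```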

    • Probabilistic interpretation (of least squares)
      • Assume that the target variables and the inputs are related via the equation y(i) = θTx(i) + ε(i), where ε(i) is an error term that captures either unmodeled effects (such as features very pertinent to predicting housing prices that we left out of the regression) or random noise.
      • Further assume that the ε(i) are distributed IID (independently and identically distributed) according to a Gaussian distribution (also called the Normal distribution). I.e., the density of ε(i) is given by

        p(ε(i)) = (1/(√(2π)σ))·exp(−(ε(i))²/(2σ²))

        which implies p(y(i) | x(i); θ) = (1/(√(2π)σ))·exp(−(y(i) − θTx(i))²/(2σ²))

      • Then, construct the likelihood function:

        L(θ) = ∏i p(y(i) | x(i); θ) = ∏i (1/(√(2π)σ))·exp(−(y(i) − θTx(i))²/(2σ²))

      • Next, take the log likelihood ℓ(θ):

        ℓ(θ) = log L(θ) = m·log(1/(√(2π)σ)) − (1/σ²)·(1/2)·Σi (y(i) − θTx(i))²

      • Hence, maximizing ℓ(θ) gives the same answer as minimizing J(θ) below, the original least-squares cost function (the first term and the factor 1/σ² do not depend on θ):

        J(θ) = (1/2)·Σi (y(i) − θTx(i))²
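As a small numerical check of this equivalence, the sketch below (reusing the hypothetical X, y, J, and normal_equation from the earlier snippets; sigma is an arbitrary assumed noise level) confirms that the θ minimizing J(θ) attains a higher Gaussian log-likelihood than a perturbed θ:

```python
def log_likelihood(theta, X, y, sigma=1.0):
    """Gaussian log-likelihood ℓ(θ) = m·log(1/(√(2π)σ)) − J(θ)/σ²."""
    m = len(y)
    return m * np.log(1.0 / (np.sqrt(2.0 * np.pi) * sigma)) - J(theta, X, y) / sigma**2

theta_star = normal_equation(X, y)                    # minimizes J(θ)
theta_off = theta_star + np.array([0.0, 0.01, 0.0])   # slightly perturbed θ

# The least-squares minimizer has the larger log-likelihood:
assert log_likelihood(theta_star, X, y) > log_likelihood(theta_off, X, y)
```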


posted @ 2020-05-17 16:02  ArkiWang