Linear Regression

1. Guide

    [Training set table: for each house i, the inputs x(i) = (living area, number of bedrooms) and the target value y(i) (the price).]

    Here, the x’s are two-dimensional vectors in R².

    For instance, x1(i) is the living area of the i-th house in the training set, and x2(i) is its number of bedrooms.

    To perform supervised learning, we must decide how to represent the hypothesis h. As an initial choice, we approximate y as a linear function of x:

                                           hθ(x) = θ0 + θ1x1 + θ2x2

    

    Here, the θi’s are the parameters (also called weights) parameterizing the space of linear functions mapping from X to Y.

    To simplify notation, we will write hθ(x) as h(x), and adopt the convention of letting x0 = 1 (the intercept term), so that

                                        h(x) = θ0x0 + θ1x1 + … + θnxn = Σ (i = 0 to n) θi xi = θᵀx

    Here, θ and x are both (n + 1)-dimensional vectors, where n is the number of input variables (not counting x0).

    How do we pick, or learn, the parameters θ?

    ---Make h(x) close to y, at least for the training examples we have.

    We define the cost function to formalize this:

                                        J(θ) = (1/2) Σ (i = 1 to m) (hθ(x(i)) − y(i))²

    ---this function is also called the least-squares cost function J.
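
    As a rough illustration, here is a minimal NumPy sketch of the hypothesis h(x) = θᵀx (with x0 = 1 prepended to every example) and of the least-squares cost J(θ). The function names, variable names, and toy numbers are my own choices for illustration, not part of the original notes.

        import numpy as np

        def add_intercept(X):
            """Prepend x0 = 1 to every example so that h(x) = theta^T x."""
            return np.hstack([np.ones((X.shape[0], 1)), X])

        def h(theta, X):
            """Hypothesis h_theta(x) = theta^T x, evaluated for every row of X."""
            return X @ theta

        def cost(theta, X, y):
            """Least-squares cost J(theta) = 1/2 * sum_i (h_theta(x^(i)) - y^(i))^2."""
            residuals = h(theta, X) - y
            return 0.5 * np.sum(residuals ** 2)

        # Toy data: (living area, bedrooms) -> target y, purely for illustration.
        X = add_intercept(np.array([[2104.0, 3.0], [1600.0, 3.0], [2400.0, 3.0]]))
        y = np.array([400.0, 330.0, 369.0])
        theta = np.zeros(X.shape[1])
        print(cost(theta, X, y))   # J at theta = 0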

 

2. LMS algorithm

    We want to choose θ so as to minimize J(θ). To do so, let's use a search algorithm that starts with some “initial guess” for θ, and that repeatedly changes θ to make J(θ) smaller, until hopefully we converge to a value of θ that minimizes J(θ). Specifically, let's consider the gradient descent algorithm, which starts with some initial θ and repeatedly performs the update:

                                        θj := θj − α ∂J(θ)/∂θj

    Here, j = 0, 1, ..., n, and α is called the learning rate. The update is performed simultaneously for every j: at each step we use the current parameters θ(k) together with the training examples x(i), y(i) (i = 1, 2, ..., m) to compute the next parameters θ(k+1).
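
    To make the generic update θj := θj − α·∂J(θ)/∂θj concrete before deriving the analytic gradient, here is a rough sketch of a single step in which the partial derivatives are approximated numerically by central finite differences. The helper names and numbers below are assumptions of mine, not from the notes.

        import numpy as np

        def cost(theta, X, y):
            # J(theta) = 1/2 * sum_i (theta^T x^(i) - y^(i))^2
            return 0.5 * np.sum((X @ theta - y) ** 2)

        def gradient_descent_step(theta, X, y, alpha, eps=1e-6):
            """One update theta_j := theta_j - alpha * dJ/dtheta_j, for every j at once,
            with each partial derivative approximated by a central finite difference."""
            grad = np.zeros_like(theta)
            for j in range(len(theta)):
                bump = np.zeros_like(theta)
                bump[j] = eps
                grad[j] = (cost(theta + bump, X, y) - cost(theta - bump, X, y)) / (2 * eps)
            return theta - alpha * grad

        # X already has x0 = 1 prepended; all theta_j are updated simultaneously.
        X = np.array([[1.0, 2.0], [1.0, 3.0]])
        y = np.array([5.0, 7.0])
        theta = gradient_descent_step(np.zeros(2), X, y, alpha=0.01)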

    First, if we have only one training example (x, y) (i.e., m = 1), we have:

            ∂J(θ)/∂θj = ∂/∂θj [ (1/2)(hθ(x) − y)² ]
                      = (hθ(x) − y) · ∂/∂θj (hθ(x) − y)
                      = (hθ(x) − y) · ∂/∂θj (θ0x0 + θ1x1 + … + θnxn − y)
                      = (hθ(x) − y) · xj

    For a single training example, this gives the update rule:

                                        θj := θj + α (y(i) − hθ(x(i))) xj(i)

    The rule is called the LMS update rule (LMS stands for “least mean squares”).
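
    Below is a minimal sketch of this single-example LMS update, assuming x already contains the intercept term x0 = 1; the function name and the toy numbers are mine, not from the notes.

        import numpy as np

        def lms_update(theta, x, y, alpha):
            """theta_j := theta_j + alpha * (y - h_theta(x)) * x_j, applied to every j at once."""
            error = y - theta @ x            # y^(i) - h_theta(x^(i))
            return theta + alpha * error * x

        # One example: x = (x0 = 1, living area in 1000 ft^2, bedrooms), toy numbers.
        theta = np.zeros(3)
        theta = lms_update(theta, np.array([1.0, 2.1, 3.0]), 4.0, alpha=0.01)
        print(theta)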

 

3. Batch gradient descent and stochastic gradient descent

    Batch gradient descent:

                    Repeat until convergence {
                        θj := θj + α Σ (i = 1 to m) (y(i) − hθ(x(i))) xj(i)        (for every j)
                    }

    This method looks at every example in the entire training set on every step: we update the parameters according to the gradient of the error with respect to all the training examples.

    Note: gradient descent can be susceptible to local minima in general, but J is a convex quadratic function, so it has only one global minimum and no other local minima; thus gradient descent always converges to the global minimum (assuming the learning rate α is not too large).
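
    Here is a sketch of batch gradient descent under the same conventions as the earlier snippets (X has x0 = 1 prepended); the learning rate, iteration count, and toy data below are arbitrary illustrative choices, not values from the notes.

        import numpy as np

        def batch_gradient_descent(X, y, alpha, iterations=1000):
            """Each step uses all m training examples:
            theta_j := theta_j + alpha * sum_i (y^(i) - h_theta(x^(i))) * x_j^(i)."""
            theta = np.zeros(X.shape[1])
            for _ in range(iterations):
                errors = y - X @ theta                  # length-m vector of residuals
                theta = theta + alpha * (X.T @ errors)  # updates every theta_j simultaneously
            return theta

        # Tiny illustrative run (features should be scaled sensibly for a fixed alpha).
        X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
        y = np.array([2.0, 3.0, 4.0])
        print(batch_gradient_descent(X, y, alpha=0.05))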

   Stochastic gradient descent:

                    Loop {
                        for i = 1 to m {
                            θj := θj + α (y(i) − hθ(x(i))) xj(i)        (for every j)
                        }
                    }

    In this algorithm, we repeatedly run through the training set, and each time we encounter a training example, we update the parameters according to the gradient of the error with respect to that single training example only.
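
    A sketch of stochastic gradient descent: each pass through the training set applies the single-example LMS update once per example. Again, the function name, hyperparameters, and data are illustrative assumptions of mine.

        import numpy as np

        def stochastic_gradient_descent(X, y, alpha, passes=50):
            """Update theta from one training example at a time:
            theta_j := theta_j + alpha * (y^(i) - h_theta(x^(i))) * x_j^(i)."""
            theta = np.zeros(X.shape[1])
            for _ in range(passes):
                for i in range(X.shape[0]):
                    error = y[i] - X[i] @ theta
                    theta = theta + alpha * error * X[i]
            return theta

        # Same tiny data as above; updates start after seeing just the first example.
        X = np.array([[1.0, 1.0], [1.0, 2.0], [1.0, 3.0]])
        y = np.array([2.0, 3.0, 4.0])
        print(stochastic_gradient_descent(X, y, alpha=0.05))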

    Compare: Whereas batch gradient descent has to scan through the entire training set before taking a single step (a costly operation if m is large), stochastic gradient descent can start making progress right away, and continues to make progress with each example it looks at. Often, stochastic gradient descent gets θ “close” to the minimum much faster than batch gradient descent. (Note, however, that it may never “converge” to the minimum, and the parameters θ will keep oscillating around the minimum of J(θ); but in practice most of the values near the minimum will be reasonably good approximations to the true minimum.) For these reasons, particularly when the training set is large, stochastic gradient descent is often preferred over batch gradient descent.

    Note: 

  a. Both batch gradient descent and stochastic gradient descent update the parameters synchronously: θ(k+1) is computed from θ(k).

  b. While it is more common to run stochastic gradient descent as we have described it, with a fixed learning rate α, it is also possible to ensure that the parameters converge to the global minimum rather than merely oscillate around it, by slowly letting the learning rate decrease to zero as the algorithm runs.
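
    One common way to realize note (b), sketched below, is to shrink the learning rate with the number of updates, e.g. αt = α0 / (1 + t/τ); this particular schedule and its constants are my own illustrative choice, not taken from the notes.

        import numpy as np

        def sgd_decaying_alpha(X, y, alpha0=0.05, tau=100.0, passes=200):
            """Stochastic gradient descent with a learning rate that decays toward zero,
            so theta settles down instead of oscillating around the minimum of J."""
            theta = np.zeros(X.shape[1])
            t = 0
            for _ in range(passes):
                for i in range(X.shape[0]):
                    alpha = alpha0 / (1.0 + t / tau)   # decreasing step size
                    error = y[i] - X[i] @ theta
                    theta = theta + alpha * error * X[i]
                    t += 1
            return theta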

  

   

 
