机器学习--线性回归的实践

1.鉴于之前提到的房价的问题,使用线性回归该如何解决呢?

首先我们假设有如下的数据方便计算机进行学习

面积卧室价格
2140 3 400
1600 3 330
2400 3 369
1416 2 232
... ... ...

根据之前的演算过程(使房价与面积和卧室数目线性相关):

hθ(x)=θ0 +θ1x1 +θ2x2 

θ为计算时的权重,x1为房间面积,x2为我是数目。
为了降低计算的模糊程度,将hθ(x)变成h(x)来进行计算,这时计算公式为:

n为学习次数。


2. 有了相关数据之后就要开始训练算法了(ex1a_linreg.m

 1 <span style="font-size:24px;">%
 2 %This exercise uses a data from the UCI repository:
 3 % Bache, K. & Lichman, M. (2013). UCI Machine Learning Repository
 4 % http://archive.ics.uci.edu/ml
 5 % Irvine, CA: University of California, School of Information and Computer Science.
 6 %
 7 %Data created by:
 8 % Harrison, D. and Rubinfeld, D.L.
 9 % ''Hedonic prices and the demand for clean air''
10 % J. Environ. Economics & Management, vol.5, 81-102, 1978.
11 %
12 addpath ../common
13 addpath ../common/minFunc_2012/minFunc
14 addpath ../common/minFunc_2012/minFunc/compiled
15 
16 % Load housing data from file.
17 data = load('housing.data');  % housing data  506x14 
18 data=data'; % put examples in columns  14x506  一般这里将每一个样本放在每一列
19 
20 % Include a row of 1s as an additional intercept feature.
21 data = [ ones(1,size(data,2)); data ];  % 15x506    增加intercept term 
22 
23 % Shuffle examples. 乱序 目的在于之后能够随机选取training set和test sets
24 data = data(:, randperm(size(data,2))); %randperm(n)用于随机生成1到n的排列
25 
26 % Split into train and test sets
27 % The last row of 'data' is the median home price.
28 train.X = data(1:end-1,1:400);   %选择前400个样本来训练,后面的样本来做测试
29 train.y = data(end,1:400);
30 
31 test.X = data(1:end-1,401:end);
32 test.y = data(end,401:end);
33 
34 m=size(train.X,2);  %训练样本数量
35 n=size(train.X,1);  %每个样本的变量个数
36 
37 % Initialize the coefficient vector theta to random values.
38 theta = rand(n,1); %随机生成初始theta 每个值在(0,1)之间
39 
40 % Run the minFunc optimizer with linear_regression.m as the objective.
41 %
42 % TODO:  Implement the linear regression objective and gradient computations
43 % in linear_regression.m
44 %
45 tic; %Start a stopwatch timer. 开始计时
46 options = struct('MaxIter', 200);
47 theta = minFunc(@linear_regression, theta, options, train.X, train.y);
48 fprintf('Optimization took %f seconds.\n', toc); %toc Read the stopwatch timer
49 
50 % Run minFunc with linear_regression_vec.m as the objective.
51 %
52 % TODO:  Implement linear regression in linear_regression_vec.m
53 % using MATLAB's vectorization features to speed up your code.
54 % Compare the running time for your linear_regression.m and
55 % linear_regression_vec.m implementations.
56 %
57 % Uncomment the lines below to run your vectorized code.
58 %Re-initialize parameters
59 %theta = rand(n,1);
60 %tic;
61 %theta = minFunc(@linear_regression_vec, theta, options, train.X, train.y);
62 %fprintf('Optimization took %f seconds.\n', toc);
63 
64 % Plot predicted prices and actual prices from training set.
65 actual_prices = train.y;
66 predicted_prices = theta'*train.X;
67 
68 % Print out root-mean-squared (RMS) training error.平方根误差
69 train_rms=sqrt(mean((predicted_prices - actual_prices).^2));
70 fprintf('RMS training error: %f\n', train_rms);
71 
72 % Print out test RMS error
73 actual_prices = test.y;
74 predicted_prices = theta'*test.X;
75 test_rms=sqrt(mean((predicted_prices - actual_prices).^2));
76 fprintf('RMS testing error: %f\n', test_rms);
77 
78 
79 % Plot predictions on test data.
80 plot_prices=true;
81 if (plot_prices)
82   [actual_prices,I] = sort(actual_prices); %从小到大排序价格
83   predicted_prices=predicted_prices(I);
84   plot(actual_prices, 'rx');
85   hold on;
86   plot(predicted_prices,'bx');
87   legend('Actual Price', 'Predicted Price');
88   xlabel('House #');
89   ylabel('House price ($1000s)');
90 end</span>

 

3.为了保证运算的准确性我们需要对cost function函数进行运行,得到最小成本函数(本次练习时我添加的代码)

 1     % Step 1 : 计算成本函数
 2     for i = 1:m  
 3         f = f + (theta' * X(:,i) - y(i))^2;  
 4     end  
 5     f = 1/2*f;  
 6 
 7     % Step 2:计算gradient并储存在g中   
 8 
 9     for j = 1:n  
10         for i = 1:m  
11             g(j) = g(j) + X(j,i)*(theta' * X(:,i) - y(i));  
12     end  

 

posted on 2016-06-07 10:59  dinghing  阅读(976)  评论(0编辑  收藏  举报

导航