logistic回归

(原创文章,转载请注明出处!)

用logistic回归来解决分类问题。

模型的值域是[0,1],用0.5作为分类的阈值。

模型的输出是:P(y=1|x;θ),即:对给定的输入x,和确定的参数θ,事件“y=1”的概率。

那么可以选择sigmoid函数: 1/(1+e-z) ,z∈R,值域为[0,1],在logistic回归中 z=θTx。(也可以选择其他函数)

 即: P(y=1|x;θ) = 1/(1+e-z),其中z=θTx 。

 

学习目标函数(代价函数)的选择

目标函数会使用最优化的算法来进行训练,求解最优解。

凸函数在最优化时,局部最优化解就是全局最优化解,方便求解,所以在选择目标函数时,考虑选择一个凸函数。选择如下的损失函数:

对y=1, -log[1/(1+e-z)] ;  对y=0,  -log[1-(1/(1+e-z))];(对数损失函数)。这两个函数在sigmoid函数的值域上都是单调凸函数。

将两个损失函数合并起来:L(x,y;θ) = -ylog[1/(1+e-z)] - (1-y)log[1-(1/(1+e-z))]        

而训练数据是由多个样本点组成的,所以训练数据的平均损失为:

L(x,y;θ) = -(1/m)∑i=1m{y(i)log[1/(1+e-z)] + (1-y(i))log[1-(1/(1+e-z))]}, z=θTx(i), m是样本点的个数

(logistic回归模型参数的估计问题,本质上是点估计,可以使用极大似然估计来(MLE)估计模型中的参数。

似然函数写成:L = ∏i=1m{y(i)[1/(1+e-z)] + (1-y(i))[1 - (1/(1+e-z))]}, z=θTx(i),

对数似然函数写成:L =∑i=1m{y(i)log[1/(1+e-z)] + (1-y(i))log[1-(1/(1+e-z))]}, z=θTx(i)  )

 

使用样本的平均损失来代替总体分布的期望损失,由大数定律知,在样本点充分多的情况下是可行的。

如果样本点不是充分的多怎么办?这个时候还使用平均损失代替期望损失就可能出现overfitting。

解决的方法是正则化,给平均损失函数增加惩罚项:

L(x,y;θ) = -(1/m)∑i=1m{y(i)log[1/(1+e-z)] + (1-y(i))log[1-(1/(1+e-z))]} + (λ/2m)[∑j=1nj2)], n是参数个数。

惩罚项是在代价函数中加入的模型复杂度(比如:参数多)带来的惩罚,用以将指定的参数将来训练得到的值减小(减小参数对模型的影响),以此来控制模型的复杂度,避免overfitting。(除了上面平均损失函数中添加的2-范数惩罚项,还有其它形式的惩罚项,如:1-范数惩罚项)

代价函数中的第一项来最小化模型误差,第二项来控制模型复杂度带来的影响,系数λ≥0,用来平衡代价函数中的第一项和第二项;另外样本点的增加能减少惩罚项带来的影响(m ↑,(λ/2m) ↓),而受惩罚的参数将会变大(如果将惩罚项系数分母中的m去掉,则就消除这种影响)。

 

可以使用梯度下降,或者牛顿法来求解此最优化问题。采用牛顿法,参数的计算公式为:

Θ = Θ - H-1Θ,其中H-1是平均损失函数对Θ的Hessian矩阵的逆矩阵,∇Θ是平均损失函数对Θ的梯度向量。

 

使用R编程实现的logistic回归算法。

  1 ##
  2 ##Function: logistic_fun_sigmoid
  3 logistic_fun_sigmoid <- function (z) 
  4 {
  5     h_theta <- 1.0/(1.0 + exp((-1)*z))
  6     h_theta
  7 }
  8 ##
  9 ##Function: logistic_fun_map_feature
 10 logistic_fun_map_feature <- function(u, v, degree)
 11 {
 12     result <- matrix(1, nrow = length(u), ncol=1)    
 13     for (i in 1:degree) {
 14         for (j in i:0) {
 15             result <- cbind(result, u^j * v^(i-j))
 16         }
 17     }
 18     result
 19 }
 20 
 21 ##
 22 ## load data
 23 ##
 24 logisticx <- read.table(file="x.dat", sep=",")
 25 attr(logisticx, "names") <- c("u", "v")
 26 logisticy <- read.table(file="y.dat")
 27 
 28 logistic_x <- as.matrix(logisticx) ;  attr(logistic_x,"dimnames")[2] <- NULL
 29 logistic_y <- as.matrix(logisticy) ;  attr(logistic_y,"dimnames")[2] <- NULL
 30 logistic_y_factor <- as.factor( logistic_y )
 31 
 32 ## plot the samples' data
 33 plot( x = c(floor(min(logistic_x[,1])), ceiling(max(logistic_x[,1]))),
 34       y = c(floor(min(logistic_x[,2])), ceiling(max(logistic_x[,2]))),
 35       xlab = "u", ylab = "v", type = "n")  # setting up coord. system
 36 points(logistic_x[,1], logistic_x[,2], col = as.integer(logistic_y)+1, pch = as.integer(logistic_y)+1)
 37 legend("topright", legend = c("y=1", "y=0"), col = c(2, 1), pch = c(2,1))
 38 
 39 ##
 40 ## map original x to 28-feature vector
 41 ##
 42 logistic_x_mapped <- logistic_fun_map_feature( logistic_x[,1], logistic_x[,2], 6 )
 43 
 44 
 45 logistic_lamda                  <-     0
 46 logistic_num_of_samples         <-     dim(logistic_x_mapped)[1]
 47 logistic_num_of_features        <-     dim(logistic_x_mapped)[2]
 48 logistic_theta                  <-     matrix(0, nrow=logistic_num_of_features, ncol=1 )   # 28 rows, 1 column
 49 logistic_h_theta                <-     logistic_fun_sigmoid( logistic_x_mapped %*% logistic_theta )
 50 logistic_gradient_of_J_theta    <-     numeric(logistic_num_of_features)
 51 logistic_diag_28by28            <-     diag( c(0, rep(1, logistic_num_of_features - 1)) )  # logistic_diag_28by28[1,1] = 0
 52 logistic_H                      <-     matrix(0, nrow=logistic_num_of_features, ncol=logistic_num_of_features) # hessian matrix
 53 logistic_J_theta                <-     0
 54 logistic_J_theta_array          <-     numeric(1)
 55 logistic_last_J_theta           <-     -1
 56 logistic_num_of_loop            <-     0
 57 logistic_theta_list             <-     list(logistic_theta, logistic_theta, logistic_theta)
 58 logistic_lamda_vec              <-     c(0, 1, 10)
 59 
 60 
 61 for (index_i in 1:length(logistic_theta_list) )  {
 62     logistic_lamda <- logistic_lamda_vec[index_i]  #  0, 1, 10   
 63     
 64     logistic_num_of_samples         <-     dim(logistic_x_mapped)[1]
 65     logistic_num_of_features        <-     dim(logistic_x_mapped)[2]
 66     logistic_theta                  <-     matrix(0, nrow=logistic_num_of_features, ncol=1 )   # 28 rows, 1 column
 67     logistic_h_theta                <-     logistic_fun_sigmoid( logistic_x_mapped %*% logistic_theta )
 68     logistic_gradient_of_J_theta    <-     numeric(logistic_num_of_features)
 69     logistic_diag_28by28            <-     diag( c(0, rep(1, logistic_num_of_features - 1)) )  # logistic_diag_28by28[1,1] = 0
 70     logistic_H                      <-     matrix(0, nrow=logistic_num_of_features, ncol=logistic_num_of_features) # hessian matrix
 71     logistic_J_theta                <-     0
 72     logistic_J_theta_array          <-     numeric(1)
 73     logistic_last_J_theta           <-     -1
 74     logistic_num_of_loop            <-     0
 75 
 76     
 77     while (logistic_J_theta != logistic_last_J_theta) {
 78         logistic_num_of_loop <- logistic_num_of_loop + 1
 79         cat("----------------------------------------------------------loop ", logistic_num_of_loop, " begin \n")
 80         logistic_last_J_theta <- logistic_J_theta
 81         
 82         ## calculate gradient of the J of theta
 83         logistic_gradient_of_J_theta[1] <-  crossprod( (logistic_h_theta - logistic_y[,1]), logistic_x_mapped[,1] )   /   logistic_num_of_samples
 84         for (i in 2:logistic_num_of_features) {
 85             logistic_gradient_of_J_theta[i] <-  ( crossprod( (logistic_h_theta - logistic_y[,1]), logistic_x_mapped[,i] )
 86                                               + logistic_theta[i,1] * logistic_lamda          )  /  logistic_num_of_samples
 87         }
 88         #print("logistic_gradient_of_J_theta") ; browser()
 89 
 90         ## calculate Hessian matrix
 91         for (i in 1:logistic_num_of_samples) {
 92             logistic_H <- logistic_h_theta[i,1]*(1-logistic_h_theta[i,1])*tcrossprod( as.matrix(logistic_x_mapped[i,]) )  + logistic_H
 93         }
 94         logistic_H <- (logistic_H + logistic_lamda * logistic_diag_28by28) / logistic_num_of_samples
 95         
 96         ## update theta
 97         logistic_theta <- logistic_theta - solve(logistic_H) %*% as.matrix(logistic_gradient_of_J_theta)
 98         
 99         
100         ## update h of theta with the latest theta
101         logistic_h_theta <- logistic_fun_sigmoid( logistic_x_mapped %*% logistic_theta )
102         
103         ## calculate J of theta
104         logistic_J_theta <- ( 
105                           sum( as.vector(log(logistic_h_theta)) * as.vector(logistic_y)
106                                + as.vector(log(1- logistic_h_theta)) * as.vector((1 - logistic_y))  )
107                           + sum(logistic_theta^2) * logistic_lamda / 2 
108                         )     /     logistic_num_of_samples
109         
110         logistic_J_theta_array[logistic_num_of_loop] <- logistic_J_theta
111         cat("----------------------------------------------------------loop ", logistic_num_of_loop, " end \n")
112     }
113     logistic_theta_list[[index_i]] <- logistic_theta
114 }
115 
116 
117 ##
118 ## plot graphic
119 ##
120 
121 ## one_theta is 28-by-1 matrix
122 logistic_plot_contour <- function(one_theta) 
123 {
124     u <- seq(-1, 1.5, by=.05)
125     v <- seq(-1, 1.5, by=.05)
126     z <- matrix(0, nrow=length(u), ncol=length(v))
127     
128     for (i in 1:length(u)) {
129         for (j in 1:length(v)) {
130             z[j,i] <- logistic_fun_map_feature( u[i], v[j], 6 ) %*% one_theta
131         }
132     }
133     
134     contour(u, v, t(z), col = "green", add = TRUE, method = "simple", nlevels=1, lty="solid")
135 }
136 
137 op <- par( mfrow = c(1, 3), # 1 x 3 pictures on one plot
138            pty = "s" ) 
139 for (index_i in 1:length(logistic_theta_list) )  {
140     plot( x = c(-1, 1.5),
141           y = c(-1, 1.5),
142           xlab = "u", ylab = "v", type = "n")  # setting up coord. system
143     points(logistic_x[,1], logistic_x[,2], col = as.integer(logistic_y)+1, pch = as.integer(logistic_y)+1)
144     legend("topright", legend = c("y=1", "y=0","Decision boundary"), col = c(2, 1, "green" ), pch = c(2,1,"-"))
145     
146     logistic_theta  <- logistic_theta_list[[index_i]]
147     logistic_plot_contour(logistic_theta)    
148 }
149 par(op)
View Code

 

posted @ 2014-08-07 22:36  activeshj  阅读(1187)  评论(0编辑  收藏  举报