R: Logistic Regression

> ############### Logistic regression
> setwd("/Users/yaozhilin/Downloads/R_edu/data")
> accepts<-read.csv("accepts.csv")
> names(accepts)
 [1] "application_id" "account_number" "bad_ind"        "vehicle_year"   "vehicle_make"  
 [6] "bankruptcy_ind" "tot_derog"      "tot_tr"         "age_oldest_tr"  "tot_open_tr"   
[11] "tot_rev_tr"     "tot_rev_debt"   "tot_rev_line"   "rev_util"       "fico_score"    
[16] "purch_price"    "msrp"           "down_pyt"       "loan_term"      "loan_amt"      
[21] "ltv"            "tot_income"     "veh_mileage"    "used_ind"      
> accepts<-accepts[complete.cases(accepts),]
> select<-sample(1:nrow(accepts),length(accepts$application_id)*0.7)
> train<-accepts[select,]### 70% used for modeling
> test<-accepts[-select,]### 30% used for testing
> attach(train)
> ### Use glm(y~x,family=binomial(link="logit"))
> gl<-glm(bad_ind~fico_score,family=binomial(link = "logit"))
> summary(gl)

Call:
glm(formula = bad_ind ~ fico_score, family = binomial(link = "logit"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.0794  -0.6790  -0.4937  -0.3073   2.6028  

Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept)  9.049667   0.629120   14.38   <2e-16 ***
fico_score  -0.015407   0.000938  -16.43   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2989.2  on 3046  degrees of freedom
Residual deviance: 2665.9  on 3045  degrees of freedom
AIC: 2669.9

Number of Fisher Scoring iterations: 5
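
The fico_score coefficient above is on the log-odds scale, so it is easier to read after exponentiating. The snippet below is a minimal sketch that copies the estimate and standard error from the summary(gl) output by hand. The 10-point step and the names beta_fico, se_fico, or_1, or_10 and ci_1 are illustrative choices, not part of the original session.

```r
# Coefficient and standard error copied from the summary(gl) output above.
beta_fico <- -0.015407
se_fico   <- 0.000938

# Odds ratio for a 1-point increase in fico_score.
or_1 <- exp(beta_fico)

# Odds ratio for a 10-point increase (the step of 10 is an illustrative choice).
or_10 <- exp(10 * beta_fico)

# Approximate 95% Wald interval for the 1-point odds ratio.
ci_1 <- exp(beta_fico + c(-1, 1) * qnorm(0.975) * se_fico)

print(round(c(or_1 = or_1, or_10 = or_10), 4))
print(round(ci_1, 4))
```

Read this way, each additional FICO point lowers the odds of bad_ind = 1 by roughly 1.5%, and ten points lower them by roughly 14%. With the fitted object available, exp(coef(gl)) gives the same odds ratios directly.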

Multiple logistic regression

> ### Multiple logistic regression
> gls<-glm(bad_ind~fico_score+bankruptcy_ind+age_oldest_tr+
+            tot_derog+rev_util+veh_mileage,family = binomial(link = "logit"))
> summary(gls)

Call:
glm(formula = bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + 
    tot_derog + rev_util + veh_mileage, family = binomial(link = "logit"))

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.2646  -0.6743  -0.4647  -0.2630   2.8177  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)      8.205e+00  7.433e-01  11.039  < 2e-16 ***
fico_score      -1.338e-02  1.092e-03 -12.260  < 2e-16 ***
bankruptcy_indY -3.771e-01  1.855e-01  -2.033   0.0421 *  
age_oldest_tr   -4.458e-03  6.375e-04  -6.994 2.68e-12 ***
tot_derog        3.012e-02  1.552e-02   1.941   0.0523 .  
rev_util         3.763e-04  5.252e-04   0.717   0.4737    
veh_mileage      2.466e-06  1.381e-06   1.786   0.0741 .  
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 2989.2  on 3046  degrees of freedom
Residual deviance: 2601.4  on 3040  degrees of freedom
AIC: 2615.4

Number of Fisher Scoring iterations: 5
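
Both models were fitted to the same training rows (the null deviance and its degrees of freedom match), so the drop in residual deviance can be used as a likelihood-ratio test of the five added predictors. This is a sketch that works from the printed numbers only. The variable names are hypothetical.

```r
# Residual deviances and residual degrees of freedom copied from the two summaries above.
dev_simple <- 2665.9; df_simple <- 3045   # bad_ind ~ fico_score
dev_full   <- 2601.4; df_full   <- 3040   # six-predictor model

lr_stat <- dev_simple - dev_full          # drop in deviance
lr_df   <- df_simple - df_full            # number of added parameters
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

print(c(statistic = lr_stat, df = lr_df, p_value = p_value))
```

The deviance falls by about 64.5 on 5 degrees of freedom, which is far in the tail of the chi-squared reference distribution. With the fitted objects at hand, anova(gl, gls, test = "Chisq") performs the same comparison.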

> glss<-step(gls,direction = "both")
Start:  AIC=2615.35
bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
    rev_util + veh_mileage

                 Df Deviance    AIC
- rev_util        1   2601.9 2613.9
<none>                2601.3 2615.3
- veh_mileage     1   2604.4 2616.4
- tot_derog       1   2605.1 2617.1
- bankruptcy_ind  1   2605.7 2617.7
- age_oldest_tr   1   2655.9 2667.9
- fico_score      1   2763.8 2775.8

Step:  AIC=2613.88
bad_ind ~ fico_score + bankruptcy_ind + age_oldest_tr + tot_derog + 
    veh_mileage

                 Df Deviance    AIC
<none>                2601.9 2613.9
- veh_mileage     1   2604.9 2614.9
+ rev_util        1   2601.3 2615.3
- tot_derog       1   2605.7 2615.7
- bankruptcy_ind  1   2606.1 2616.1
- age_oldest_tr   1   2656.9 2666.9
- fico_score      1   2773.2 2783.2

> # The predictions are on the logit scale, so we need to convert them
> train$pre<-predict(glss,train)
> summary(train$pre)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
 -4.868  -2.421  -1.671  -1.713  -1.011   2.497 
> train$pre_p<-1/(1+exp(-1*train$pre))
> summary(train$pre_p)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
0.00763 0.08157 0.15823 0.19298 0.26677 0.92395
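
The manual conversion 1/(1+exp(-x)) is the inverse logit. Base R offers two shortcuts for it: plogis() and predict(..., type = "response"). The sketch below shows that all three agree. It uses toy data, because accepts.csv is not available here. The names toy, x, y and fit are hypothetical stand-ins for the training set and the fitted model.

```r
set.seed(1)  # arbitrary seed, only to make the toy data reproducible

# Toy data standing in for the training set (names are hypothetical).
n   <- 500
x   <- rnorm(n)
y   <- rbinom(n, size = 1, prob = plogis(-1 + 0.8 * x))
toy <- data.frame(x = x, y = y)

fit <- glm(y ~ x, family = binomial(link = "logit"), data = toy)

# Default predict() returns the linear predictor, i.e. the logit.
logit_hat <- predict(fit, newdata = toy)

# Three equivalent ways to obtain probabilities.
p_manual   <- 1 / (1 + exp(-logit_hat))
p_plogis   <- plogis(logit_hat)
p_response <- predict(fit, newdata = toy, type = "response")

print(all.equal(p_manual, p_plogis))
print(all.equal(p_manual, p_response))
```

Applied to the session above, predict(glss, test, type = "response") would score the held-out 30% directly on the probability scale.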

> # Logistic regression does not require checking the error term, but it does require checking for multicollinearity
> library(car)
> vif(glss)
    fico_score bankruptcy_ind  age_oldest_tr      tot_derog    veh_mileage 
      1.271283       1.144846       1.075603       1.423850       1.003616
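
All five values are below 1.5, so multicollinearity is not a concern for this model. To show what the number measures, here is a base-R sketch that derives a VIF from the coefficient covariance matrix of a fitted model. As far as I understand car's implementation, this matches vif() for terms with a single coefficient, but treat that as an assumption. The toy data and the helper vif_from_vcov are hypothetical.

```r
set.seed(2)  # arbitrary seed for the toy data

# Toy predictors with deliberate correlation between x1 and x2 (names are hypothetical).
n  <- 400
x1 <- rnorm(n)
x2 <- 0.7 * x1 + rnorm(n, sd = 0.7)
x3 <- rnorm(n)
y  <- rbinom(n, size = 1, prob = plogis(-0.5 + 0.6 * x1 - 0.4 * x2 + 0.3 * x3))
toy <- data.frame(y, x1, x2, x3)

fit <- glm(y ~ x1 + x2 + x3, family = binomial(link = "logit"), data = toy)

# VIF from the coefficient covariance matrix: drop the intercept, rescale to a
# correlation matrix, and take the diagonal of its inverse.
vif_from_vcov <- function(model) {
  v <- vcov(model)
  keep <- colnames(v) != "(Intercept)"
  r <- cov2cor(v[keep, keep, drop = FALSE])
  diag(solve(r))
}

print(round(vif_from_vcov(fit), 3))
```

In this toy setup the correlated pair x1 and x2 gets clearly inflated values, while the independent x3 stays close to 1. For an ordinary linear model the same formula reduces to the textbook 1/(1 - R^2) from regressing each predictor on the others.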

posted @ 2020-11-04 14:06 瑶池里