Logistic Regression: A Bank Loan Approval Example

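Problem Definition

This is a loan approval problem. Suppose you are a loan officer at a bank. Customers are applying for loans of various amounts and have filled in their personal information (provided in datas.txt). Based on this information, you need to build a classification model that decides whether each applicant should be granted a loan.

Using the information provided, build a classification model and evaluate it. Briefly describe how the model was built, and give a short explanation of each feature.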

Dataset

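Each record contains the following fields: user ID, age, gender, application amount, occupation type, education level, marital status, housing type, household registration (hukou) type, loan purpose, company type, salary, and the loan label (0 = loan denied, 1 = loan approved).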

Data Preprocessing

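The user ID carries no predictive information, so it is excluded when modelling. Of the remaining fields that describe a user, age, application amount, and salary are continuous; the rest are discrete.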
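The split between continuous and discrete fields can be sketched as below. This is a minimal illustration on a made-up frame: the column name `UserId` and all values are hypothetical, and one-hot encoding via `pandas.get_dummies` is shown only as one common way to treat discrete fields (the modelling code in this post passes the columns to the model as they are).

```python
import pandas as pd

# Tiny made-up frame. "UserId" and every value here are hypothetical;
# the other column names follow the ones used in the modelling code.
df = pd.DataFrame({
    "UserId": [1, 2, 3, 4],
    "Age": [25, 40, 33, 51],
    "AppAmount": [5000, 20000, 12000, 8000],
    "Salary": [3000, 9000, 6000, 4500],
    "Education": [1, 3, 2, 3],
    "Marital": [0, 1, 1, 0],
    "Label": [0, 1, 1, 0],
})

continuous = ["Age", "AppAmount", "Salary"]
categorical = ["Education", "Marital"]

# The user ID only identifies a row, so drop it (and the target) from the features.
features = df.drop(columns=["UserId", "Label"])

# One-hot encode the discrete columns so that their integer codes
# are not interpreted as ordered magnitudes by the linear model.
encoded = pd.get_dummies(features, columns=categorical, dtype=int)

print(encoded.columns.tolist())
```

Whether one-hot encoding helps depends on how the discrete fields are coded in the real data, which this post does not show.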

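Inspection shows that some records have missing values. These records need to be handled, for example by deleting them outright or by imputing the missing values.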
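The two options mentioned above can be compared on a small example. The frame below is made up, and the imputation rule (median for continuous columns, most frequent value for discrete ones) is an assumption chosen for illustration, not something prescribed by the dataset.

```python
import numpy as np
import pandas as pd

# Made-up records with a few gaps; values are hypothetical.
df = pd.DataFrame({
    "Age": [25, np.nan, 33, 51],
    "Salary": [3000.0, 9000.0, np.nan, 4500.0],
    "Education": [1, 3, np.nan, 3],
})

# Option 1: delete every row that has at least one missing value.
dropped = df.dropna()

# Option 2: impute. Continuous columns get the column median,
# the discrete column gets its most frequent value (mode).
filled = df.copy()
for col in ["Age", "Salary"]:
    filled[col] = filled[col].fillna(filled[col].median())
filled["Education"] = filled["Education"].fillna(filled["Education"].mode()[0])

print("rows before:", len(df))
print("rows after dropna:", len(dropped))
print("missing after imputation:", int(filled.isna().sum().sum()))
```

Dropping rows is simpler but discards data; imputing keeps every row at the cost of introducing estimated values.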

The Logit Function
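As a reminder of what this heading refers to, the standard definition of the logit (log-odds) function and its inverse, the sigmoid, can be written as follows. The symbols are chosen here for illustration: $p$ is the probability that a loan is approved and $z$ is any real number.

```latex
\operatorname{logit}(p) = \ln\frac{p}{1 - p}, \qquad 0 < p < 1

\sigma(z) = \operatorname{logit}^{-1}(z) = \frac{1}{1 + e^{-z}}, \qquad z \in \mathbb{R}
```

The logit maps a probability in $(0, 1)$ onto the whole real line, and the sigmoid maps it back.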

The Logistic Regression
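In logistic regression, a linear score of the features is passed through the sigmoid to obtain a probability, which is then thresholded to give a class. A minimal NumPy sketch of that idea follows; the weights `w`, the bias `b`, and the rows of `X` are hypothetical numbers, not values fitted to the loan data.

```python
import numpy as np

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logit(p):
    """Log-odds of a probability; the inverse of the sigmoid."""
    return np.log(p / (1.0 - p))

# Hypothetical weights and bias for a model with two features.
w = np.array([0.8, -0.5])
b = 0.1

# Three hypothetical applicants, one per row.
X = np.array([[1.0, 2.0],
              [3.0, 0.5],
              [0.0, 0.0]])

z = X @ w + b                         # linear score (the log-odds)
probs = sigmoid(z)                    # P(label = 1 | x)
labels = (probs >= 0.5).astype(int)   # threshold at 0.5

print(probs)
print(labels)
```

Thresholding the probability at 0.5 is the same as checking whether the linear score `z` is non-negative.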

Model Data

# Logistic regression model:
# classify whether a bank customer should be granted a loan

import pandas
import numpy
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

# Feature columns used by the model (the user ID is not included)
features = ['Age', 'Gender', 'AppAmount', 'Occupation',
            'Education', 'Marital', 'Property', 'Residence',
            'LoanUse', 'Company', 'Salary']

data = pandas.read_csv("datas.csv")

# Drop every row that contains a missing value
data = data.dropna()

# Randomly shuffle the rows before splitting into training and test sets
shuffled = data.loc[numpy.random.permutation(data.index)]

# Train on the first 14968 rows and test on the remaining rows
num_train = 14968
data_train = shuffled[:num_train]
data_test = shuffled[num_train:]

# Fit logistic regression on the training set
logistic_model = LogisticRegression()
logistic_model.fit(data_train[features], data_train['Label'])

# Print the model's coefficients
print(logistic_model.coef_)

# .predict() uses a probability threshold of 0.50 by default
predicted_train = logistic_model.predict(data_train[features])

# The mean of the boolean "prediction == label" array is the accuracy
accuracy_train = (predicted_train == data_train['Label']).mean()
print("Accuracy in Training Set = {s}".format(s=accuracy_train))

# Predict loan approval for the held-out test set
predicted_test = logistic_model.predict(data_test[features])

# Proportion of test predictions that match the true labels
accuracy_test = (predicted_test == data_test['Label']).mean()
print("Accuracy in Test Set = {s}".format(s=accuracy_test))

# Predicted probability of Label == 1 for the training and test sets
train_probs = logistic_model.predict_proba(data_train[features])[:, 1]
test_probs = logistic_model.predict_proba(data_test[features])[:, 1]

# Compute AUC for the training and test sets
auc_train = roc_auc_score(data_train["Label"], train_probs)
auc_test = roc_auc_score(data_test["Label"], test_probs)

# Difference in AUC values (a large gap suggests overfitting)
auc_diff = auc_train - auc_test
print("AUC train = {a}, AUC test = {b}, difference = {d}".format(
    a=auc_train, b=auc_test, d=auc_diff))

# Compute ROC curves; roc_curve returns (fpr, tpr, thresholds)
roc_train = roc_curve(data_train["Label"], train_probs)
roc_test = roc_curve(data_test["Label"], test_probs)

# Plot true positive rate against false positive rate
plt.plot(roc_train[0], roc_train[1], label="train")
plt.plot(roc_test[0], roc_test[1], label="test")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
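As a complement to the accuracy and AUC figures computed above, a confusion matrix shows how the errors split between wrongly approved and wrongly rejected applicants, which accuracy alone can hide when one label is much rarer than the other. The sketch below runs on synthetic data from `make_classification`, because the loan file is not reproduced here. The sample size, class weights, `max_iter`, and the use of `train_test_split` with stratification are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan data: 11 features, imbalanced labels.
X, y = make_classification(n_samples=1000, n_features=11, n_informative=5,
                           weights=[0.8, 0.2], random_state=0)

# Stratified split keeps the label ratio the same in both sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Rows are true labels, columns are predicted labels.
cm = confusion_matrix(y_test, model.predict(X_test))
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])

print(cm)
print("Test AUC = {:.3f}".format(auc))
```

On the real data, the same calls would take `data_test[...]` and `data_test['Label']` in place of the synthetic arrays.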

posted @ 2016-08-29 16:00 Black_Knight
