
Tensorflow2 (Pre-course) --- 4.1. Logistic Regression Example: Layer Approach

I. Summary

One-sentence summary:

As you can see, compared with the mse loss function, cross entropy is better both in convergence speed and in final test-set accuracy.
# Build the container
model = tf.keras.Sequential()
# Input layer
model.add(tf.keras.Input(shape=(15,)))
# Hidden layers
model.add(tf.keras.layers.Dense(10,activation='relu'))
model.add(tf.keras.layers.Dense(10,activation='relu'))
# Output layer
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))

# Configure the optimizer and the loss function
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['acc'])
# Start training
history = model.fit(train_x,train_y,epochs=5000,validation_data=(test_x,test_y))

II. Logistic Regression Example: Layer Approach

Video location of the corresponding course for this blog post:

Steps

1. Read the dataset
2. Split the dataset (into a training set and a test set)
3. Build the model
4. Train the model
5. Evaluate the model

Goal

Make predictions on the credit dataset, i.e., use the existing data to predict outcomes.
In [1]:
import pandas as pd
import numpy as np
import tensorflow as tf
import matplotlib.pyplot as plt

1. Read the dataset

In [2]:
data=pd.read_csv("credit.csv",header=None)
data
Out[2]:
      0      1       2   3   4   5   6     7   8   9  10  11  12   13     14  15
0 0 30.83 0.000 0 0 9 0 1.25 0 0 1 1 0 202 0.0 -1
1 1 58.67 4.460 0 0 8 1 3.04 0 0 6 1 0 43 560.0 -1
2 1 24.50 0.500 0 0 8 1 1.50 0 1 0 1 0 280 824.0 -1
3 0 27.83 1.540 0 0 9 0 3.75 0 0 5 0 0 100 3.0 -1
4 0 20.17 5.625 0 0 9 0 1.71 0 1 0 1 2 120 0.0 -1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
648 0 21.08 10.085 1 1 11 1 1.25 1 1 0 1 0 260 0.0 1
649 1 22.67 0.750 0 0 0 0 2.00 1 0 2 0 0 200 394.0 1
650 1 25.25 13.500 1 1 13 7 2.00 1 0 1 0 0 200 1.0 1
651 0 17.92 0.205 0 0 12 0 0.04 1 1 0 1 0 280 750.0 1
652 0 35.00 3.375 0 0 0 1 8.29 1 1 0 0 0 0 0.0 1

653 rows × 16 columns

A binary classification problem with labels -1 and 1 like this needs no one_hot encoding at all.

Just use sigmoid as the output-layer activation, since sigmoid's output range is 0-1.
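As a quick illustration (a minimal sketch, not part of the original notebook), tf.sigmoid squashes any real-valued logit into (0, 1), which is why a single sigmoid unit suffices for a 0/1 target:

import tensorflow as tf

# sigmoid(z) = 1 / (1 + exp(-z)) maps any real logit into (0, 1)
logits = tf.constant([-5.0, 0.0, 5.0])
print(tf.sigmoid(logits).numpy())  # approx [0.0067 0.5 0.9933]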

2. Split the dataset (into a training set and a test set)

In [3]:
# Shuffle the data first
data=data.sample(frac=1.0,random_state=116) # shuffle all rows
data=data.reset_index(drop=True) # the shuffled index is out of order; reset_index rebuilds it, and drop=True discards the old index column
data
Out[3]:
      0      1       2   3   4   5   6     7   8   9  10  11  12   13     14  15
0 1 17.83 11.000 0 0 10 1 1.000 0 0 11 1 0 0 3000.0 -1
1 1 22.50 11.000 1 1 8 0 3.000 0 1 0 0 0 268 0.0 1
2 0 47.33 6.500 0 0 0 0 1.000 1 1 0 0 0 0 228.0 1
3 0 31.08 1.500 1 1 9 0 0.040 1 1 0 1 2 160 0.0 1
4 0 38.17 10.125 0 0 10 0 2.500 0 0 6 1 0 520 196.0 -1
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
648 0 16.50 0.125 0 0 0 0 0.165 1 1 0 1 0 132 0.0 1
649 1 41.00 2.040 1 1 8 1 0.125 0 0 23 0 0 455 1236.0 -1
650 0 23.92 0.665 0 0 0 0 0.165 1 1 0 1 0 100 0.0 -1
651 0 31.08 3.085 0 0 0 0 2.500 1 0 2 0 0 160 41.0 1
652 1 33.67 0.375 0 0 2 0 0.375 1 1 0 1 0 300 44.0 -1

653 rows × 16 columns

Use the first 600 rows for training and the remaining 53 for testing.

In [4]:
train_x=data.iloc[:600,:-1]
# Represent -1 as 0
train_y=data.iloc[:600,-1].replace(-1,0)
print(train_x)
print(train_y)
test_x=data.iloc[600:,:-1]
# Represent -1 as 0
test_y=data.iloc[600:,-1].replace(-1,0)
     0      1       2   3   4   5   6      7   8   9   10  11  12   13      14
0     1  17.83  11.000   0   0  10   1  1.000   0   0  11   1   0    0  3000.0
1     1  22.50  11.000   1   1   8   0  3.000   0   1   0   0   0  268     0.0
2     0  47.33   6.500   0   0   0   0  1.000   1   1   0   0   0    0   228.0
3     0  31.08   1.500   1   1   9   0  0.040   1   1   0   1   2  160     0.0
4     0  38.17  10.125   0   0  10   0  2.500   0   0   6   1   0  520   196.0
..   ..    ...     ...  ..  ..  ..  ..    ...  ..  ..  ..  ..  ..  ...     ...
595   1  64.08   0.165   0   0  13   7  0.000   0   0   1   1   0  232   100.0
596   0  29.83   2.040   1   1  10   1  0.040   1   1   0   1   0  128     1.0
597   0  34.50   4.040   1   1   3   2  8.500   0   0   7   0   0  195     0.0
598   0  36.33   3.790   0   0   9   0  1.165   0   1   0   0   0  200     0.0
599   0  22.67   1.585   1   1   9   0  3.085   0   0   6   1   0   80     0.0

[600 rows x 15 columns]
0      0
1      1
2      1
3      1
4      0
      ..
595    0
596    1
597    0
598    1
599    0
Name: 15, Length: 600, dtype: int64

3. Build the model

The input is 15-dimensional and the output is 1-dimensional; for a binary classification problem like this, no one_hot encoding is needed.

The labels have been converted to 0 and 1, so specifying sigmoid as the output-layer activation is all that is required.

Sigmoid's output is itself a probability between 0 and 1 (softmax likewise represents probabilities).

So the model should be 15 -> n -> 1.

In [5]:
# Build the container
model = tf.keras.Sequential()
# Input layer
model.add(tf.keras.Input(shape=(15,)))
# Hidden layers
model.add(tf.keras.layers.Dense(10,activation='relu'))
model.add(tf.keras.layers.Dense(10,activation='relu'))
# Output layer
model.add(tf.keras.layers.Dense(1,activation='sigmoid'))
# Model architecture
model.summary()
Model: "sequential"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dense (Dense)                (None, 10)                160       
_________________________________________________________________
dense_1 (Dense)              (None, 10)                110       
_________________________________________________________________
dense_2 (Dense)              (None, 1)                 11        
=================================================================
Total params: 281
Trainable params: 281
Non-trainable params: 0
_________________________________________________________________

For a network with many input features, it is better to give the first layer more neurons as well; this yields better results.

With 4 neurons in the first layer, the accuracy after 5000 epochs is only around 0.5+,

while with 10 neurons in the first layer, the accuracy after 5000 epochs reaches about 0.83+.

But the first layer should not be too wide either; a width sweep like the sketch below can help choose.
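A minimal sketch of such a width sweep (the build_model helper, the candidate widths, and the shortened epoch count are illustrative assumptions, not part of the original notebook):

# Hypothetical helper: rebuild the same architecture with a given first-layer width
def build_model(width):
    m = tf.keras.Sequential([
        tf.keras.Input(shape=(15,)),
        tf.keras.layers.Dense(width, activation='relu'),
        tf.keras.layers.Dense(10, activation='relu'),
        tf.keras.layers.Dense(1, activation='sigmoid'),
    ])
    m.compile(optimizer='adam', loss='binary_crossentropy', metrics=['acc'])
    return m

for width in [4, 10, 32]:  # candidate first-layer widths
    m = build_model(width)
    h = m.fit(train_x, train_y, epochs=200, verbose=0,
              validation_data=(test_x, test_y))
    print(width, max(h.history['val_acc']))  # best validation accuracy for this width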

4. Train the model

In [6]:
# Configure the optimizer and the loss function
model.compile(optimizer='adam',loss='binary_crossentropy',metrics=['acc'])
# Start training
history = model.fit(train_x,train_y,epochs=5000,validation_data=(test_x,test_y))
Epoch 1/5000
19/19 [==============================] - 0s 9ms/step - loss: 17.5357 - acc: 0.4467 - val_loss: 9.5537 - val_acc: 0.4717
Epoch 2/5000
19/19 [==============================] - 0s 3ms/step - loss: 8.2702 - acc: 0.5667 - val_loss: 3.3894 - val_acc: 0.8113
Epoch 3/5000
19/19 [==============================] - 0s 3ms/step - loss: 2.7807 - acc: 0.6450 - val_loss: 1.2214 - val_acc: 0.7925
Epoch 4/5000
19/19 [==============================] - 0s 3ms/step - loss: 1.8072 - acc: 0.6667 - val_loss: 0.8503 - val_acc: 0.7358
Epoch 5/5000
19/19 [==============================] - 0s 3ms/step - loss: 1.2854 - acc: 0.6333 - val_loss: 0.8329 - val_acc: 0.7736
Epoch 6/5000
19/19 [==============================] - 0s 3ms/step - loss: 1.0678 - acc: 0.6250 - val_loss: 0.7051 - val_acc: 0.7358
Epoch 7/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.8744 - acc: 0.6350 - val_loss: 0.7378 - val_acc: 0.6981
Epoch 8/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.7846 - acc: 0.6433 - val_loss: 0.6449 - val_acc: 0.7170
Epoch 9/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.7270 - acc: 0.6633 - val_loss: 0.5953 - val_acc: 0.7170
Epoch 10/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.7340 - acc: 0.6750 - val_loss: 0.7433 - val_acc: 0.8113
Epoch 11/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.8346 - acc: 0.6850 - val_loss: 0.6981 - val_acc: 0.8113
Epoch 12/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.7294 - acc: 0.6717 - val_loss: 0.6034 - val_acc: 0.8113
Epoch 13/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6561 - acc: 0.6917 - val_loss: 0.5421 - val_acc: 0.7170
Epoch 14/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6122 - acc: 0.6833 - val_loss: 0.4921 - val_acc: 0.7736
Epoch 15/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6230 - acc: 0.7033 - val_loss: 0.4789 - val_acc: 0.8113
Epoch 16/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.5656 - acc: 0.7100 - val_loss: 0.4925 - val_acc: 0.8113
Epoch 17/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.7055 - acc: 0.6950 - val_loss: 0.6540 - val_acc: 0.8113
Epoch 18/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6596 - acc: 0.7083 - val_loss: 0.4843 - val_acc: 0.7736
Epoch 19/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6000 - acc: 0.7067 - val_loss: 0.4473 - val_acc: 0.7925
Epoch 20/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.6648 - acc: 0.7067 - val_loss: 0.4772 - val_acc: 0.7358
Epoch 21/5000
19/19 [==============================] - 0s 3ms/step - loss: 2.0961 - acc: 0.6333 - val_loss: 2.5110 - val_acc: 0.7925
Epoch 22/5000
19/19 [==============================] - 0s 3ms/step - loss: 2.3280 - acc: 0.7033 - val_loss: 0.8230 - val_acc: 0.7358
Epoch 23/5000
19/19 [==============================] - 0s 3ms/step - loss: 1.0403 - acc: 0.6950 - val_loss: 0.8077 - val_acc: 0.7925
..........................
Epoch 4995/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1757 - acc: 0.9217 - val_loss: 2.0568 - val_acc: 0.8113
Epoch 4996/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1735 - acc: 0.9233 - val_loss: 2.0542 - val_acc: 0.8113
Epoch 4997/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1749 - acc: 0.9250 - val_loss: 2.1591 - val_acc: 0.8113
Epoch 4998/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1788 - acc: 0.9267 - val_loss: 2.2617 - val_acc: 0.8302
Epoch 4999/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1826 - acc: 0.9283 - val_loss: 2.3343 - val_acc: 0.7925
Epoch 5000/5000
19/19 [==============================] - 0s 3ms/step - loss: 0.1716 - acc: 0.9267 - val_loss: 2.3872 - val_acc: 0.8113
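Note that by the end of the run the training loss keeps falling (about 0.17) while val_loss has climbed to roughly 2.4, a classic sign of overfitting. One option is to stop training earlier; a hedged sketch (not part of the original run; history_es is an illustrative name):

# Stop once val_loss stops improving, and roll back to the best weights seen
early_stop = tf.keras.callbacks.EarlyStopping(
    monitor='val_loss', patience=50, restore_best_weights=True)
history_es = model.fit(train_x, train_y, epochs=5000,
                       validation_data=(test_x, test_y),
                       callbacks=[early_stop])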
In [7]:
plt.plot(history.epoch,history.history.get('loss'))
plt.title("train data loss")
plt.show()
In [8]:
plt.plot(history.epoch,history.history.get('val_loss'))
plt.title("test data loss")
plt.show()
In [9]:
plt.plot(history.epoch,history.history.get('acc'))
plt.title("train data acc")
plt.show()
In [10]:
plt.plot(history.epoch,history.history.get('val_acc'))
plt.title("test data acc")
plt.show()
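Overlaying the training and test curves on one figure (a small optional variation on the cells above, not in the original notebook) makes the gap between them easier to see:

plt.plot(history.epoch, history.history.get('loss'), label='train loss')
plt.plot(history.epoch, history.history.get('val_loss'), label='test loss')
plt.legend()
plt.title("train vs test loss")
plt.show()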
As the plots show, compared with the mse loss function, cross entropy is better both in convergence speed and in final test-set accuracy.
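The mse baseline referred to here comes from the previous lesson; as a minimal sketch (assuming the same architecture with only the loss swapped), the comparison run would look like this:

# Same architecture compiled with mse instead of binary_crossentropy, for comparison
model_mse = tf.keras.Sequential([
    tf.keras.Input(shape=(15,)),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model_mse.compile(optimizer='adam', loss='mse', metrics=['acc'])
history_mse = model_mse.fit(train_x, train_y, epochs=5000,
                            validation_data=(test_x, test_y))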

5. Evaluate the model

In [11]:
predict_y=model.predict(test_x)
print(predict_y)
print(test_y)
[[3.8453022e-21]
 [8.5991585e-01]
 [6.5455580e-10]
 [1.0000000e+00]
 [1.0000000e+00]
 [1.0000000e+00]
 [9.5815730e-01]
 [6.3400120e-02]
 [9.4178385e-01]
 [9.8131865e-01]
 [1.0000000e+00]
 [2.5722989e-01]
 [9.6723849e-01]
 [3.5510752e-01]
 [2.1926977e-03]
 [3.4676839e-03]
 [4.1911963e-02]
 [4.1010199e-07]
 [8.6172342e-01]
 [7.0969802e-01]
 [7.7412263e-02]
 [7.5796050e-01]
 [1.0000000e+00]
 [0.0000000e+00]
 [2.3604299e-01]
 [1.7066821e-02]
 [1.0000000e+00]
 [9.6950209e-01]
 [2.4396995e-01]
 [2.1174717e-01]
 [9.6668184e-01]
 [9.6301866e-01]
 [2.1353349e-02]
 [1.0000000e+00]
 [4.1701463e-01]
 [1.0000000e+00]
 [2.8559173e-04]
 [7.4790055e-01]
 [1.0000000e+00]
 [8.7733787e-01]
 [4.8705450e-01]
 [2.5270924e-01]
 [1.0000000e+00]
 [9.7821814e-01]
 [7.8488225e-01]
 [1.0000000e+00]
 [3.1834045e-01]
 [1.4321352e-06]
 [1.0000000e+00]
 [1.1380038e-01]
 [1.0000000e+00]
 [9.3310910e-01]
 [9.6864295e-01]]
600    0
601    1
602    0
603    1
604    1
605    1
606    1
607    0
608    1
609    1
610    1
611    1
612    1
613    1
614    0
615    0
616    0
617    0
618    1
619    1
620    0
621    1
622    1
623    0
624    0
625    0
626    1
627    1
628    1
629    0
630    1
631    1
632    0
633    0
634    1
635    1
636    0
637    0
638    0
639    1
640    0
641    0
642    1
643    1
644    1
645    1
646    1
647    0
648    1
649    0
650    0
651    1
652    0
Name: 15, dtype: int64
In [12]:
test_y=np.array(test_y)
predict_y=predict_y.flatten()
print(test_y)
print(predict_y)
[0 1 0 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0
 0 0 1 0 0 1 1 1 1 1 0 1 0 0 1 0]
[3.8453022e-21 8.5991585e-01 6.5455580e-10 1.0000000e+00 1.0000000e+00
 1.0000000e+00 9.5815730e-01 6.3400120e-02 9.4178385e-01 9.8131865e-01
 1.0000000e+00 2.5722989e-01 9.6723849e-01 3.5510752e-01 2.1926977e-03
 3.4676839e-03 4.1911963e-02 4.1010199e-07 8.6172342e-01 7.0969802e-01
 7.7412263e-02 7.5796050e-01 1.0000000e+00 0.0000000e+00 2.3604299e-01
 1.7066821e-02 1.0000000e+00 9.6950209e-01 2.4396995e-01 2.1174717e-01
 9.6668184e-01 9.6301866e-01 2.1353349e-02 1.0000000e+00 4.1701463e-01
 1.0000000e+00 2.8559173e-04 7.4790055e-01 1.0000000e+00 8.7733787e-01
 4.8705450e-01 2.5270924e-01 1.0000000e+00 9.7821814e-01 7.8488225e-01
 1.0000000e+00 3.1834045e-01 1.4321352e-06 1.0000000e+00 1.1380038e-01
 1.0000000e+00 9.3310910e-01 9.6864295e-01]

Goal:

In TensorFlow, map values above a threshold to 1 and values below it to 0.

In [13]:
print(tf.where(predict_y>0.5,x=1,y=0))
print(test_y)
tf.Tensor(
[0 1 0 1 1 1 1 0 1 1 1 0 1 0 0 0 0 0 1 1 0 1 1 0 0 0 1 1 0 0 1 1 0 1 0 1 0
 1 1 1 0 0 1 1 1 1 0 0 1 0 1 1 1], shape=(53,), dtype=int32)
[0 1 0 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 1 1 0 1 1 0 0 0 1 1 1 0 1 1 0 0 1 1 0
 0 0 1 0 0 1 1 1 1 1 0 1 0 0 1 0]
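With both arrays in 0/1 form, the test accuracy can be read off directly (a small added check, not in the original notebook):

pred_labels = tf.where(predict_y > 0.5, x=1, y=0).numpy()
print(np.mean(pred_labels == test_y))  # fraction of matching labels, roughly 0.81 per the final val_acc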
posted @ 2020-09-15 16:47 范仁义