
Machine Learning 200725 Series --- 2. Logistic Regression Example

I. Summary

One-sentence summary:

lr=Pipeline([("sc",StandardScaler()),("clf",LogisticRegression())])

 

 

1. In what sense is logistic regression a linear function?

In log-odds space: the logarithm of the odds, log(p/(1-p)), is a linear function w·x+b of the features, which can be shown by rearranging the sigmoid formula.
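This can be checked numerically on synthetic data (a minimal sketch, not part of the iris example below): a fitted `LogisticRegression`'s log-odds, as returned by `decision_function`, coincide exactly with the linear form `X @ w + b`.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# tiny synthetic binary problem
rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = LogisticRegression().fit(X, y)

# decision_function returns log(p / (1 - p)), the log-odds ...
logit = clf.decision_function(X)
# ... which equals the linear function w·x + b of the features
linear = X @ clf.coef_.ravel() + clf.intercept_[0]
assert np.allclose(logit, linear)

# and the predicted probability is the sigmoid of that linear form
p = clf.predict_proba(X)[:, 1]
assert np.allclose(p, 1 / (1 + np.exp(-linear)))
```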

 

 

2. Points to note when drawing a pseudocolor plot with matplotlib?

1. Axis limits: plt.xlim(x1_min,x1_max)
2. Saving the figure: plt.savefig("3.png")
cm_light=mpl.colors.ListedColormap(["#77E0A0","#FF8080","#A0A0FF"])
cm_dark=mpl.colors.ListedColormap(['b','g','r'])

y_hat=lr.predict(x_test) # predicted values
y_hat=y_hat.reshape(x1.shape) # reshape the predictions to match the grid shape
plt.pcolormesh(x1,x2,y_hat,cmap=cm_light) # display the predictions
plt.scatter(x.iloc[:,0],x.iloc[:,1],c=y,edgecolors="k",s=50,cmap = cm_dark) # display the samples

# sample display (columns 0 and 1 of iris are the sepal measurements)
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.xlim(x1_min,x1_max)
plt.ylim(x2_min,x2_max)
plt.grid()
# save the figure
plt.savefig("3.png")
plt.show()

 

 

3. numpy's meshgrid method: x1,x2=np.meshgrid(t1,t2)?

It effectively replicates t1 from 1-D to 2-D, turning a single row into n rows (and likewise tiles t2 into columns).
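A tiny sketch of what meshgrid produces:

```python
import numpy as np

t1 = np.array([1, 2, 3])      # x-axis sample points
t2 = np.array([10, 20])       # y-axis sample points
x1, x2 = np.meshgrid(t1, t2)  # both outputs have shape (len(t2), len(t1))
print(x1)  # [[ 1  2  3]
           #  [ 1  2  3]]   -- t1 repeated as rows
print(x2)  # [[10 10 10]
           #  [20 20 20]]   -- t2 repeated as columns
```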

 

4. numpy's stack method?

Joins data: "Join a sequence of arrays along a new axis."
# help(np.stack)
# Join a sequence of arrays along a new axis.
'''
    >>> a = np.array([1, 2, 3])
    >>> b = np.array([2, 3, 4])
    >>> np.stack((a, b))
    array([[1, 2, 3],
           [2, 3, 4]])
'''

 

 

 

5. What is the physical meaning of the gradient?

The gradient is the direction of steepest increase of a function. Gradient descent finds that direction at each step and walks the opposite way, downhill toward a minimum.
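A minimal 1-D sketch of the idea, minimizing f(w) = (w - 3)^2 whose gradient is f'(w) = 2(w - 3):

```python
# gradient descent on f(w) = (w - 3)**2
w = 0.0         # starting point
step = 0.1      # learning rate (step size)
for _ in range(100):
    grad = 2 * (w - 3)  # direction of steepest increase at w
    w -= step * grad    # walk against the gradient, downhill
print(round(w, 6))  # prints 3.0 -- converges to the minimizer w = 3
```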

 

 

 

II. Logistic Regression Example

Video location in the course corresponding to this post:

 

In [1]:
import numpy as np # scientific computing library
from sklearn.linear_model import LogisticRegression # scikit-learn algorithm library
import matplotlib.pyplot as plt # pyplot plotting functions
import matplotlib as mpl # visualization library
from sklearn import preprocessing # data preprocessing
import pandas as pd # reads csv/txt data into tables
from sklearn.preprocessing import StandardScaler # standardization, for feature engineering
from sklearn.model_selection import train_test_split # splits the dataset
from sklearn.pipeline import Pipeline # pipeline
In [3]:
# 1. Load the dataset: pd.read_csv()
# 2. Split the dataset: train_test_split(x,y,test_size=1/4,random_state=0)
# 3. Choose/build the model: model=LogisticRegression(C=)
# 4. Train the model: model.fit(x_train,y_train)
# 5. Validate the model: model.predict(x_test)
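The five steps can be sketched end to end as follows (a self-contained sketch using sklearn's bundled copy of the iris data rather than the local iris.data file used below):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)                     # 1. load the dataset
x_train, x_test, y_train, y_test = train_test_split(  # 2. split the dataset
    X, y, test_size=1/4, random_state=0)
model = LogisticRegression(max_iter=200)              # 3. choose/build the model
model.fit(x_train, y_train)                           # 4. train the model
print(model.score(x_test, y_test))                    # 5. validate on the test set
```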

1. Load the dataset: pd.read_csv()

In [9]:
data = pd.read_csv('./iris.data',header=None)
# data.head() shows the first 5 rows
data
Out[9]:
 01234
0 5.1 3.5 1.4 0.2 Iris-setosa
1 4.9 3.0 1.4 0.2 Iris-setosa
2 4.7 3.2 1.3 0.2 Iris-setosa
3 4.6 3.1 1.5 0.2 Iris-setosa
4 5.0 3.6 1.4 0.2 Iris-setosa
... ... ... ... ... ...
145 6.7 3.0 5.2 2.3 Iris-virginica
146 6.3 2.5 5.0 1.9 Iris-virginica
147 6.5 3.0 5.2 2.0 Iris-virginica
148 6.2 3.4 5.4 2.3 Iris-virginica
149 5.9 3.0 5.1 1.8 Iris-virginica

150 rows × 5 columns

2. Split the dataset: train_test_split(x,y,test_size=1/4,random_state=0)

In [16]:
x = data.iloc[:,0:2] # use only the first 2 feature columns, for visualization
y = data.iloc[:,-1].replace({"Iris-setosa":0,"Iris-versicolor":1,"Iris-virginica":2})
print(x)
print(y)
# y = data.iloc[:,-1].replace(-1,0)
       0    1
0    5.1  3.5
1    4.9  3.0
2    4.7  3.2
3    4.6  3.1
4    5.0  3.6
..   ...  ...
145  6.7  3.0
146  6.3  2.5
147  6.5  3.0
148  6.2  3.4
149  5.9  3.0

[150 rows x 2 columns]
0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: 4, Length: 150, dtype: int64
In [ ]:
# now plot the data
In [19]:
x_train,x_test,y_train,y_test=train_test_split(x,y,random_state=1,train_size=0.75)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)
(112, 2)
(112,)
(38, 2)
(38,)

3. Choose/build the model: model=LogisticRegression(C=)

In [21]:
lr=Pipeline([("sc",StandardScaler()),
             ("clf",LogisticRegression())])
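The pipeline just chains the two steps: fit learns the scaler's mean/std and then the classifier's weights, and predict applies the same scaling before classifying. A sketch on synthetic data (not the iris data used here) showing the pipeline matches doing the steps by hand:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

rng = np.random.RandomState(0)
X = rng.randn(80, 2) * [10.0, 0.1]  # two features on very different scales
y = (X[:, 0] / 10 + X[:, 1] * 10 > 0).astype(int)

# one-shot pipeline: standardize, then classify
pipe = Pipeline([("sc", StandardScaler()), ("clf", LogisticRegression())]).fit(X, y)

# the same thing spelled out step by step
sc = StandardScaler().fit(X)
clf = LogisticRegression().fit(sc.transform(X), y)

assert np.array_equal(pipe.predict(X), clf.predict(sc.transform(X)))
```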

4. Train the model: model.fit(x_train,y_train)

In [26]:
print(y.ravel())
# flatten the vertical (column) data into a flat 1-D array
# Return the flattened underlying data as an ndarray.
lr.fit(x,y.ravel()) # note: fitted on the full dataset here, not on x_train/y_train
[0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 2 2 2 2 2 2 2 2
 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
 2 2]
Out[26]:
Pipeline(memory=None,
         steps=[('sc',
                 StandardScaler(copy=True, with_mean=True, with_std=True)),
                ('clf',
                 LogisticRegression(C=1.0, class_weight=None, dual=False,
                                    fit_intercept=True, intercept_scaling=1,
                                    l1_ratio=None, max_iter=100,
                                    multi_class='auto', n_jobs=None,
                                    penalty='l2', random_state=None,
                                    solver='lbfgs', tol=0.0001, verbose=0,
                                    warm_start=False))],
         verbose=False)

Plotting

In [30]:
print(x)
       0    1
0    5.1  3.5
1    4.9  3.0
2    4.7  3.2
3    4.6  3.1
4    5.0  3.6
..   ...  ...
145  6.7  3.0
146  6.3  2.5
147  6.5  3.0
148  6.2  3.4
149  5.9  3.0

[150 rows x 2 columns]
In [29]:
# plotting
N,M=500,500
x1_min,x1_max=x.iloc[:,0].min(),x.iloc[:,0].max() # range of column 0
x2_min,x2_max=x.iloc[:,1].min(),x.iloc[:,1].max() # range of column 1
print(x1_min,x1_max)
print(x2_min,x2_max)
4.3 7.9
2.0 4.4
In [33]:
# generate N evenly spaced points from x1_min to x1_max
t1=np.linspace(x1_min,x1_max,N)
t2=np.linspace(x2_min,x2_max,M)
print(t1)
[4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
In [37]:
# help(np.meshgrid)
In [36]:
# effectively replicates t1 from 1-D to 2-D, turning one row into n rows
x1,x2=np.meshgrid(t1,t2)
print(x1)
[[4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
 [4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
 [4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
 ...
 [4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
 [4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]
 [4.3        4.30721443 4.31442886 ... 7.88557114 7.89278557 7.9       ]]
In [42]:
# help(np.stack)
# Join a sequence of arrays along a new axis.
'''
    >>> a = np.array([1, 2, 3])
    >>> b = np.array([2, 3, 4])
    >>> np.stack((a, b))
    array([[1, 2, 3],
           [2, 3, 4]])
'''
Out[42]:
'\n    >>> a = np.array([1, 2, 3])\n    >>> b = np.array([2, 3, 4])\n    >>> np.stack((a, b))\n    array([[1, 2, 3],\n           [2, 3, 4]])\n'
In [43]:
x_test=np.stack((x1.flat,x2.flat),axis=1)
x_test
Out[43]:
array([[4.3       , 2.        ],
       [4.30721443, 2.        ],
       [4.31442886, 2.        ],
       ...,
       [7.88557114, 4.4       ],
       [7.89278557, 4.4       ],
       [7.9       , 4.4       ]])
In [85]:
cm_light=mpl.colors.ListedColormap(["#77E0A0","#FF8080","#A0A0FF"])
cm_dark=mpl.colors.ListedColormap(['b','g','r'])

5. Validate the model: model.predict(x_test)

In [87]:
print(y)
0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: 4, Length: 150, dtype: int64
In [3]:
# help(plt.pcolormesh)
# Create a pseudocolor plot with a non-regular rectangular grid.
In [86]:
y_hat=lr.predict(x_test) # predicted values
y_hat=y_hat.reshape(x1.shape) # reshape the predictions to match the grid shape
plt.pcolormesh(x1,x2,y_hat,cmap=cm_light) # display the predictions
plt.scatter(x.iloc[:,0],x.iloc[:,1],c=y,edgecolors="k",s=50,cmap = cm_dark) # display the samples

# sample display (columns 0 and 1 of iris are the sepal measurements)
plt.xlabel("sepal length")
plt.ylabel("sepal width")
plt.xlim(x1_min,x1_max)
plt.ylim(x2_min,x2_max)
plt.grid()
# save the figure
plt.savefig("3.png")
plt.show()
In [63]:
# help(np.reshape)
# Gives a new shape to an array without changing its data.
In [62]:
a = np.arange(6).reshape((3,2))
print(a)
a = a.reshape(-1)
print(a)
[[0 1]
 [2 3]
 [4 5]]
[0 1 2 3 4 5]
In [67]:
print(y)
print(type(y))
0      0
1      0
2      0
3      0
4      0
      ..
145    2
146    2
147    2
148    2
149    2
Name: 4, Length: 150, dtype: int64
<class 'pandas.core.series.Series'>
In [77]:
# prediction results on the training data
y_hat=lr.predict(x)
# flatten to a single column
# y=y.reshape(-1)
result = y_hat == y
print("y_hat:\n",y_hat)
print("result:\n",result)
acc=np.mean(result)
print("Accuracy: %.2f%%"%(100*acc))
y_hat:
 [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
 0 0 0 0 1 0 0 0 0 0 0 0 0 2 2 2 1 2 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1
 2 2 2 2 1 1 1 1 1 1 1 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 2 1 2 2 2 2 1 2 2 2 2
 2 2 1 1 2 2 2 2 1 2 1 2 1 2 2 1 1 2 2 2 2 2 2 1 2 2 2 1 2 2 2 1 2 2 2 1 2
 2 1]
result:
 0       True
1       True
2       True
3       True
4       True
       ...  
145     True
146    False
147     True
148     True
149    False
Name: 4, Length: 150, dtype: bool
Accuracy: 81.33%
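Taking the mean of a boolean comparison array, as above, is the same computation sklearn's accuracy_score performs; a tiny sketch with toy labels:

```python
import numpy as np
from sklearn.metrics import accuracy_score

y_true = np.array([0, 0, 1, 1, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 0])   # 4 of 6 correct
acc = np.mean(y_pred == y_true)         # fraction of matching labels
assert acc == accuracy_score(y_true, y_pred)
print("Accuracy: %.2f%%" % (100 * acc))  # Accuracy: 66.67%
```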
posted @ 2020-07-28 06:32  范仁义