Python Data Mining - Regression - Neural Networks

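Concepts:

Neural network: short for artificial neural network, a mathematical or computational model that imitates the structure and function of biological neural networks (the central nervous system of animals, the brain in particular).

Biological neural network: nerve cells are the basic units of the nervous system; they are called biological neurons, or simply neurons.

A network typically uses three to five layers.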
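
To make the layered structure concrete, here is a minimal sketch of a forward pass through a three-layer network (input, one hidden layer, output) in NumPy. The layer sizes, the random weights, and the function name `forward` are illustrative assumptions and do not come from the original post.

```python
import numpy as np

def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Forward pass: input -> hidden layer (ReLU) -> output (logistic)."""
    hidden = np.maximum(0.0, x @ w_hidden + b_hidden)   # ReLU activation
    logits = hidden @ w_out + b_out
    return 1.0 / (1.0 + np.exp(-logits))                # squash into (0, 1)

rng = np.random.default_rng(0)        # fixed seed; the weights are arbitrary
x = rng.normal(size=(4, 3))           # 4 samples, 3 input features
w_hidden = rng.normal(size=(3, 5))    # 3 inputs -> 5 hidden neurons
b_hidden = np.zeros(5)
w_out = rng.normal(size=(5, 1))       # 5 hidden neurons -> 1 output
b_out = np.zeros(1)

probs = forward(x, w_hidden, b_hidden, w_out, b_out)
print(probs.shape)
```

Each sample produces one value between 0 and 1, which a classifier would read as the probability of the positive class. Training consists of adjusting the weights so that these outputs match the known labels.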

 

First, load the independent variables (features) and the dependent variable (target).

import pandas
from pandas import read_csv

# Load the data and drop rows with missing values
data = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python数据挖掘实战课程课件\\4.5\\data.csv",
    encoding='utf8'
)
data = data.dropna()

# Nominal columns that will be dummy (one-hot) encoded
dummyColumns = [
    'Gender', 'Home Ownership', 'Internet Connection', 'Marital Status',
    'Movie Selector', 'Prerec Format', 'TV Signal']

for column in dummyColumns:
    data[column] = data[column].astype('category')

# drop_first=True drops the first level of each variable to avoid redundant columns
dummiesData = pandas.get_dummies(
    data,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

# Education level is ordinal, so map it to integers (1 = lowest, 9 = highest)
educationLevelDict = {
    'Post-Doc': 9,
    'Doctorate': 8,
    "Master's Degree": 7,
    "Bachelor's Degree": 6,
    "Associate's Degree": 5,
    'Some College': 4,
    'Trade School': 3,
    'High School': 2,
    'Grade School': 1
}

dummiesData['Education Level Map'] = dummiesData['Education Level'].map(educationLevelDict)

# Frequencies are ordinal as well (0 = never, 4 = daily)
freqMap = {
    'Never': 0,
    'Rarely': 1,
    'Monthly': 2,
    'Weekly': 3,
    'Daily': 4
}
dummiesData['PPV Freq Map'] = dummiesData['PPV Freq'].map(freqMap)
dummiesData['Theater Freq Map'] = dummiesData['Theater Freq'].map(freqMap)
dummiesData['TV Movie Freq Map'] = dummiesData['TV Movie Freq'].map(freqMap)
dummiesData['Prerec Buying Freq Map'] = dummiesData['Prerec Buying Freq'].map(freqMap)
dummiesData['Prerec Renting Freq Map'] = dummiesData['Prerec Renting Freq'].map(freqMap)
dummiesData['Prerec Viewing Freq Map'] = dummiesData['Prerec Viewing Freq'].map(freqMap)

# Feature columns: numeric columns, mapped ordinal columns, and dummy columns
dummiesSelect = [
    'Age', 'Num Bathrooms', 'Num Bedrooms', 'Num Cars', 'Num Children', 'Num TVs',
    'Education Level Map', 'PPV Freq Map', 'Theater Freq Map', 'TV Movie Freq Map',
    'Prerec Buying Freq Map', 'Prerec Renting Freq Map', 'Prerec Viewing Freq Map',
    'Gender Male',
    'Internet Connection DSL', 'Internet Connection Dial-Up',
    'Internet Connection IDSN', 'Internet Connection No Internet Connection',
    'Internet Connection Other',
    'Marital Status Married', 'Marital Status Never Married',
    'Marital Status Other', 'Marital Status Separated',
    'Movie Selector Me', 'Movie Selector Other', 'Movie Selector Spouse/Partner',
    'Prerec Format DVD', 'Prerec Format Laserdisk', 'Prerec Format Other',
    'Prerec Format VHS', 'Prerec Format Video CD',
    'TV Signal Analog antennae', 'TV Signal Cable',
    'TV Signal Digital Satellite', "TV Signal Don't watch TV"
]

inputData = dummiesData[dummiesSelect]

# Target: whether the household rents its home
outputData = dummiesData[['Home Ownership Rent']]
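
Column names such as 'Gender Male' and 'Home Ownership Rent' are produced by `get_dummies`. The following small sketch shows how `prefix_sep=" "` and `drop_first=True` shape those names; the toy values are made up for illustration and are not taken from data.csv.

```python
import pandas as pd

# Toy frame: the column names mirror the post, the values are invented.
toy = pd.DataFrame({
    "Age": [25, 40, 31],
    "Gender": ["Female", "Male", "Female"],
    "Home Ownership": ["Own", "Rent", "Own"],
})

encoded = pd.get_dummies(
    toy,
    columns=["Gender", "Home Ownership"],
    prefix=["Gender", "Home Ownership"],
    prefix_sep=" ",      # "<column> <level>" instead of "<column>_<level>"
    drop_first=True,     # drop the first (alphabetical) level of each column
)

print(list(encoded.columns))
print(encoded)
```

The levels 'Female' and 'Own' sort first and are dropped, so each two-level column collapses into a single indicator column. Untouched columns such as 'Age' pass through unchanged.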

 

Import the MLPClassifier class from sklearn.neural_network, then fit and score the model for several hidden layer sizes.

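activation="relu" selects the activation function of the hidden layers; relu is the default and plays the role that the S-shaped (sigmoid) function plays in a classic network. hidden_layer_sizes gives the number of neurons in each hidden layer, not the number of layers: a single integer l means one hidden layer with l neurons, and a tuple such as (5, 3) means two hidden layers.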
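
The meaning of hidden_layer_sizes can be checked on a fitted model. The sketch below uses synthetic data from `make_classification` as a stand-in for the post's inputData and outputData (an assumption made so the example is self-contained); `max_iter` and `random_state` are likewise chosen only for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# Synthetic binary classification data with 8 features
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# An integer: one hidden layer with 5 neurons
single = MLPClassifier(hidden_layer_sizes=5, max_iter=300, random_state=0).fit(X, y)

# A tuple: two hidden layers with 5 and 3 neurons
stacked = MLPClassifier(hidden_layer_sizes=(5, 3), max_iter=300, random_state=0).fit(X, y)

# n_layers_ counts the input and output layers as well
print(single.n_layers_, [w.shape for w in single.coefs_])
print(stacked.n_layers_, [w.shape for w in stacked.coefs_])
```

With an integer the model reports three layers in total (input, one hidden, output), and the first weight matrix has one column per hidden neuron. scikit-learn may emit a ConvergenceWarning for such a short training run; that does not affect the layer shapes.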

 

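activation: the activation function

  √ relu      rectified linear unit; often preferred over logistic and tanh, in part because it behaves more like a biological neuron (either inactive, or responding gradually once active)

  √ logistic  the logistic (sigmoid) function

  √ tanh      the hyperbolic tangent function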
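
For reference, the three activation functions can be written out in a few lines of plain Python. This is only an illustration of the formulas, not scikit-learn's internal implementation.

```python
import math

def relu(x):
    """Rectified linear unit: 0 for negative input, identity otherwise."""
    return max(0.0, x)

def logistic(x):
    """Logistic (sigmoid) function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """Hyperbolic tangent: maps any real number into (-1, 1)."""
    return math.tanh(x)

for x in (-2.0, 0.0, 2.0):
    print(x, relu(x), round(logistic(x), 4), round(tanh(x), 4))
```

relu stays at zero for negative inputs and grows linearly afterwards, whereas logistic and tanh flatten out at both ends.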

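from sklearn.neural_network import MLPClassifier

# fit() and score() expect a 1-D target, so flatten the single-column frame
target = outputData.values.ravel()

# Try one hidden layer with 1 to 10 neurons
for l in range(1, 11):
    ANNModel = MLPClassifier(
        activation='relu',       # rectified linear unit (the default)
        hidden_layer_sizes=l     # one hidden layer with l neurons
    )

    ANNModel.fit(inputData, target)

    # score() returns the mean accuracy, measured here on the training data
    score = ANNModel.score(inputData, target)
    print(str(l) + ", " + str(score))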
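
The loop above scores each model on the same data it was trained on, which tends to overstate accuracy. A hedged sketch of a held-out evaluation follows; it uses synthetic data from `make_classification` in place of the post's inputData and outputData, and the split ratio, `max_iter`, and `random_state` values are assumptions chosen for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in for the post's features and target
X, y = make_classification(n_samples=300, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for n_neurons in range(1, 11):
    model = MLPClassifier(
        activation="relu",
        hidden_layer_sizes=n_neurons,   # one hidden layer with n_neurons neurons
        max_iter=500,
        random_state=0,
    )
    model.fit(X_train, y_train)
    # Keep both accuracies so the gap between them is visible
    scores[n_neurons] = (model.score(X_train, y_train), model.score(X_test, y_test))

for n_neurons, (train_acc, test_acc) in scores.items():
    print(n_neurons, round(train_acc, 3), round(test_acc, 3))
```

A large gap between the training and test accuracy for a given size suggests overfitting, so the test column is the more useful one for choosing the layer size.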

Predicting on new data

newData = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python数据挖掘实战课程课件\\4.4\\newData.csv",
    encoding='utf-8'
)

# Reuse the categories seen during training so the dummy columns line up
for column in dummyColumns:
    newData[column] = newData[column].astype(
        pandas.CategoricalDtype(categories=data[column].cat.categories)
    )

newData = newData.dropna()

# Apply the same ordinal mappings as for the training data
newData['Education Level Map'] = newData['Education Level'].map(educationLevelDict)
newData['PPV Freq Map'] = newData['PPV Freq'].map(freqMap)
newData['Theater Freq Map'] = newData['Theater Freq'].map(freqMap)
newData['TV Movie Freq Map'] = newData['TV Movie Freq'].map(freqMap)
newData['Prerec Buying Freq Map'] = newData['Prerec Buying Freq'].map(freqMap)
newData['Prerec Renting Freq Map'] = newData['Prerec Renting Freq'].map(freqMap)
newData['Prerec Viewing Freq Map'] = newData['Prerec Viewing Freq'].map(freqMap)

dummiesNewData = pandas.get_dummies(
    newData,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

inputNewData = dummiesNewData[dummiesSelect]

# Predict home ownership for the new records
ANNModel.predict(inputNewData)
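
The new data is cast to the training categories because a small file may not contain every level, and `get_dummies` would then produce fewer columns than the model was trained on. A minimal sketch of the effect, with made-up toy frames:

```python
import pandas as pd

train = pd.DataFrame({"Age": [25, 40, 31], "Gender": ["Female", "Male", "Female"]})
new = pd.DataFrame({"Age": [52, 19], "Gender": ["Female", "Female"]})  # no "Male" rows

train["Gender"] = train["Gender"].astype("category")

# Without alignment, "Female" is the only level, drop_first removes it,
# and no Gender column is left at all.
naive = pd.get_dummies(new, columns=["Gender"], prefix_sep=" ", drop_first=True)

# With the training categories attached, the unseen level still gets a column.
new["Gender"] = new["Gender"].astype(
    pd.CategoricalDtype(categories=train["Gender"].cat.categories)
)
aligned = pd.get_dummies(new, columns=["Gender"], prefix_sep=" ", drop_first=True)

print(list(naive.columns))
print(list(aligned.columns))
```

Only the aligned frame contains 'Gender Male', so only it can be indexed with the same column list that was used for training.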

 

posted @ 2018-10-04 20:03  我不要被你记住