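Python Data Mining - Regression - Neural Networks
Concepts:
Neural network: short for artificial neural network, a mathematical or computational model that imitates the structure and function of a biological neural network (an animal's central nervous system, in particular the brain).
Biological neural network: the nerve cell is the basic unit of the nervous system; it is called a biological neuron, or simply a neuron.
A network of this kind typically uses three to five layers.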
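To make the layered structure concrete, here is a minimal sketch of a forward pass through a small fully connected network written with NumPy. The layer sizes, the random weights, and the helper names `relu` and `forward` are illustrative assumptions, not part of the original tutorial, and this is not how scikit-learn implements its models internally.

```python
import numpy as np

def relu(x):
    # Rectified linear unit: negative inputs become 0, positive inputs pass through
    return np.maximum(0.0, x)

def forward(x, weights, biases):
    """Propagate x through each layer; ReLU on hidden layers, raw values on the last."""
    a = x
    for i, (W, b) in enumerate(zip(weights, biases)):
        z = a @ W + b                      # weighted sum computed by every neuron in the layer
        is_last = (i == len(weights) - 1)
        a = z if is_last else relu(z)      # hidden layers apply the activation function
    return a

# Hypothetical architecture: 4 inputs, one hidden layer of 3 neurons, 1 output
layer_sizes = [4, 3, 1]
rng = np.random.default_rng(0)
weights = [rng.normal(size=(m, n)) for m, n in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n) for n in layer_sizes[1:]]

x = rng.normal(size=(2, 4))                # two samples with four features each
output = forward(x, weights, biases)
print(output.shape)
```

Counting the input layer, this toy network has three layers, the low end of the range mentioned above.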
First, load the data and prepare the independent variables (inputs) and the dependent variable (output).
import pandas
from pandas import read_csv

data = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python数据挖掘实战课程课件\\4.5\\data.csv",
    encoding='utf8'
)
# Drop rows that contain missing values
data = data.dropna()

# Unordered categorical columns, to be one-hot encoded
dummyColumns = [
    'Gender', 'Home Ownership', 'Internet Connection', 'Marital Status',
    'Movie Selector', 'Prerec Format', 'TV Signal']

for column in dummyColumns:
    data[column] = data[column].astype('category')

# drop_first=True drops the first level of each column to avoid redundant dummies
dummiesData = pandas.get_dummies(
    data,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

# Education level is ordinal: map it to integers, highest level = 9
educationLevelDict = {
    'Post-Doc': 9,
    'Doctorate': 8,
    'Master\'s Degree': 7,
    'Bachelor\'s Degree': 6,
    'Associate\'s Degree': 5,
    'Some College': 4,
    'Trade School': 3,
    'High School': 2,
    'Grade School': 1
}

dummiesData['Education Level Map'] = dummiesData['Education Level'].map(educationLevelDict)

# Frequency columns are ordinal as well
freqMap = {
    'Never': 0,
    'Rarely': 1,
    'Monthly': 2,
    'Weekly': 3,
    'Daily': 4
}
dummiesData['PPV Freq Map'] = dummiesData['PPV Freq'].map(freqMap)
dummiesData['Theater Freq Map'] = dummiesData['Theater Freq'].map(freqMap)
dummiesData['TV Movie Freq Map'] = dummiesData['TV Movie Freq'].map(freqMap)
dummiesData['Prerec Buying Freq Map'] = dummiesData['Prerec Buying Freq'].map(freqMap)
dummiesData['Prerec Renting Freq Map'] = dummiesData['Prerec Renting Freq'].map(freqMap)
dummiesData['Prerec Viewing Freq Map'] = dummiesData['Prerec Viewing Freq'].map(freqMap)

# Columns used as model inputs
dummiesSelect = [
    'Age', 'Num Bathrooms', 'Num Bedrooms', 'Num Cars', 'Num Children', 'Num TVs',
    'Education Level Map', 'PPV Freq Map', 'Theater Freq Map', 'TV Movie Freq Map',
    'Prerec Buying Freq Map', 'Prerec Renting Freq Map', 'Prerec Viewing Freq Map',
    'Gender Male',
    'Internet Connection DSL', 'Internet Connection Dial-Up',
    'Internet Connection IDSN', 'Internet Connection No Internet Connection',
    'Internet Connection Other',
    'Marital Status Married', 'Marital Status Never Married',
    'Marital Status Other', 'Marital Status Separated',
    'Movie Selector Me', 'Movie Selector Other', 'Movie Selector Spouse/Partner',
    'Prerec Format DVD', 'Prerec Format Laserdisk', 'Prerec Format Other',
    'Prerec Format VHS', 'Prerec Format Video CD',
    'TV Signal Analog antennae', 'TV Signal Cable',
    'TV Signal Digital Satellite', 'TV Signal Don\'t watch TV'
]

inputData = dummiesData[dummiesSelect]

# Target: whether the home is rented
outputData = dummiesData[['Home Ownership Rent']]
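The two encodings used above (dummy variables for unordered categories, integer mapping for ordered ones) are easier to see on a tiny table. The sketch below uses a made-up three-row DataFrame; the values and the names `toy`, `freq_map`, and `encoded` are assumptions for illustration only and do not come from the tutorial's data.csv.

```python
import pandas as pd

# Made-up sample rows, not taken from the tutorial's dataset
toy = pd.DataFrame({
    'Gender': ['Male', 'Female', 'Male'],
    'PPV Freq': ['Never', 'Weekly', 'Daily'],
})

# Unordered category: one-hot encode, dropping the first level ('Female')
toy['Gender'] = toy['Gender'].astype('category')
encoded = pd.get_dummies(
    toy,
    columns=['Gender'],
    prefix=['Gender'],
    prefix_sep=' ',
    drop_first=True
)

# Ordered category: map each label to an integer that preserves the ordering
freq_map = {'Never': 0, 'Rarely': 1, 'Monthly': 2, 'Weekly': 3, 'Daily': 4}
encoded['PPV Freq Map'] = encoded['PPV Freq'].map(freq_map)

print(encoded)
```

Because the first level is dropped, a single 'Gender Male' column is enough to represent a two-level variable, which is why only 'Gender Male' appears in dummiesSelect.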
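Next, import the MLPClassifier class from sklearn.neural_network, fit the model with several different hidden layer sizes, and print the score for each.
activation="relu" sets the activation function; relu is the default. The activation function plays the role that the S-shaped (sigmoid) function plays in a classic neuron model. hidden_layer_sizes sets the number of neurons in each hidden layer (a single number means one hidden layer of that size), not the number of layers.
activation: the activation function
√ relu: the rectified linear unit. It is often preferred over logistic and tanh; the rationale given here is that it behaves more like a biological neuron (either inactive or, once active, responding gradually)
√ logistic: the logistic function
√ tanh: the tanh function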
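As a quick numerical illustration of the three activation options listed above, the following sketch evaluates each function on a few sample inputs with NumPy. The helper names `relu` and `logistic` and the sample values are assumptions made for this example.

```python
import numpy as np

def relu(x):
    # max(0, x): zero for negative inputs, identity for positive inputs
    return np.maximum(0.0, x)

def logistic(x):
    # S-shaped curve that maps any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

x = np.array([-2.0, 0.0, 2.0])

relu_out = relu(x)
logistic_out = logistic(x)
tanh_out = np.tanh(x)      # S-shaped curve that maps into (-1, 1)

print("relu:    ", relu_out)
print("logistic:", logistic_out)
print("tanh:    ", tanh_out)
```

Note that tanh is a rescaled logistic curve, tanh(x) = 2 * logistic(2x) - 1, while relu is not bounded above.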
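from sklearn.neural_network import MLPClassifier

# Try one hidden layer with 1 to 10 neurons and report the score for each size
for l in range(1, 11):
    ANNModel = MLPClassifier(
        activation='relu',         # rectified linear unit (the default)
        hidden_layer_sizes=(l,)    # one hidden layer with l neurons
    )

    # ravel() flattens the single-column target into the 1-D shape fit() expects
    ANNModel.fit(inputData, outputData.values.ravel())

    # score() returns the mean accuracy, measured here on the training data itself
    score = ANNModel.score(inputData, outputData.values.ravel())
    print(str(l) + ", " + str(score))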
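The loop above scores each model on the same rows it was fitted on, which tends to give an optimistic picture. One common alternative is to hold out part of the data for scoring. The sketch below shows the idea on synthetic data from make_classification; the dataset, the chosen sizes, max_iter, and random_state values are all assumptions for illustration and are not the tutorial's data or settings.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data (an assumption; the tutorial uses its own data.csv)
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for size in (2, 5, 10):
    model = MLPClassifier(
        activation='relu',
        hidden_layer_sizes=(size,),   # one hidden layer with `size` neurons
        max_iter=300,
        random_state=0
    )
    model.fit(X_train, y_train)
    # Mean accuracy on rows the model has not seen during fitting
    scores[size] = model.score(X_test, y_test)

for size, score in scores.items():
    print(str(size) + ", " + str(score))
```

With small iteration limits scikit-learn may emit a convergence warning; raising max_iter is the usual response.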
Predicting on new data
newData = read_csv(
    "C:\\Users\\Jw\\Desktop\\python_work\\Python数据挖掘实战课程课件\\4.4\\newData.csv",
    encoding='utf-8'
)

# Reuse the training categories so get_dummies produces the same columns.
# astype('category', categories=...) was removed from pandas; use Categorical instead.
for column in dummyColumns:
    newData[column] = pandas.Categorical(
        newData[column],
        categories=data[column].cat.categories
    )

# Values not seen in training become NaN above and are dropped here
newData = newData.dropna()

newData['Education Level Map'] = newData['Education Level'].map(educationLevelDict)
newData['PPV Freq Map'] = newData['PPV Freq'].map(freqMap)
newData['Theater Freq Map'] = newData['Theater Freq'].map(freqMap)
newData['TV Movie Freq Map'] = newData['TV Movie Freq'].map(freqMap)
newData['Prerec Buying Freq Map'] = newData['Prerec Buying Freq'].map(freqMap)
newData['Prerec Renting Freq Map'] = newData['Prerec Renting Freq'].map(freqMap)
newData['Prerec Viewing Freq Map'] = newData['Prerec Viewing Freq'].map(freqMap)

dummiesNewData = pandas.get_dummies(
    newData,
    columns=dummyColumns,
    prefix=dummyColumns,
    prefix_sep=" ",
    drop_first=True
)

inputNewData = dummiesNewData[dummiesSelect]

# Predict on the new inputs (not on the training inputs)
ANNModel.predict(inputNewData)
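The reason the prediction step reuses the training categories is that get_dummies only creates columns for the levels it sees. The sketch below shows the difference on made-up data; the frames `train` and `new` and their values are assumptions for illustration.

```python
import pandas as pd

# Made-up training and new data; the new data lacks the 'Analog antennae' level
train = pd.DataFrame({'TV Signal': ['Cable', 'Analog antennae', 'Digital Satellite']})
new = pd.DataFrame({'TV Signal': ['Cable', 'Digital Satellite']})
train['TV Signal'] = train['TV Signal'].astype('category')

# Naive encoding: columns depend only on the levels present in the new data
naive = pd.get_dummies(new, columns=['TV Signal'], prefix_sep=' ', drop_first=True)

# Aligned encoding: impose the training categories before encoding
new['TV Signal'] = pd.Categorical(
    new['TV Signal'],
    categories=train['TV Signal'].cat.categories
)
aligned = pd.get_dummies(new, columns=['TV Signal'], prefix_sep=' ', drop_first=True)

print(list(naive.columns))
print(list(aligned.columns))
```

Only the aligned frame has the same dummy columns a model fitted on the training data would expect, so selecting the training column list from it does not fail on a missing column.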