week_15_Random_Forest


Taylor Swift Data Analysis with pandas


Background

image


Excerpts from "Taylor Swift: The Whole Story"

image


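Purpose

Using a Taylor Swift dataset, we look at her most popular singles, the singles with the highest danceability and energy, how strongly the individual features correlate with each other, and how popular each album is.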

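Variable descriptions

popularity: popularity score of the track

danceability: how suitable the track is for dancing

acousticness: how acoustic the track is; the higher the value, the more acoustic (unplugged) the song

energy: the energy (intensity) of the song

instrumentalness: how instrumental the track is; higher values mean the track is more likely to contain no vocals

liveness: how likely it is that the track was recorded live, in front of an audience

loudness: loudness in decibels (dB); the higher the value (the closer to 0), the louder the song

speechiness: presence of spoken words; the higher the value, the more speech-like the track

valence: valence (mood); the higher the value, the more positive and upbeat the song sounds

tempo: tempo of the track in beats per minute (BPM)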

Loading the data

import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
tay = pd.read_csv('spotify_taylorswift.csv')
tay.head()
Unnamed: 0 name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 0 Tim McGraw Taylor Swift Taylor Swift 2006-10-24 232106 49 0.580 0.575 0.491 0.0 0.1210 -6.462 0.0251 0.425 76.009
1 1 Picture To Burn Taylor Swift Taylor Swift 2006-10-24 173066 54 0.658 0.173 0.877 0.0 0.0962 -2.098 0.0323 0.821 105.586
2 2 Teardrops On My Guitar - Radio Single Remix Taylor Swift Taylor Swift 2006-10-24 203040 59 0.621 0.288 0.417 0.0 0.1190 -6.941 0.0231 0.289 99.953
3 3 A Place in this World Taylor Swift Taylor Swift 2006-10-24 199200 49 0.576 0.051 0.777 0.0 0.3200 -2.881 0.0324 0.428 115.028
4 4 Cold As You Taylor Swift Taylor Swift 2006-10-24 239013 50 0.418 0.217 0.482 0.0 0.1230 -5.769 0.0266 0.261 175.558

Data preprocessing

Dropping the first (index) column


del tay[tay.keys()[0]]
tay.head()
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 Tim McGraw Taylor Swift Taylor Swift 2006-10-24 232106 49 0.580 0.575 0.491 0.0 0.1210 -6.462 0.0251 0.425 76.009
1 Picture To Burn Taylor Swift Taylor Swift 2006-10-24 173066 54 0.658 0.173 0.877 0.0 0.0962 -2.098 0.0323 0.821 105.586
2 Teardrops On My Guitar - Radio Single Remix Taylor Swift Taylor Swift 2006-10-24 203040 59 0.621 0.288 0.417 0.0 0.1190 -6.941 0.0231 0.289 99.953
3 A Place in this World Taylor Swift Taylor Swift 2006-10-24 199200 49 0.576 0.051 0.777 0.0 0.3200 -2.881 0.0324 0.428 115.028
4 Cold As You Taylor Swift Taylor Swift 2006-10-24 239013 50 0.418 0.217 0.482 0.0 0.1230 -5.769 0.0266 0.261 175.558

Dropping rows with missing values

tay.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              171 non-null    object 
 1   album             171 non-null    object 
 2   artist            171 non-null    object 
 3   release_date      171 non-null    object 
 4   length            171 non-null    int64  
 5   popularity        171 non-null    int64  
 6   danceability      171 non-null    float64
 7   acousticness      171 non-null    float64
 8   energy            171 non-null    float64
 9   instrumentalness  171 non-null    float64
 10  liveness          171 non-null    float64
 11  loudness          171 non-null    float64
 12  speechiness       171 non-null    float64
 13  valence           171 non-null    float64
 14  tempo             171 non-null    float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB
tay = tay.dropna()
tay.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype  
---  ------            --------------  -----  
 0   name              171 non-null    object 
 1   album             171 non-null    object 
 2   artist            171 non-null    object 
 3   release_date      171 non-null    object 
 4   length            171 non-null    int64  
 5   popularity        171 non-null    int64  
 6   danceability      171 non-null    float64
 7   acousticness      171 non-null    float64
 8   energy            171 non-null    float64
 9   instrumentalness  171 non-null    float64
 10  liveness          171 non-null    float64
 11  loudness          171 non-null    float64
 12  speechiness       171 non-null    float64
 13  valence           171 non-null    float64
 14  tempo             171 non-null    float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB
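
Both `tay.info()` outputs report 171 non-null values in every column, so `dropna()` changes nothing for this dataset. As a minimal sketch of how the check could be made explicit before dropping anything, the snippet below counts missing values per column. The `toy` frame and its rows are made up for illustration and are not part of the dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value (not taken from the real dataset)
toy = pd.DataFrame({
    "name": ["song_a", "song_b", "song_c"],
    "popularity": [49, np.nan, 59],
})

# Count missing values per column before deciding whether to drop rows
missing_per_column = toy.isna().sum()
print(missing_per_column)

# dropna() removes every row that contains at least one missing value
cleaned = toy.dropna()
print(len(toy), "->", len(cleaned))
```

On the real frame, `tay.isna().sum()` would be expected to show zeros everywhere, which is consistent with the unchanged `info()` output.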

Sorting by popularity

# Sort by popularity with sort_values to get Taylor's most popular songs.
populars = tay.sort_values(by='popularity', ascending=False)

populars.head(13)
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
60 Blank Space 1989 (Deluxe) Taylor Swift 2014-01-01 231826 82 0.760 0.10300 0.703 0.000000 0.0913 -5.412 0.0540 0.5700 95.997
64 Shake It Off 1989 (Deluxe) Taylor Swift 2014-01-01 219200 80 0.647 0.06470 0.800 0.000000 0.3340 -5.384 0.1650 0.9420 160.078
95 Lover Lover Taylor Swift 2019-08-23 221306 80 0.359 0.49200 0.543 0.000016 0.1180 -7.582 0.0919 0.4530 68.534
82 Delicate reputation Taylor Swift 2017-11-10 232253 78 0.750 0.21600 0.404 0.000357 0.0911 -10.178 0.0682 0.0499 95.045
106 You Need To Calm Down Lover Taylor Swift 2019-08-23 171360 78 0.771 0.00929 0.671 0.000000 0.0637 -5.617 0.0553 0.7140 85.026
94 Cruel Summer Lover Taylor Swift 2019-08-23 178426 77 0.552 0.11700 0.702 0.000021 0.1050 -5.707 0.1570 0.5640 169.994
108 ME! (feat. Brendon Urie of Panic! At The Disco) Lover Taylor Swift 2019-08-23 193000 77 0.610 0.03300 0.830 0.000000 0.1180 -4.105 0.0571 0.7280 182.162
83 Look What You Made Me Do reputation Taylor Swift 2017-11-10 211853 77 0.766 0.20400 0.709 0.000014 0.1260 -6.471 0.1230 0.5060 128.070
86 Getaway Car reputation Taylor Swift 2017-11-10 233626 76 0.562 0.00465 0.689 0.000002 0.0888 -6.745 0.1270 0.3510 172.054
150 You Belong With Me (Taylor’s Version) Fearless (Taylor's Version) Taylor Swift 2021-04-09 231124 76 0.632 0.06230 0.773 0.000000 0.0885 -4.856 0.0346 0.4740 130.033
100 Paper Rings Lover Taylor Swift 2019-08-23 222400 76 0.811 0.01290 0.719 0.000014 0.0742 -6.553 0.0497 0.8650 103.979
81 Don’t Blame Me reputation Taylor Swift 2017-11-10 236413 75 0.615 0.10600 0.534 0.000018 0.0607 -6.719 0.0386 0.1930 135.917
61 Style 1989 (Deluxe) Taylor Swift 2014-01-01 231000 75 0.588 0.00245 0.791 0.002580 0.1180 -5.595 0.0402 0.4870 94.933

Extracting the top 5 singles

top_5 = populars.head()

# loc selects a group of rows and columns by label
top_5.loc[:,['name','popularity']]
name popularity
60 Blank Space 82
64 Shake It Off 80
95 Lover 80
82 Delicate 78
106 You Need To Calm Down 78
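
The sort-then-head pattern above can also be written in one step with `DataFrame.nlargest`. The sketch below is only an illustration: the small `songs` frame holds a handful of (name, popularity) pairs copied from the sorted output, deliberately shuffled, and `top_5_alt` is a name introduced here.

```python
import pandas as pd

# A few (name, popularity) pairs copied from the sorted output above, in shuffled order
songs = pd.DataFrame({
    "name": ["Style", "Blank Space", "Lover", "Shake It Off",
             "Delicate", "Cruel Summer", "You Need To Calm Down"],
    "popularity": [75, 82, 80, 80, 78, 77, 78],
})

# nlargest returns the n rows with the largest values, already in descending order
top_5_alt = songs.nlargest(5, "popularity")[["name", "popularity"]]
print(top_5_alt)
```

With tied popularity values, the order of the tied rows may differ from what `sort_values` gives, so the two approaches are equivalent only up to ties.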

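Extracting singles with high danceability

# The danceability column measures how danceable a track is; we keep only the most danceable songs (danceability of at least 0.75).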
dance_on = tay.loc[(tay.danceability >= 0.75)]
dance_on.loc[:,['name','album','danceability']].head(13)
name album danceability
56 Treacherous - Original Demo Recording Red (Deluxe Edition) 0.828
59 Welcome To New York 1989 (Deluxe) 0.789
60 Blank Space 1989 (Deluxe) 0.760
68 How You Get The Girl 1989 (Deluxe) 0.765
71 Clean 1989 (Deluxe) 0.815
76 I Wish You Would - Voice Memo 1989 (Deluxe) 0.781
82 Delicate reputation 0.750
83 Look What You Made Me Do reputation 0.766
85 Gorgeous reputation 0.800
96 The Man Lover 0.777
98 I Think He Knows Lover 0.897
100 Paper Rings Lover 0.811
101 Cornelia Street Lover 0.824

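Extracting high-energy singles


# Keep only the tracks with energy of at least 0.5
energy = tay.loc[(tay.energy >= 0.5)]

# Sort the filtered tracks (not the full frame) from most to least energetic
energy = energy.sort_values(by='energy', ascending=False)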

energy.loc[:,['name','album','energy']].head(13)
name album energy
26 Haunted Speak Now (Deluxe Package) 0.944
11 I'm Only Me When I'm With You Taylor Swift 0.934
24 Better Than Revenge Speak Now (Deluxe Package) 0.917
152 Tell Me Why (Taylor’s Version) Fearless (Taylor's Version) 0.909
57 Red - Original Demo Recording Red (Deluxe Edition) 0.902
38 Red Red (Deluxe Edition) 0.896
65 I Wish You Would 1989 (Deluxe) 0.893
74 New Romantics 1989 (Deluxe) 0.889
1 Picture To Burn Taylor Swift 0.877
163 The Other Side Of The Door (Taylor’s Version) Fearless (Taylor's Version) 0.873
21 The Story Of Us Speak Now (Deluxe Package) 0.855
62 Out Of The Woods 1989 (Deluxe) 0.841
108 ME! (feat. Brendon Urie of Panic! At The Disco) Lover 0.830
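
The danceability and energy filters can also be combined into a single boolean mask. The following is a rough sketch on a few rows copied from the outputs above; the frame `sample` and the result name `both` are introduced here for illustration.

```python
import pandas as pd

# (name, danceability, energy) for a few tracks, copied from the outputs above
sample = pd.DataFrame({
    "name": ["Blank Space", "Delicate", "Look What You Made Me Do",
             "Paper Rings", "Lover", "Shake It Off"],
    "danceability": [0.760, 0.750, 0.766, 0.811, 0.359, 0.647],
    "energy": [0.703, 0.404, 0.709, 0.719, 0.543, 0.800],
})

# Combine conditions with & and wrap each one in parentheses
mask = (sample["danceability"] >= 0.75) & (sample["energy"] >= 0.5)
both = sample.loc[mask, ["name", "danceability", "energy"]]
print(both)
```

The parentheses matter because `&` binds more tightly than `>=` in Python; without them the expression raises an error or gives the wrong mask.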

Looking up your favorite album

For example, I like "Speak Now" 😃

tay.loc[tay.album == 'Speak Now (Deluxe Package)']
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
15 Mine - POP Mix Speak Now (Deluxe Package) Taylor Swift 2010-01-01 230546 45 0.696 0.004610 0.768 0.000001 0.1010 -3.863 0.0308 0.692 121.050
16 Sparks Fly Speak Now (Deluxe Package) Taylor Swift 2010-01-01 260946 50 0.608 0.038700 0.785 0.000000 0.1580 -2.976 0.0311 0.376 114.985
17 Back To December Speak Now (Deluxe Package) Taylor Swift 2010-01-01 293040 50 0.517 0.020200 0.606 0.000000 0.3240 -5.797 0.0289 0.296 141.929
18 Speak Now Speak Now (Deluxe Package) Taylor Swift 2010-01-01 240773 49 0.708 0.101000 0.601 0.000000 0.0979 -3.750 0.0306 0.742 118.962
19 Dear John Speak Now (Deluxe Package) Taylor Swift 2010-01-01 403887 48 0.583 0.183000 0.468 0.000002 0.1110 -5.378 0.0278 0.126 119.375
20 Mean Speak Now (Deluxe Package) Taylor Swift 2010-01-01 237746 48 0.570 0.445000 0.747 0.000000 0.2190 -3.978 0.0426 0.808 164.004
21 The Story Of Us Speak Now (Deluxe Package) Taylor Swift 2010-01-01 265636 48 0.575 0.000315 0.855 0.001610 0.0419 -4.827 0.0467 0.840 139.920
22 Never Grow Up Speak Now (Deluxe Package) Taylor Swift 2010-01-01 290480 44 0.715 0.829000 0.308 0.000000 0.1600 -8.829 0.0305 0.547 124.899
23 Enchanted Speak Now (Deluxe Package) Taylor Swift 2010-01-01 352200 51 0.535 0.071600 0.618 0.000388 0.1690 -3.913 0.0273 0.228 81.975
24 Better Than Revenge Speak Now (Deluxe Package) Taylor Swift 2010-01-01 217173 49 0.519 0.016700 0.917 0.000021 0.3590 -3.185 0.0887 0.652 145.882
25 Innocent Speak Now (Deluxe Package) Taylor Swift 2010-01-01 302266 44 0.553 0.202000 0.604 0.000000 0.1250 -5.295 0.0258 0.186 133.989
26 Haunted Speak Now (Deluxe Package) Taylor Swift 2010-01-01 242093 47 0.434 0.076900 0.944 0.000000 0.1510 -2.641 0.0581 0.361 162.020
27 Last Kiss Speak Now (Deluxe Package) Taylor Swift 2010-01-01 367146 47 0.359 0.581000 0.329 0.000037 0.0979 -9.531 0.0293 0.208 84.358
28 Long Live Speak Now (Deluxe Package) Taylor Swift 2010-01-01 317960 47 0.412 0.042600 0.682 0.000075 0.1060 -4.319 0.0339 0.146 203.959
29 Ours Speak Now (Deluxe Package) Taylor Swift 2010-01-01 238173 55 0.610 0.505000 0.556 0.000000 0.0851 -7.369 0.0285 0.192 159.838
30 If This Was A Movie Speak Now (Deluxe Package) Taylor Swift 2010-01-01 234546 52 0.515 0.154000 0.724 0.000004 0.2690 -3.498 0.0267 0.257 147.788
31 Superman Speak Now (Deluxe Package) Taylor Swift 2010-01-01 276266 47 0.582 0.018700 0.817 0.000002 0.1010 -3.718 0.0337 0.547 131.983
32 Back To December - Acoustic Speak Now (Deluxe Package) Taylor Swift 2010-01-01 292533 59 0.541 0.731000 0.451 0.000000 0.1970 -6.522 0.0270 0.333 141.713
33 Haunted - Acoustic Version Speak Now (Deluxe Package) Taylor Swift 2010-01-01 217626 48 0.574 0.841000 0.462 0.000000 0.2800 -5.124 0.0252 0.314 80.858
34 Mine Speak Now (Deluxe Package) Taylor Swift 2010-01-01 230773 64 0.621 0.003270 0.780 0.000005 0.1840 -2.934 0.0297 0.672 121.038
35 Back To December Speak Now (Deluxe Package) Taylor Swift 2010-01-01 293040 43 0.525 0.113000 0.676 0.000000 0.2940 -4.684 0.0294 0.281 141.950
36 The Story Of Us Speak Now (Deluxe Package) Taylor Swift 2010-01-01 266480 59 0.546 0.004870 0.809 0.000372 0.0437 -3.621 0.0410 0.649 139.910
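
The exact-match lookup above requires the full album string, including the "(Deluxe Package)" suffix. A possible alternative, sketched here on a small frame built from names that appear in the outputs above, is a substring match with `str.contains`; the `albums` frame and `speak_now` are illustrative names only.

```python
import pandas as pd

# A few (name, album) pairs copied from the outputs above
albums = pd.DataFrame({
    "name": ["Mine", "Enchanted", "Red", "Tell Me Why (Taylor’s Version)"],
    "album": ["Speak Now (Deluxe Package)", "Speak Now (Deluxe Package)",
              "Red (Deluxe Edition)", "Fearless (Taylor's Version)"],
})

# regex=False treats the pattern as plain text, so parentheses need no escaping
speak_now = albums.loc[albums["album"].str.contains("Speak Now", regex=False)]
print(speak_now)
```

A substring match would also pick up any other edition whose title contains "Speak Now", which may or may not be what you want.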

Analysis

Album popularity


import matplotlib.pyplot as plt

# Highest track popularity per album, entered by hand
album = ["Taylor Swift","Fearless","Speak Now","Red","1989","Reputation","Lover","Folklore","Evermore"]
maxpopularity = [59,76,64,72,82,78,80,65,72]
# One bar color per album
newcolors = ['lightblue','gold','purple','red','tan','black','pink','grey','brown']

plt.figure(figsize=(8, 6))
plt.bar(album, maxpopularity, color=newcolors)
plt.title('Popularity of albums')
plt.xlabel('ALBUM')
plt.ylabel('POPULARITY')
plt.xticks(fontsize=6)
plt.show()

image
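
The per-album maxima in the bar chart are typed in by hand. As a hedged sketch, they could instead be derived from the data with `groupby`; the small `sample` frame below uses rows copied from the outputs above, and on the full dataset the same call would run on `tay`. Note that album names in the data carry edition suffixes such as "1989 (Deluxe)".

```python
import pandas as pd

# (album, popularity) pairs copied from the outputs above
sample = pd.DataFrame({
    "name": ["Tim McGraw", "Teardrops On My Guitar - Radio Single Remix",
             "Blank Space", "Style", "Lover", "Cruel Summer", "Delicate"],
    "album": ["Taylor Swift", "Taylor Swift", "1989 (Deluxe)", "1989 (Deluxe)",
              "Lover", "Lover", "reputation"],
    "popularity": [49, 59, 82, 75, 80, 77, 78],
})

# Highest track popularity within each album, most popular album first
album_max = sample.groupby("album")["popularity"].max().sort_values(ascending=False)
print(album_max)
```

The resulting Series could be passed to `plt.bar(album_max.index, album_max.values)` in place of the hard-coded lists.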

Popularity histogram

import seaborn as sns
sns.displot(x='popularity', data=tay, kde=True, color='#a70ad5')
plt.title('Popularity Distribution')
Text(0.5, 1.0, 'Popularity Distribution')

image

Example: energy vs. danceability

import matplotlib.pyplot as plt

energyy = tay.energy
dancee = tay.danceability

# Pearson correlation between energy and danceability
correlation = energyy.corr(dancee)
print(correlation)
# Scatter plot of the two features
ax1 = tay.plot.scatter(x='energy', y='danceability', c='red')
0.06266927386034048

image
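
`Series.corr` computes the Pearson correlation coefficient by default. To make that concrete, here is a small sketch that computes it from the definition and compares it with pandas. It uses only the first five rows shown by `tay.head()`, so the value will not match the 0.0627 obtained on all 171 tracks.

```python
import numpy as np
import pandas as pd

# energy and danceability of the first five tracks, copied from tay.head()
energy_vals = pd.Series([0.491, 0.877, 0.417, 0.777, 0.482])
dance_vals = pd.Series([0.580, 0.658, 0.621, 0.576, 0.418])

# Pearson r: sum of products of deviations, divided by the product of their norms
dx = energy_vals - energy_vals.mean()
dy = dance_vals - dance_vals.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# pandas uses the same definition by default (method='pearson')
r_pandas = energy_vals.corr(dance_vals)
print(r_manual, r_pandas)
```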

Correlation between energy and acousticness

sns.set_theme(color_codes=True)
plt.figure(figsize=(18, 6))
sns.regplot(x='acousticness', y='energy', data=tay)
plt.title('Acousticness vs Energy')
Text(0.5, 1.0, 'Acousticness vs Energy')

image

Correlation between energy and loudness

sns.set_theme(color_codes=True)
plt.figure(figsize=(18, 6))
sns.regplot(x='energy', y='loudness', data=tay)
plt.title('Energy vs Loudness')
Text(0.5, 1.0, 'Energy vs Loudness')

image


Correlation matrix

# Restrict the correlation to numeric columns; this avoids the pandas FutureWarning about numeric_only
tay.corr(numeric_only=True)
length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
length 1.000000 0.011772 -0.301561 0.038749 -0.114792 -0.081288 -0.148412 0.044126 -0.414447 -0.420405 0.010425
popularity 0.011772 1.000000 0.072622 -0.117842 0.127495 0.035638 -0.406730 0.122576 -0.478262 0.034154 -0.015669
danceability -0.301561 0.072622 1.000000 -0.143085 0.062669 -0.051837 -0.015766 0.002587 0.183860 0.379786 -0.235370
acousticness 0.038749 -0.117842 -0.143085 1.000000 -0.710055 0.140655 -0.065387 -0.736624 0.143127 -0.231232 -0.134467
energy -0.114792 0.127495 0.062669 -0.710055 1.000000 0.000281 0.046364 0.784973 -0.179336 0.490371 0.209914
instrumentalness -0.081288 0.035638 -0.051837 0.140655 0.000281 1.000000 -0.059132 -0.084224 -0.029729 0.020076 0.043274
liveness -0.148412 -0.406730 -0.015766 -0.065387 0.046364 -0.059132 1.000000 0.016324 0.357924 -0.017264 0.034934
loudness 0.044126 0.122576 0.002587 -0.736624 0.784973 -0.084224 0.016324 1.000000 -0.409577 0.299926 0.171503
speechiness -0.414447 -0.478262 0.183860 0.143127 -0.179336 -0.029729 0.357924 -0.409577 1.000000 0.120352 -0.027812
valence -0.420405 0.034154 0.379786 -0.231232 0.490371 0.020076 -0.017264 0.299926 0.120352 1.000000 -0.006056
tempo 0.010425 -0.015669 -0.235370 -0.134467 0.209914 0.043274 0.034934 0.171503 -0.027812 -0.006056 1.000000
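
The matrix is easier to read for one target at a time. As a sketch, the snippet below takes the `popularity` row copied from the output above and ranks the features by the absolute size of their correlation. The names `pop_corr` and `ranked` are introduced here.

```python
import pandas as pd

# Correlation of each feature with popularity, copied from the matrix above
pop_corr = pd.Series({
    "length": 0.011772, "danceability": 0.072622, "acousticness": -0.117842,
    "energy": 0.127495, "instrumentalness": 0.035638, "liveness": -0.406730,
    "loudness": 0.122576, "speechiness": -0.478262, "valence": 0.034154,
    "tempo": -0.015669,
})

# Order by absolute value so strong negative correlations rank alongside strong positive ones
ranked = pop_corr.reindex(pop_corr.abs().sort_values(ascending=False).index)
print(ranked)
```

Speechiness and liveness stand out with moderate negative correlations, while the remaining features are only weakly related to popularity. Correlation measures linear association only, so this ranking is a rough guide rather than a statement about predictive value.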

Correlation heatmap

import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(tay.corr(numeric_only=True))
<Axes: >

image

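Pair plot (scatter-plot matrix)

# pairplot only needs the numeric columns; tay[2:] would instead drop the first two rows
sns.pairplot(tay.select_dtypes(include='number'))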
<seaborn.axisgrid.PairGrid at 0x29d8b437b50>

image

Random Forest Regression

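How the algorithm works

Each tree in a random forest is trained on a random sample of the training data, drawn through a process called bootstrap aggregating (bagging). The model fits one tree to each of these smaller datasets and then aggregates their predictions. Because the sampling is done with replacement, the same instance can be used several times. As a result, the trees are not only trained on different datasets but also rely on different features to make their decisions.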

image
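
The bagging idea described above can be sketched by hand. This is not how `RandomForestRegressor` is implemented internally; it is a simplified illustration using a manual loop over scikit-learn `DecisionTreeRegressor` models on synthetic data. The data, `n_trees = 25`, and `max_features="sqrt"` (so that each split sees only a random subset of features) are all assumptions chosen for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic regression data (hypothetical, for illustration only)
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] + rng.normal(0, 1, size=200)

n_trees = 25
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw as many row indices as there are rows, with replacement
    idx = rng.integers(0, len(X), size=len(X))
    # Each split considers only a random subset of the features
    tree = DecisionTreeRegressor(max_features="sqrt", random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Aggregate: the ensemble prediction is the average of the per-tree predictions
X_new = rng.uniform(0, 1, size=(5, 3))
per_tree = np.stack([t.predict(X_new) for t in trees])
forest_pred = per_tree.mean(axis=0)
print(forest_pred)
```

Because each bootstrap sample repeats some rows and omits others, every tree sees a slightly different dataset, and averaging their outputs smooths out the individual trees' errors.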

Preparing the data

# Drop the non-numeric columns (album, artist, release_date, name)
dataset = tay.drop(['album','artist','release_date','name'], axis=1)

dataset
length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 232106 49 0.580 0.575 0.491 0.000000 0.1210 -6.462 0.0251 0.425 76.009
1 173066 54 0.658 0.173 0.877 0.000000 0.0962 -2.098 0.0323 0.821 105.586
2 203040 59 0.621 0.288 0.417 0.000000 0.1190 -6.941 0.0231 0.289 99.953
3 199200 49 0.576 0.051 0.777 0.000000 0.3200 -2.881 0.0324 0.428 115.028
4 239013 50 0.418 0.217 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
... ... ... ... ... ... ... ... ... ... ... ...
166 277591 74 0.660 0.162 0.817 0.000000 0.0667 -6.269 0.0521 0.714 135.942
167 244236 65 0.609 0.849 0.373 0.000000 0.0779 -8.819 0.0263 0.130 106.007
168 189495 67 0.588 0.225 0.608 0.000000 0.0920 -7.062 0.0365 0.508 90.201
169 208608 66 0.563 0.514 0.473 0.000012 0.1090 -11.548 0.0503 0.405 101.934
170 242157 64 0.624 0.334 0.624 0.000000 0.0995 -7.860 0.0539 0.527 80.132

171 rows × 11 columns

x = dataset.drop(['popularity'],axis=1)
x
length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 232106 0.580 0.575 0.491 0.000000 0.1210 -6.462 0.0251 0.425 76.009
1 173066 0.658 0.173 0.877 0.000000 0.0962 -2.098 0.0323 0.821 105.586
2 203040 0.621 0.288 0.417 0.000000 0.1190 -6.941 0.0231 0.289 99.953
3 199200 0.576 0.051 0.777 0.000000 0.3200 -2.881 0.0324 0.428 115.028
4 239013 0.418 0.217 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
... ... ... ... ... ... ... ... ... ... ...
166 277591 0.660 0.162 0.817 0.000000 0.0667 -6.269 0.0521 0.714 135.942
167 244236 0.609 0.849 0.373 0.000000 0.0779 -8.819 0.0263 0.130 106.007
168 189495 0.588 0.225 0.608 0.000000 0.0920 -7.062 0.0365 0.508 90.201
169 208608 0.563 0.514 0.473 0.000012 0.1090 -11.548 0.0503 0.405 101.934
170 242157 0.624 0.334 0.624 0.000000 0.0995 -7.860 0.0539 0.527 80.132

171 rows × 10 columns

y = dataset['popularity']
y
0      49
1      54
2      59
3      49
4      50
       ..
166    74
167    65
168    67
169    66
170    64
Name: popularity, Length: 171, dtype: int64

Train/test split

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2,random_state=1)
x_train
length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
89 230373 0.719 0.032900 0.469 0.000000 0.1690 -8.792 0.0533 0.0851 120.085
88 211506 0.624 0.060400 0.691 0.000011 0.1380 -6.686 0.1960 0.2840 160.024
165 220839 0.599 0.816000 0.494 0.000000 0.1010 -7.610 0.0372 0.4400 142.893
110 293453 0.557 0.808000 0.496 0.000173 0.0772 -9.602 0.0563 0.2650 149.983
48 284866 0.624 0.632000 0.340 0.033700 0.0805 -12.411 0.0290 0.2610 129.987
... ... ... ... ... ... ... ... ... ... ...
133 215626 0.546 0.418000 0.613 0.000000 0.1030 -7.589 0.0264 0.5350 79.015
137 260440 0.515 0.855000 0.545 0.000020 0.0921 -9.277 0.0353 0.5350 88.856
72 245560 0.422 0.049300 0.692 0.000026 0.1770 -5.447 0.0549 0.1970 184.014
140 257773 0.535 0.876000 0.561 0.000136 0.1150 -11.609 0.0484 0.2870 96.103
37 295720 0.588 0.000197 0.825 0.001380 0.0884 -5.882 0.0328 0.3970 129.968

136 rows × 10 columns

x_test
length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
92 235466 0.661 0.921000 0.151 0.000000 0.1300 -12.864 0.0354 0.230 94.922
113 231000 0.688 0.481000 0.653 0.004140 0.1060 -8.558 0.0403 0.701 147.991
19 403887 0.583 0.183000 0.468 0.000002 0.1110 -5.378 0.0278 0.126 119.375
69 250093 0.481 0.678000 0.435 0.000000 0.0928 -8.795 0.0321 0.107 143.950
53 286613 0.619 0.187000 0.506 0.000015 0.1010 -7.327 0.0315 0.274 126.030
161 237338 0.476 0.040600 0.564 0.000000 0.1020 -5.677 0.0269 0.167 143.929
108 193000 0.610 0.033000 0.830 0.000000 0.1180 -4.105 0.0571 0.728 182.162
14 179066 0.459 0.040200 0.753 0.000000 0.0863 -3.827 0.0537 0.483 199.997
99 234146 0.662 0.028000 0.747 0.006150 0.1380 -6.926 0.0736 0.487 150.088
107 223293 0.756 0.130000 0.449 0.000000 0.1140 -8.746 0.0344 0.399 111.011
11 213053 0.563 0.004520 0.934 0.000807 0.1030 -3.629 0.0646 0.518 143.964
4 239013 0.418 0.217000 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
117 208906 0.602 0.888000 0.494 0.000026 0.0902 -10.813 0.0277 0.374 94.955
42 232120 0.661 0.002150 0.729 0.001300 0.0477 -6.561 0.0376 0.668 103.987
122 237266 0.593 0.670000 0.700 0.000007 0.1160 -9.016 0.0492 0.451 141.898
125 234000 0.644 0.916000 0.284 0.000015 0.0909 -12.879 0.0821 0.328 150.072
147 235766 0.627 0.130000 0.792 0.000004 0.0845 -4.311 0.0310 0.415 119.054
35 293040 0.525 0.113000 0.676 0.000000 0.2940 -4.684 0.0294 0.281 141.950
81 236413 0.615 0.106000 0.534 0.000018 0.0607 -6.719 0.0386 0.193 135.917
31 276266 0.582 0.018700 0.817 0.000002 0.1010 -3.718 0.0337 0.547 131.983
51 220600 0.649 0.021300 0.777 0.000335 0.2100 -5.804 0.0406 0.587 126.018
75 216333 0.592 0.829000 0.128 0.000000 0.5270 -17.932 0.5890 0.150 78.828
78 208186 0.613 0.052700 0.764 0.000000 0.1970 -6.509 0.1360 0.417 160.015
73 267106 0.474 0.707000 0.480 0.000108 0.0903 -8.894 0.0622 0.319 170.109
40 219720 0.622 0.004540 0.469 0.000002 0.0335 -6.798 0.0363 0.679 77.019
84 227906 0.574 0.122000 0.610 0.000001 0.1300 -7.283 0.0732 0.374 74.957
47 202960 0.627 0.016200 0.816 0.002080 0.0965 -6.698 0.0774 0.648 157.043
29 238173 0.610 0.505000 0.556 0.000000 0.0851 -7.369 0.0285 0.192 159.838
16 260946 0.608 0.038700 0.785 0.000000 0.1580 -2.976 0.0311 0.376 114.985
105 200306 0.739 0.736000 0.320 0.000147 0.1110 -10.862 0.2390 0.351 79.970
85 209680 0.800 0.071300 0.535 0.000009 0.2130 -6.684 0.1350 0.451 92.027
154 243136 0.402 0.003300 0.732 0.000000 0.1080 -4.665 0.0484 0.472 161.032
157 279359 0.499 0.000191 0.815 0.000000 0.1810 -4.063 0.0341 0.344 95.999
5 207106 0.589 0.004910 0.805 0.000000 0.2400 -4.055 0.0293 0.591 112.982
94 178426 0.552 0.117000 0.702 0.000021 0.1050 -5.707 0.1570 0.564 169.994
y_train
89     71
88     68
165    63
110    68
48     59
       ..
133    65
137    65
72     65
140    63
37     62
Name: popularity, Length: 136, dtype: int64
y_test
92     67
113    63
19     48
69     60
53     58
161    61
108    77
14     48
99     70
107    74
11     50
4      50
117    62
42     64
122    61
125    60
147    74
35     43
81     75
31     47
51     58
75      0
78     74
73     62
40     65
84     66
47     61
29     55
16     50
105    68
85     72
154    74
157    61
5      47
94     77
Name: popularity, dtype: int64
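
With 171 rows and `test_size=0.2`, the split above yields 136 training rows and 35 test rows. The sketch below reproduces just those sizes on placeholder arrays (`X_demo` and `y_demo` are stand-ins, not the real features) and shows that a fixed `random_state` makes the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Placeholder arrays with the same number of rows as the dataset (171)
X_demo = np.arange(171 * 2).reshape(171, 2)
y_demo = np.arange(171)

X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
print(X_tr.shape, X_te.shape)

# Running the split again with the same random_state gives the same rows
_, _, _, y_te_again = train_test_split(X_demo, y_demo, test_size=0.2, random_state=1)
same_split = np.array_equal(y_te, y_te_again)
print(same_split)
```

The test-set size is rounded up from 171 × 0.2 = 34.2, which is why 35 rows end up in the test set.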

Training the model

from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=27)
regressor.fit(x_train, y_train)
RandomForestRegressor(random_state=27)
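
Once fitted, a random forest exposes `feature_importances_`, which can hint at which inputs the trees rely on. The following is a minimal sketch on synthetic data, where the `signal` and `noise` columns are invented for the example; on the real model the same pattern would use `regressor.feature_importances_` with `x_train.columns`.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)

# Synthetic features: the target depends on "signal" only (hypothetical data)
demo_x = pd.DataFrame({
    "signal": rng.uniform(size=300),
    "noise": rng.uniform(size=300),
})
demo_y = 5 * demo_x["signal"] + rng.normal(0, 0.1, size=300)

demo_model = RandomForestRegressor(n_estimators=100, random_state=27)
demo_model.fit(demo_x, demo_y)

# Impurity-based importances; they sum to 1 across features
importances = pd.Series(
    demo_model.feature_importances_, index=demo_x.columns
).sort_values(ascending=False)
print(importances)
```

Impurity-based importances can be misleading when features are strongly correlated, as energy, loudness, and acousticness are in this dataset, so they are best read as a rough indication.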

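Computing the score (R²)

# For a regressor, score() returns the coefficient of determination R², not a classification accuracy
regressor.score(x_test,y_test)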
0.41401740646745133

Comparing predicted and actual values

compare = pd.DataFrame(y_test)
compare.columns = ['actual']
compare
actual
92 67
113 63
19 48
69 60
53 58
161 61
108 77
14 48
99 70
107 74
11 50
4 50
117 62
42 64
122 61
125 60
147 74
35 43
81 75
31 47
51 58
75 0
78 74
73 62
40 65
84 66
47 61
29 55
16 50
105 68
85 72
154 74
157 61
5 47
94 77
y_pred = regressor.predict(x_test)

compare['predict'] = y_pred.round(0)
compare
actual predict
92 67 60.0
113 63 66.0
19 48 55.0
69 60 63.0
53 58 61.0
161 61 53.0
108 77 59.0
14 48 60.0
99 70 66.0
107 74 61.0
11 50 55.0
4 50 54.0
117 62 63.0
42 64 68.0
122 61 65.0
125 60 68.0
147 74 57.0
35 43 55.0
81 75 68.0
31 47 56.0
51 58 64.0
75 0 43.0
78 74 69.0
73 62 67.0
40 65 65.0
84 66 68.0
47 61 67.0
29 55 62.0
16 50 57.0
105 68 68.0
85 72 72.0
154 74 60.0
157 61 58.0
5 47 56.0
94 77 67.0
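
The comparison table can be condensed into a few error metrics. For a regressor, `score` returns the coefficient of determination, R² = 1 − SS_res / SS_tot. The sketch below recomputes it from the table, together with the mean absolute error and root mean squared error. Because the table holds rounded predictions, the result should land close to, but not exactly on, the 0.414 reported by `regressor.score`.

```python
import numpy as np

# Actual and rounded predicted popularity, copied row by row from the table above
actual = np.array([67, 63, 48, 60, 58, 61, 77, 48, 70, 74, 50, 50, 62, 64, 61, 60,
                   74, 43, 75, 47, 58, 0, 74, 62, 65, 66, 61, 55, 50, 68, 72, 74,
                   61, 47, 77], dtype=float)
predicted = np.array([60, 66, 55, 63, 61, 53, 59, 60, 66, 61, 55, 54, 63, 68, 65, 68,
                      57, 55, 68, 56, 64, 43, 69, 67, 65, 68, 67, 62, 57, 68, 72, 60,
                      58, 56, 67], dtype=float)

errors = actual - predicted
mae = np.abs(errors).mean()            # mean absolute error
rmse = np.sqrt((errors ** 2).mean())   # root mean squared error

# R^2 = 1 - SS_res / SS_tot
ss_res = (errors ** 2).sum()
ss_tot = ((actual - actual.mean()) ** 2).sum()
r2 = 1 - ss_res / ss_tot
print(f"MAE={mae:.2f}  RMSE={rmse:.2f}  R2={r2:.3f}")
```

Row 75, with an actual popularity of 0 and a prediction of 43, is by far the largest single error. Rerunning the calculation with that row removed is a quick way to see how much one unusual track influences each metric.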

posted @ 2023-06-06 07:58  dogfaraway