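Taylor Swift Data Analysis with pandas
Background
Excerpted from "Taylor Swift The Whole Story"
Purpose
Using a Taylor Swift dataset, we look at her most popular singles, the singles with the highest danceability and energy, how strongly the individual features correlate with one another, and how popular each album is.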
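Variable descriptions
popularity: how popular the track is
danceability: how suitable the track is for dancing
acousticness: the higher the value, the more acoustic (unplugged) the track
energy: the energy of the track
instrumentalness: how instrumental the track is
liveness: how much the track sounds like a live performance
loudness: the higher the value, the louder the track
speechiness: how speech-like the track is; the higher the value, the more spoken words it contains
valence: emotional tone; the higher the value, the more positive and upbeat the track
tempo: the tempo (pace) of the track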
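Loading the data
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt  # needed by the plots in the analysis section
import seaborn as sns            # needed by the plots in the analysis section
tay = pd.read_csv('spotify_taylorswift.csv')
tay.head()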
   Unnamed: 0  name                                         album         artist        release_date  length  popularity  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
0           0  Tim McGraw                                   Taylor Swift  Taylor Swift  2006-10-24    232106          49         0.580         0.575   0.491               0.0    0.1210    -6.462       0.0251    0.425   76.009
1           1  Picture To Burn                              Taylor Swift  Taylor Swift  2006-10-24    173066          54         0.658         0.173   0.877               0.0    0.0962    -2.098       0.0323    0.821  105.586
2           2  Teardrops On My Guitar - Radio Single Remix  Taylor Swift  Taylor Swift  2006-10-24    203040          59         0.621         0.288   0.417               0.0    0.1190    -6.941       0.0231    0.289   99.953
3           3  A Place in this World                        Taylor Swift  Taylor Swift  2006-10-24    199200          49         0.576         0.051   0.777               0.0    0.3200    -2.881       0.0324    0.428  115.028
4           4  Cold As You                                  Taylor Swift  Taylor Swift  2006-10-24    239013          50         0.418         0.217   0.482               0.0    0.1230    -5.769       0.0266    0.261  175.558
Data preprocessing
Drop the first column (a leftover index)
tay = tay.drop(columns=tay.columns[0])  # removes the "Unnamed: 0" column
tay.head()
   name                                         album         artist        release_date  length  popularity  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
0  Tim McGraw                                   Taylor Swift  Taylor Swift  2006-10-24    232106          49         0.580         0.575   0.491               0.0    0.1210    -6.462       0.0251    0.425   76.009
1  Picture To Burn                              Taylor Swift  Taylor Swift  2006-10-24    173066          54         0.658         0.173   0.877               0.0    0.0962    -2.098       0.0323    0.821  105.586
2  Teardrops On My Guitar - Radio Single Remix  Taylor Swift  Taylor Swift  2006-10-24    203040          59         0.621         0.288   0.417               0.0    0.1190    -6.941       0.0231    0.289   99.953
3  A Place in this World                        Taylor Swift  Taylor Swift  2006-10-24    199200          49         0.576         0.051   0.777               0.0    0.3200    -2.881       0.0324    0.428  115.028
4  Cold As You                                  Taylor Swift  Taylor Swift  2006-10-24    239013          50         0.418         0.217   0.482               0.0    0.1230    -5.769       0.0266    0.261  175.558
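The "Unnamed: 0" column is the row index that was written into the CSV when it was saved. As an alternative to deleting it after loading, pandas can be asked to treat the first column as the index while reading. The sketch below uses a made-up two-row CSV held in memory (a hypothetical stand-in for spotify_taylorswift.csv) so that it runs on its own.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for spotify_taylorswift.csv
# (values copied from the first rows shown above).
csv_text = """,name,album,popularity
0,Tim McGraw,Taylor Swift,49
1,Picture To Burn,Taylor Swift,54
"""

# Default read: the header-less first column comes back as "Unnamed: 0".
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))

# index_col=0 uses that column as the row index instead of a data column.
demo = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(demo.columns))
```

With the real file, the equivalent call would presumably be pd.read_csv('spotify_taylorswift.csv', index_col=0), which makes the separate deletion step unnecessary.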
Drop samples with missing values
tay.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   name              171 non-null    object
 1   album             171 non-null    object
 2   artist            171 non-null    object
 3   release_date      171 non-null    object
 4   length            171 non-null    int64
 5   popularity        171 non-null    int64
 6   danceability      171 non-null    float64
 7   acousticness      171 non-null    float64
 8   energy            171 non-null    float64
 9   instrumentalness  171 non-null    float64
 10  liveness          171 non-null    float64
 11  loudness          171 non-null    float64
 12  speechiness       171 non-null    float64
 13  valence           171 non-null    float64
 14  tempo             171 non-null    float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB
tay = tay.dropna()  # every column already has 171 non-null values, so no rows are removed
tay.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
 #   Column            Non-Null Count  Dtype
---  ------            --------------  -----
 0   name              171 non-null    object
 1   album             171 non-null    object
 2   artist            171 non-null    object
 3   release_date      171 non-null    object
 4   length            171 non-null    int64
 5   popularity        171 non-null    int64
 6   danceability      171 non-null    float64
 7   acousticness      171 non-null    float64
 8   energy            171 non-null    float64
 9   instrumentalness  171 non-null    float64
 10  liveness          171 non-null    float64
 11  loudness          171 non-null    float64
 12  speechiness       171 non-null    float64
 13  valence           171 non-null    float64
 14  tempo             171 non-null    float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB
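A quicker way to see whether dropna() has anything to do is to count the missing values per column. The following is a minimal sketch on a made-up three-row frame (the song names and the missing value are hypothetical, since the real dataset has no gaps).

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value, for illustration only.
demo = pd.DataFrame({
    "name": ["Song A", "Song B", "Song C"],
    "popularity": [49, np.nan, 59],
})

missing_per_column = demo.isna().sum()  # number of NaN values in each column
cleaned = demo.dropna()                 # drops every row containing at least one NaN

print(missing_per_column)
print(len(demo), "->", len(cleaned))
```

On the Taylor Swift data, tay.isna().sum() should report zero for every column, which matches the unchanged info() output.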
Sorting by popularity
populars = tay.sort_values(by='popularity', ascending=False)
populars.head(13)
     name                                             album                        artist        release_date  length  popularity  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
 60  Blank Space                                      1989 (Deluxe)                Taylor Swift  2014-01-01    231826          82         0.760       0.10300   0.703          0.000000    0.0913    -5.412       0.0540   0.5700   95.997
 64  Shake It Off                                     1989 (Deluxe)                Taylor Swift  2014-01-01    219200          80         0.647       0.06470   0.800          0.000000    0.3340    -5.384       0.1650   0.9420  160.078
 95  Lover                                            Lover                        Taylor Swift  2019-08-23    221306          80         0.359       0.49200   0.543          0.000016    0.1180    -7.582       0.0919   0.4530   68.534
 82  Delicate                                         reputation                   Taylor Swift  2017-11-10    232253          78         0.750       0.21600   0.404          0.000357    0.0911   -10.178       0.0682   0.0499   95.045
106  You Need To Calm Down                            Lover                        Taylor Swift  2019-08-23    171360          78         0.771       0.00929   0.671          0.000000    0.0637    -5.617       0.0553   0.7140   85.026
 94  Cruel Summer                                     Lover                        Taylor Swift  2019-08-23    178426          77         0.552       0.11700   0.702          0.000021    0.1050    -5.707       0.1570   0.5640  169.994
108  ME! (feat. Brendon Urie of Panic! At The Disco)  Lover                        Taylor Swift  2019-08-23    193000          77         0.610       0.03300   0.830          0.000000    0.1180    -4.105       0.0571   0.7280  182.162
 83  Look What You Made Me Do                         reputation                   Taylor Swift  2017-11-10    211853          77         0.766       0.20400   0.709          0.000014    0.1260    -6.471       0.1230   0.5060  128.070
 86  Getaway Car                                      reputation                   Taylor Swift  2017-11-10    233626          76         0.562       0.00465   0.689          0.000002    0.0888    -6.745       0.1270   0.3510  172.054
150  You Belong With Me (Taylor’s Version)            Fearless (Taylor's Version)  Taylor Swift  2021-04-09    231124          76         0.632       0.06230   0.773          0.000000    0.0885    -4.856       0.0346   0.4740  130.033
100  Paper Rings                                      Lover                        Taylor Swift  2019-08-23    222400          76         0.811       0.01290   0.719          0.000014    0.0742    -6.553       0.0497   0.8650  103.979
 81  Don’t Blame Me                                   reputation                   Taylor Swift  2017-11-10    236413          75         0.615       0.10600   0.534          0.000018    0.0607    -6.719       0.0386   0.1930  135.917
 61  Style                                            1989 (Deluxe)                Taylor Swift  2014-01-01    231000          75         0.588       0.00245   0.791          0.002580    0.1180    -5.595       0.0402   0.4870   94.933
Extracting the top 5 singles
top_5 = populars.head()
top_5.loc[:, ['name', 'popularity']]
     name                   popularity
 60  Blank Space                    82
 64  Shake It Off                   80
 95  Lover                          80
 82  Delicate                       78
106  You Need To Calm Down          78
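Sorting the whole table and then taking the head works, but pandas also offers nlargest for exactly this "top N by one column" question. The sketch below rebuilds six rows from the ranking above by hand (in a deliberately shuffled order) and picks the five most popular; the frame name demo is only for illustration.

```python
import pandas as pd

# Six rows copied from the popularity ranking above, in shuffled order.
demo = pd.DataFrame({
    "name": ["Cruel Summer", "Lover", "Blank Space",
             "You Need To Calm Down", "Shake It Off", "Delicate"],
    "popularity": [77, 80, 82, 78, 80, 78],
})

# nlargest returns the n rows with the highest values in the given column.
top_5_demo = demo.nlargest(5, "popularity")
print(top_5_demo)
```

Several tracks share the same score (80 and 78 each appear twice), so the relative order of tied rows depends on the method and should not be read as a ranking between them.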
Extracting singles with high danceability
dance_on = tay.loc[tay.danceability >= 0.75]
dance_on.loc[:, ['name', 'album', 'danceability']].head(13)
     name                                   album                 danceability
 56  Treacherous - Original Demo Recording  Red (Deluxe Edition)         0.828
 59  Welcome To New York                    1989 (Deluxe)                0.789
 60  Blank Space                            1989 (Deluxe)                0.760
 68  How You Get The Girl                   1989 (Deluxe)                0.765
 71  Clean                                  1989 (Deluxe)                0.815
 76  I Wish You Would - Voice Memo          1989 (Deluxe)                0.781
 82  Delicate                               reputation                   0.750
 83  Look What You Made Me Do               reputation                   0.766
 85  Gorgeous                               reputation                   0.800
 96  The Man                                Lover                        0.777
 98  I Think He Knows                       Lover                        0.897
100  Paper Rings                            Lover                        0.811
101  Cornelia Street                        Lover                        0.824
Extracting singles with high energy
energy = tay.loc[tay.energy >= 0.5]
energy = energy.sort_values(by='energy', ascending=False)  # sort the filtered rows rather than the full table
energy.loc[:, ['name', 'album', 'energy']].head(13)
     name                                             album                        energy
 26  Haunted                                          Speak Now (Deluxe Package)    0.944
 11  I'm Only Me When I'm With You                    Taylor Swift                  0.934
 24  Better Than Revenge                              Speak Now (Deluxe Package)    0.917
152  Tell Me Why (Taylor’s Version)                   Fearless (Taylor's Version)   0.909
 57  Red - Original Demo Recording                    Red (Deluxe Edition)          0.902
 38  Red                                              Red (Deluxe Edition)          0.896
 65  I Wish You Would                                 1989 (Deluxe)                 0.893
 74  New Romantics                                    1989 (Deluxe)                 0.889
  1  Picture To Burn                                  Taylor Swift                  0.877
163  The Other Side Of The Door (Taylor’s Version)    Fearless (Taylor's Version)   0.873
 21  The Story Of Us                                  Speak Now (Deluxe Package)    0.855
 62  Out Of The Woods                                 1989 (Deluxe)                 0.841
108  ME! (feat. Brendon Urie of Panic! At The Disco)  Lover                         0.830
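Filtering and sorting are two separate steps, and the sort has to be applied to the filtered result for the threshold to have any effect. A minimal sketch of chaining the two, using four energy values that appear elsewhere in this post (the frame name demo is just for illustration):

```python
import pandas as pd

# Energy values copied from rows shown in this post.
demo = pd.DataFrame({
    "name": ["Never Grow Up", "Haunted", "Picture To Burn", "Last Kiss"],
    "energy": [0.308, 0.944, 0.877, 0.329],
})

high_energy = (
    demo.loc[demo["energy"] >= 0.5]                   # keep tracks at or above the threshold
        .sort_values(by="energy", ascending=False)    # then rank what is left
)
print(high_energy)
```

Chaining the calls on one expression makes it harder to accidentally sort the unfiltered table.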
Looking up your favorite album
For example, I like "Speak Now" 😃
tay.loc[tay.album == 'Speak Now (Deluxe Package)']
    name                         album                       artist        release_date  length  popularity  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
15  Mine - POP Mix               Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    230546          45         0.696      0.004610   0.768          0.000001    0.1010    -3.863       0.0308    0.692  121.050
16  Sparks Fly                   Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    260946          50         0.608      0.038700   0.785          0.000000    0.1580    -2.976       0.0311    0.376  114.985
17  Back To December             Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    293040          50         0.517      0.020200   0.606          0.000000    0.3240    -5.797       0.0289    0.296  141.929
18  Speak Now                    Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    240773          49         0.708      0.101000   0.601          0.000000    0.0979    -3.750       0.0306    0.742  118.962
19  Dear John                    Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    403887          48         0.583      0.183000   0.468          0.000002    0.1110    -5.378       0.0278    0.126  119.375
20  Mean                         Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    237746          48         0.570      0.445000   0.747          0.000000    0.2190    -3.978       0.0426    0.808  164.004
21  The Story Of Us              Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    265636          48         0.575      0.000315   0.855          0.001610    0.0419    -4.827       0.0467    0.840  139.920
22  Never Grow Up                Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    290480          44         0.715      0.829000   0.308          0.000000    0.1600    -8.829       0.0305    0.547  124.899
23  Enchanted                    Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    352200          51         0.535      0.071600   0.618          0.000388    0.1690    -3.913       0.0273    0.228   81.975
24  Better Than Revenge          Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    217173          49         0.519      0.016700   0.917          0.000021    0.3590    -3.185       0.0887    0.652  145.882
25  Innocent                     Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    302266          44         0.553      0.202000   0.604          0.000000    0.1250    -5.295       0.0258    0.186  133.989
26  Haunted                      Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    242093          47         0.434      0.076900   0.944          0.000000    0.1510    -2.641       0.0581    0.361  162.020
27  Last Kiss                    Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    367146          47         0.359      0.581000   0.329          0.000037    0.0979    -9.531       0.0293    0.208   84.358
28  Long Live                    Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    317960          47         0.412      0.042600   0.682          0.000075    0.1060    -4.319       0.0339    0.146  203.959
29  Ours                         Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    238173          55         0.610      0.505000   0.556          0.000000    0.0851    -7.369       0.0285    0.192  159.838
30  If This Was A Movie          Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    234546          52         0.515      0.154000   0.724          0.000004    0.2690    -3.498       0.0267    0.257  147.788
31  Superman                     Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    276266          47         0.582      0.018700   0.817          0.000002    0.1010    -3.718       0.0337    0.547  131.983
32  Back To December - Acoustic  Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    292533          59         0.541      0.731000   0.451          0.000000    0.1970    -6.522       0.0270    0.333  141.713
33  Haunted - Acoustic Version   Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    217626          48         0.574      0.841000   0.462          0.000000    0.2800    -5.124       0.0252    0.314   80.858
34  Mine                         Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    230773          64         0.621      0.003270   0.780          0.000005    0.1840    -2.934       0.0297    0.672  121.038
35  Back To December             Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    293040          43         0.525      0.113000   0.676          0.000000    0.2940    -4.684       0.0294    0.281  141.950
36  The Story Of Us              Speak Now (Deluxe Package)  Taylor Swift  2010-01-01    266480          59         0.546      0.004870   0.809          0.000372    0.0437    -3.621       0.0410    0.649  139.910
Analysis
Album popularity
album = ["Taylor Swift", "Fearless", "Speak Now", "Red", "1989", "Reputation", "Lover", "Folklore", "Evermore"]
maxpopularity = [59, 76, 64, 72, 82, 78, 80, 65, 72]  # highest track popularity per album, entered by hand
newcolors = ['lightblue', 'gold', 'purple', 'red', 'tan', 'black', 'pink', 'grey', 'brown']
plt.figure(figsize=(8, 6))
plt.bar(album, maxpopularity, color=newcolors)
plt.title('Popularity of albums')
plt.xlabel('ALBUM')
plt.ylabel('POPULARITY')
plt.xticks(fontsize=6)
plt.show()
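The maxpopularity list above is typed in by hand. As a possible alternative, the same kind of per-album maximum can be derived from the table with groupby. The sketch below uses five rows copied from earlier outputs (the frame name demo is only for illustration); on the full dataset the album labels would be the ones stored in the album column, such as "1989 (Deluxe)", rather than the short names used in the chart.

```python
import pandas as pd

# Five rows copied from earlier outputs in this post.
demo = pd.DataFrame({
    "album": ["Taylor Swift", "Taylor Swift", "1989 (Deluxe)", "1989 (Deluxe)", "Lover"],
    "name": ["Tim McGraw", "Teardrops On My Guitar - Radio Single Remix",
             "Blank Space", "Style", "Lover"],
    "popularity": [49, 59, 82, 75, 80],
})

# Highest popularity score within each album, most popular album first.
album_max = (
    demo.groupby("album")["popularity"]
        .max()
        .sort_values(ascending=False)
)
print(album_max)
```

The resulting Series can be passed to plt.bar as album_max.index and album_max.values, so the chart stays in sync with the data.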
Popularity histogram
sns.displot(x='popularity', data=tay, kde=True, color='#a70ad5')
plt.title('Popularity Distribution')
Text(0.5, 1.0, 'Popularity Distribution')
Example: energy and danceability
import matplotlib.pyplot as plt
energyy = tay.energy
dancee = tay.danceability
correlation = energyy.corr(dancee)  # Pearson correlation coefficient between the two columns
print(correlation)
ax1 = tay.plot.scatter(x='energy', y='danceability', c='red')
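Series.corr computes the Pearson correlation coefficient by default. To make clear what that number measures, here is a small sketch that computes it by hand and compares it with pandas. It uses only the energy and danceability values of the first five tracks shown at the top of this post, so the result is not the coefficient for the full dataset.

```python
import numpy as np
import pandas as pd

# energy and danceability of the first five tracks shown earlier
energy_demo = pd.Series([0.491, 0.877, 0.417, 0.777, 0.482])
dance_demo = pd.Series([0.580, 0.658, 0.621, 0.576, 0.418])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the respective means.
dx = energy_demo - energy_demo.mean()
dy = dance_demo - dance_demo.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

r_pandas = energy_demo.corr(dance_demo)
print(r_manual, r_pandas)
```

A value near 0 means little linear relationship; values near 1 or -1 mean the points lie close to a rising or falling straight line.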
Correlation between energy and acousticness
sns.set_theme(color_codes=True)
plt.figure(figsize=(18, 6))
sns.regplot(x='acousticness', y='energy', data=tay)
plt.title('Acousticness vs Energy')
Text(0.5, 1.0, 'Acousticness vs Energy')
Correlation between energy and loudness
sns.set_theme(color_codes=True)
plt.figure(figsize=(18, 6))
sns.regplot(x='energy', y='loudness', data=tay)
plt.title('Energy vs Loudness')
Text(0.5, 1.0, 'Energy vs Loudness')
Correlation matrix
tay.corr(numeric_only=True)  # restrict to numeric columns; a bare tay.corr() raised a FutureWarning about numeric_only
                     length  popularity  danceability  acousticness     energy  instrumentalness   liveness   loudness  speechiness    valence      tempo
length             1.000000    0.011772     -0.301561      0.038749  -0.114792         -0.081288  -0.148412   0.044126    -0.414447  -0.420405   0.010425
popularity         0.011772    1.000000      0.072622     -0.117842   0.127495          0.035638  -0.406730   0.122576    -0.478262   0.034154  -0.015669
danceability      -0.301561    0.072622      1.000000     -0.143085   0.062669         -0.051837  -0.015766   0.002587     0.183860   0.379786  -0.235370
acousticness       0.038749   -0.117842     -0.143085      1.000000  -0.710055          0.140655  -0.065387  -0.736624     0.143127  -0.231232  -0.134467
energy            -0.114792    0.127495      0.062669     -0.710055   1.000000          0.000281   0.046364   0.784973    -0.179336   0.490371   0.209914
instrumentalness  -0.081288    0.035638     -0.051837      0.140655   0.000281          1.000000  -0.059132  -0.084224    -0.029729   0.020076   0.043274
liveness          -0.148412   -0.406730     -0.015766     -0.065387   0.046364         -0.059132   1.000000   0.016324     0.357924  -0.017264   0.034934
loudness           0.044126    0.122576      0.002587     -0.736624   0.784973         -0.084224   0.016324   1.000000    -0.409577   0.299926   0.171503
speechiness       -0.414447   -0.478262      0.183860      0.143127  -0.179336         -0.029729   0.357924  -0.409577     1.000000   0.120352  -0.027812
valence           -0.420405    0.034154      0.379786     -0.231232   0.490371          0.020076  -0.017264   0.299926     0.120352   1.000000  -0.006056
tempo              0.010425   -0.015669     -0.235370     -0.134467   0.209914          0.043274   0.034934   0.171503    -0.027812  -0.006056   1.000000
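An 11 by 11 matrix is hard to scan by eye. One way to pull out the strongest relationships is to keep only the upper triangle (each pair once, no diagonal), flatten it, and sort by absolute value. The sketch below does this for a 3 by 3 excerpt of the matrix above (acousticness, energy, loudness); the variable names are only for illustration.

```python
import numpy as np
import pandas as pd

# 3x3 excerpt of the correlation matrix printed above.
cols = ["acousticness", "energy", "loudness"]
corr_demo = pd.DataFrame(
    [[1.000000, -0.710055, -0.736624],
     [-0.710055, 1.000000, 0.784973],
     [-0.736624, 0.784973, 1.000000]],
    index=cols, columns=cols,
)

# Boolean mask for the strict upper triangle (k=1 skips the diagonal).
mask = np.triu(np.ones(corr_demo.shape, dtype=bool), k=1)

# Keep masked cells, flatten to (row, column) pairs, drop the blanked cells.
pairs = corr_demo.where(mask).stack().dropna()

# Strongest relationships first, regardless of sign.
ranked = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(ranked)
```

In this excerpt, energy and loudness form the strongest positive pair, while acousticness is strongly negatively related to both.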
Correlation matrix heatmap
import seaborn as sns
import matplotlib.pyplot as plt
sns.heatmap(tay.corr(numeric_only=True))  # numeric_only=True avoids the FutureWarning raised by a bare tay.corr()
<Axes: >
Pairwise feature plot (scatter-plot matrix)
<seaborn.axisgrid.PairGrid at 0x29d8b437b50>
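Random Forest Regression
How the algorithm works
Each tree in a random forest is trained on a random subset of the training data, drawn in a process called bootstrap aggregating (bagging). The model fits one tree to each of these smaller datasets and then aggregates the trees' predictions. Because the sampling is done with replacement, the same data instance can be used several times. As a result, the trees are not only trained on different datasets but also use different features to make their decisions.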
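The idea can be sketched by hand with individual decision trees. The following is a simplified illustration, not how RandomForestRegressor is implemented internally: the data is synthetic, and the number of trees and the max_features value are arbitrary choices made for the example.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic data (hypothetical): 200 samples, 4 features,
# target is a noisy linear mix of the first two features.
X = rng.normal(size=(200, 4))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

n_trees = 25
trees = []
for i in range(n_trees):
    # Bootstrap sample: draw row indices with replacement, so some rows repeat.
    idx = rng.integers(0, len(X), size=len(X))
    # max_features limits how many features are considered at each split,
    # which is what makes the trees rely on different features.
    tree = DecisionTreeRegressor(max_features=2, random_state=i)
    tree.fit(X[idx], y[idx])
    trees.append(tree)

# Aggregate: the ensemble prediction is the mean of the per-tree predictions.
all_preds = np.stack([t.predict(X[:5]) for t in trees])
ensemble_pred = all_preds.mean(axis=0)
print(ensemble_pred)
```

Averaging many trees that each saw slightly different data tends to smooth out the errors of any single tree.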
Preparing the data
dataset = tay.drop(['album', 'artist', 'release_date', 'name'], axis=1)
dataset
     length  popularity  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
  0  232106          49         0.580         0.575   0.491          0.000000    0.1210    -6.462       0.0251    0.425   76.009
  1  173066          54         0.658         0.173   0.877          0.000000    0.0962    -2.098       0.0323    0.821  105.586
  2  203040          59         0.621         0.288   0.417          0.000000    0.1190    -6.941       0.0231    0.289   99.953
  3  199200          49         0.576         0.051   0.777          0.000000    0.3200    -2.881       0.0324    0.428  115.028
  4  239013          50         0.418         0.217   0.482          0.000000    0.1230    -5.769       0.0266    0.261  175.558
...     ...         ...           ...           ...     ...               ...       ...       ...          ...      ...      ...
166  277591          74         0.660         0.162   0.817          0.000000    0.0667    -6.269       0.0521    0.714  135.942
167  244236          65         0.609         0.849   0.373          0.000000    0.0779    -8.819       0.0263    0.130  106.007
168  189495          67         0.588         0.225   0.608          0.000000    0.0920    -7.062       0.0365    0.508   90.201
169  208608          66         0.563         0.514   0.473          0.000012    0.1090   -11.548       0.0503    0.405  101.934
170  242157          64         0.624         0.334   0.624          0.000000    0.0995    -7.860       0.0539    0.527   80.132
171 rows × 11 columns
x = dataset.drop(['popularity'], axis=1)
x
     length  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
  0  232106         0.580         0.575   0.491          0.000000    0.1210    -6.462       0.0251    0.425   76.009
  1  173066         0.658         0.173   0.877          0.000000    0.0962    -2.098       0.0323    0.821  105.586
  2  203040         0.621         0.288   0.417          0.000000    0.1190    -6.941       0.0231    0.289   99.953
  3  199200         0.576         0.051   0.777          0.000000    0.3200    -2.881       0.0324    0.428  115.028
  4  239013         0.418         0.217   0.482          0.000000    0.1230    -5.769       0.0266    0.261  175.558
...     ...           ...           ...     ...               ...       ...       ...          ...      ...      ...
166  277591         0.660         0.162   0.817          0.000000    0.0667    -6.269       0.0521    0.714  135.942
167  244236         0.609         0.849   0.373          0.000000    0.0779    -8.819       0.0263    0.130  106.007
168  189495         0.588         0.225   0.608          0.000000    0.0920    -7.062       0.0365    0.508   90.201
169  208608         0.563         0.514   0.473          0.000012    0.1090   -11.548       0.0503    0.405  101.934
170  242157         0.624         0.334   0.624          0.000000    0.0995    -7.860       0.0539    0.527   80.132
171 rows × 10 columns
y = dataset['popularity']
y
0      49
1      54
2      59
3      49
4      50
       ..
166    74
167    65
168    67
169    66
170    64
Name: popularity, Length: 171, dtype: int64
Splitting into training and test sets
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=1)
x_train
     length  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
 89  230373         0.719      0.032900   0.469          0.000000    0.1690    -8.792       0.0533   0.0851  120.085
 88  211506         0.624      0.060400   0.691          0.000011    0.1380    -6.686       0.1960   0.2840  160.024
165  220839         0.599      0.816000   0.494          0.000000    0.1010    -7.610       0.0372   0.4400  142.893
110  293453         0.557      0.808000   0.496          0.000173    0.0772    -9.602       0.0563   0.2650  149.983
 48  284866         0.624      0.632000   0.340          0.033700    0.0805   -12.411       0.0290   0.2610  129.987
...     ...           ...           ...     ...               ...       ...       ...          ...      ...      ...
133  215626         0.546      0.418000   0.613          0.000000    0.1030    -7.589       0.0264   0.5350   79.015
137  260440         0.515      0.855000   0.545          0.000020    0.0921    -9.277       0.0353   0.5350   88.856
 72  245560         0.422      0.049300   0.692          0.000026    0.1770    -5.447       0.0549   0.1970  184.014
140  257773         0.535      0.876000   0.561          0.000136    0.1150   -11.609       0.0484   0.2870   96.103
 37  295720         0.588      0.000197   0.825          0.001380    0.0884    -5.882       0.0328   0.3970  129.968
136 rows × 10 columns
x_test
     length  danceability  acousticness  energy  instrumentalness  liveness  loudness  speechiness  valence    tempo
 92  235466         0.661      0.921000   0.151          0.000000    0.1300   -12.864       0.0354    0.230   94.922
113  231000         0.688      0.481000   0.653          0.004140    0.1060    -8.558       0.0403    0.701  147.991
 19  403887         0.583      0.183000   0.468          0.000002    0.1110    -5.378       0.0278    0.126  119.375
 69  250093         0.481      0.678000   0.435          0.000000    0.0928    -8.795       0.0321    0.107  143.950
 53  286613         0.619      0.187000   0.506          0.000015    0.1010    -7.327       0.0315    0.274  126.030
161  237338         0.476      0.040600   0.564          0.000000    0.1020    -5.677       0.0269    0.167  143.929
108  193000         0.610      0.033000   0.830          0.000000    0.1180    -4.105       0.0571    0.728  182.162
 14  179066         0.459      0.040200   0.753          0.000000    0.0863    -3.827       0.0537    0.483  199.997
 99  234146         0.662      0.028000   0.747          0.006150    0.1380    -6.926       0.0736    0.487  150.088
107  223293         0.756      0.130000   0.449          0.000000    0.1140    -8.746       0.0344    0.399  111.011
 11  213053         0.563      0.004520   0.934          0.000807    0.1030    -3.629       0.0646    0.518  143.964
  4  239013         0.418      0.217000   0.482          0.000000    0.1230    -5.769       0.0266    0.261  175.558
117  208906         0.602      0.888000   0.494          0.000026    0.0902   -10.813       0.0277    0.374   94.955
 42  232120         0.661      0.002150   0.729          0.001300    0.0477    -6.561       0.0376    0.668  103.987
122  237266         0.593      0.670000   0.700          0.000007    0.1160    -9.016       0.0492    0.451  141.898
125  234000         0.644      0.916000   0.284          0.000015    0.0909   -12.879       0.0821    0.328  150.072
147  235766         0.627      0.130000   0.792          0.000004    0.0845    -4.311       0.0310    0.415  119.054
 35  293040         0.525      0.113000   0.676          0.000000    0.2940    -4.684       0.0294    0.281  141.950
 81  236413         0.615      0.106000   0.534          0.000018    0.0607    -6.719       0.0386    0.193  135.917
 31  276266         0.582      0.018700   0.817          0.000002    0.1010    -3.718       0.0337    0.547  131.983
 51  220600         0.649      0.021300   0.777          0.000335    0.2100    -5.804       0.0406    0.587  126.018
 75  216333         0.592      0.829000   0.128          0.000000    0.5270   -17.932       0.5890    0.150   78.828
 78  208186         0.613      0.052700   0.764          0.000000    0.1970    -6.509       0.1360    0.417  160.015
 73  267106         0.474      0.707000   0.480          0.000108    0.0903    -8.894       0.0622    0.319  170.109
 40  219720         0.622      0.004540   0.469          0.000002    0.0335    -6.798       0.0363    0.679   77.019
 84  227906         0.574      0.122000   0.610          0.000001    0.1300    -7.283       0.0732    0.374   74.957
 47  202960         0.627      0.016200   0.816          0.002080    0.0965    -6.698       0.0774    0.648  157.043
 29  238173         0.610      0.505000   0.556          0.000000    0.0851    -7.369       0.0285    0.192  159.838
 16  260946         0.608      0.038700   0.785          0.000000    0.1580    -2.976       0.0311    0.376  114.985
105  200306         0.739      0.736000   0.320          0.000147    0.1110   -10.862       0.2390    0.351   79.970
 85  209680         0.800      0.071300   0.535          0.000009    0.2130    -6.684       0.1350    0.451   92.027
154  243136         0.402      0.003300   0.732          0.000000    0.1080    -4.665       0.0484    0.472  161.032
157  279359         0.499      0.000191   0.815          0.000000    0.1810    -4.063       0.0341    0.344   95.999
  5  207106         0.589      0.004910   0.805          0.000000    0.2400    -4.055       0.0293    0.591  112.982
 94  178426         0.552      0.117000   0.702          0.000021    0.1050    -5.707       0.1570    0.564  169.994
y_train
89     71
88     68
165    63
110    68
48     59
       ..
133    65
137    65
72     65
140    63
37     62
Name: popularity, Length: 136, dtype: int64
y_test
92     67
113    63
19     48
69     60
53     58
161    61
108    77
14     48
99     70
107    74
11     50
4      50
117    62
42     64
122    61
125    60
147    74
35     43
81     75
31     47
51     58
75      0
78     74
73     62
40     65
84     66
47     61
29     55
16     50
105    68
85     72
154    74
157    61
5      47
94     77
Name: popularity, dtype: int64
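The 136 and 35 rows seen above follow from how train_test_split handles a fractional test_size: the test set size is rounded up, and the training set gets the remainder. A minimal sketch that checks this for 171 samples, using a plain list of row numbers instead of the real features:

```python
import math
from sklearn.model_selection import train_test_split

n_samples = 171
test_size = 0.2

# 171 * 0.2 = 34.2, which is rounded up to 35 test rows.
expected_test = math.ceil(n_samples * test_size)
expected_train = n_samples - expected_test

# Split a list of row numbers with the same settings as in the post.
train_idx, test_idx = train_test_split(
    list(range(n_samples)), test_size=test_size, random_state=1
)
print(len(train_idx), len(test_idx))
```

Fixing random_state makes the shuffle reproducible, so rerunning the notebook yields the same train and test rows.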
Training the model
from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=27)
regressor.fit(x_train, y_train)
RandomForestRegressor(random_state=27)
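Computing the score (R², not classification accuracy)
regressor.score(x_test, y_test)  # for a regressor, score() returns the coefficient of determination R²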
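For regressors, score() reports the coefficient of determination R² rather than a share of correct answers. The sketch below computes R² by hand on four made-up numbers (not taken from the dataset) and compares the result with scikit-learn's r2_score.

```python
import numpy as np
from sklearn.metrics import r2_score

# Hypothetical true values and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.0, 7.5, 10.0])

# R² = 1 - (sum of squared residuals) / (total sum of squares around the mean)
ss_res = ((y_true - y_hat) ** 2).sum()
ss_tot = ((y_true - y_true.mean()) ** 2).sum()
r2_manual = 1.0 - ss_res / ss_tot

r2_sklearn = r2_score(y_true, y_hat)
print(r2_manual, r2_sklearn)
```

An R² of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than that baseline.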
Comparing predicted and actual values
compare = pd.DataFrame(y_test)
compare.columns = ['actual']
compare
     actual
 92      67
113      63
 19      48
 69      60
 53      58
161      61
108      77
 14      48
 99      70
107      74
 11      50
  4      50
117      62
 42      64
122      61
125      60
147      74
 35      43
 81      75
 31      47
 51      58
 75       0
 78      74
 73      62
 40      65
 84      66
 47      61
 29      55
 16      50
105      68
 85      72
154      74
157      61
  5      47
 94      77
y_pred = regressor.predict(x_test)
compare['predict'] = y_pred.round(0)
compare
     actual  predict
 92      67     60.0
113      63     66.0
 19      48     55.0
 69      60     63.0
 53      58     61.0
161      61     53.0
108      77     59.0
 14      48     60.0
 99      70     66.0
107      74     61.0
 11      50     55.0
  4      50     54.0
117      62     63.0
 42      64     68.0
122      61     65.0
125      60     68.0
147      74     57.0
 35      43     55.0
 81      75     68.0
 31      47     56.0
 51      58     64.0
 75       0     43.0
 78      74     69.0
 73      62     67.0
 40      65     65.0
 84      66     68.0
 47      61     67.0
 29      55     62.0
 16      50     57.0
105      68     68.0
 85      72     72.0
154      74     60.0
157      61     58.0
  5      47     56.0
 94      77     67.0
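A side-by-side table is easier to judge with an error summary. The sketch below takes only the first five rows of the comparison above and computes the mean absolute error (MAE) and root mean squared error (RMSE); the figures therefore describe those five rows, not the whole test set, and the frame name compare_demo is only for illustration.

```python
import numpy as np
import pandas as pd

# First five rows of the actual/predict table above.
compare_demo = pd.DataFrame(
    {"actual": [67, 63, 48, 60, 58],
     "predict": [60.0, 66.0, 55.0, 63.0, 61.0]},
    index=[92, 113, 19, 69, 53],
)

# Signed error per track: positive means the model predicted too high.
compare_demo["error"] = compare_demo["predict"] - compare_demo["actual"]

mae = compare_demo["error"].abs().mean()             # average size of the miss
rmse = np.sqrt((compare_demo["error"] ** 2).mean())  # penalizes large misses more
print(mae, rmse)
```

Applied to the full table, the same columns would also make outliers easy to spot, such as row 75, where the actual popularity is 0 and the prediction is 43.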