











Excerpt from "Taylor Swift: The Whole Story"






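Using a Taylor Swift dataset, we try to analyze the most popular singles, the singles with high danceability and energy, the correlations among the features, and the popularity of each album.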



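Danceability: how suitable the song is for dancing

Acousticness: the higher the value, the more acoustic ("unplugged") the song

Energy: the energy of the song

Loudness: the higher the value, the louder the song

Speechiness: the higher the value, the more spoken words the song contains

Valence: the mood of the song; the higher the value, the more positive it sounds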



import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
tay = pd.read_csv('spotify_taylorswift.csv')
Unnamed: 0 name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 0 Tim McGraw Taylor Swift Taylor Swift 2006-10-24 232106 49 0.580 0.575 0.491 0.0 0.1210 -6.462 0.0251 0.425 76.009
1 1 Picture To Burn Taylor Swift Taylor Swift 2006-10-24 173066 54 0.658 0.173 0.877 0.0 0.0962 -2.098 0.0323 0.821 105.586
2 2 Teardrops On My Guitar - Radio Single Remix Taylor Swift Taylor Swift 2006-10-24 203040 59 0.621 0.288 0.417 0.0 0.1190 -6.941 0.0231 0.289 99.953
3 3 A Place in this World Taylor Swift Taylor Swift 2006-10-24 199200 49 0.576 0.051 0.777 0.0 0.3200 -2.881 0.0324 0.428 115.028
4 4 Cold As You Taylor Swift Taylor Swift 2006-10-24 239013 50 0.418 0.217 0.482 0.0 0.1230 -5.769 0.0266 0.261 175.558



# Drop the leftover index column ('Unnamed: 0') that came in with the CSV
tay = tay.drop(columns=tay.columns[0])
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 Tim McGraw Taylor Swift Taylor Swift 2006-10-24 232106 49 0.580 0.575 0.491 0.0 0.1210 -6.462 0.0251 0.425 76.009
1 Picture To Burn Taylor Swift Taylor Swift 2006-10-24 173066 54 0.658 0.173 0.877 0.0 0.0962 -2.098 0.0323 0.821 105.586
2 Teardrops On My Guitar - Radio Single Remix Taylor Swift Taylor Swift 2006-10-24 203040 59 0.621 0.288 0.417 0.0 0.1190 -6.941 0.0231 0.289 99.953
3 A Place in this World Taylor Swift Taylor Swift 2006-10-24 199200 49 0.576 0.051 0.777 0.0 0.3200 -2.881 0.0324 0.428 115.028
4 Cold As You Taylor Swift Taylor Swift 2006-10-24 239013 50 0.418 0.217 0.482 0.0 0.1230 -5.769 0.0266 0.261 175.558
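The `Unnamed: 0` column shows up because the CSV's first column is the saved row index. As an alternative sketch, that column can be consumed at load time with `index_col=0`, so nothing has to be deleted afterwards. The inline CSV text below is a made-up two-row stand-in for `spotify_taylorswift.csv` (values copied from the preview above), not the real file.

```python
import io
import pandas as pd

# Hypothetical two-row stand-in for spotify_taylorswift.csv
csv_text = """,name,album,popularity
0,Tim McGraw,Taylor Swift,49
1,Picture To Burn,Taylor Swift,54
"""

# index_col=0 uses the first CSV column as the row index,
# so no 'Unnamed: 0' column is created
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```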


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 171 non-null object
1 album 171 non-null object
2 artist 171 non-null object
3 release_date 171 non-null object
4 length 171 non-null int64
5 popularity 171 non-null int64
6 danceability 171 non-null float64
7 acousticness 171 non-null float64
8 energy 171 non-null float64
9 instrumentalness 171 non-null float64
10 liveness 171 non-null float64
11 loudness 171 non-null float64
12 speechiness 171 non-null float64
13 valence 171 non-null float64
14 tempo 171 non-null float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB
tay = tay.dropna()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 171 entries, 0 to 170
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 name 171 non-null object
1 album 171 non-null object
2 artist 171 non-null object
3 release_date 171 non-null object
4 length 171 non-null int64
5 popularity 171 non-null int64
6 danceability 171 non-null float64
7 acousticness 171 non-null float64
8 energy 171 non-null float64
9 instrumentalness 171 non-null float64
10 liveness 171 non-null float64
11 loudness 171 non-null float64
12 speechiness 171 non-null float64
13 valence 171 non-null float64
14 tempo 171 non-null float64
dtypes: float64(9), int64(2), object(4)
memory usage: 20.2+ KB


populars = tay.sort_values(by='popularity', ascending=False)
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
60 Blank Space 1989 (Deluxe) Taylor Swift 2014-01-01 231826 82 0.760 0.10300 0.703 0.000000 0.0913 -5.412 0.0540 0.5700 95.997
64 Shake It Off 1989 (Deluxe) Taylor Swift 2014-01-01 219200 80 0.647 0.06470 0.800 0.000000 0.3340 -5.384 0.1650 0.9420 160.078
95 Lover Lover Taylor Swift 2019-08-23 221306 80 0.359 0.49200 0.543 0.000016 0.1180 -7.582 0.0919 0.4530 68.534
82 Delicate reputation Taylor Swift 2017-11-10 232253 78 0.750 0.21600 0.404 0.000357 0.0911 -10.178 0.0682 0.0499 95.045
106 You Need To Calm Down Lover Taylor Swift 2019-08-23 171360 78 0.771 0.00929 0.671 0.000000 0.0637 -5.617 0.0553 0.7140 85.026
94 Cruel Summer Lover Taylor Swift 2019-08-23 178426 77 0.552 0.11700 0.702 0.000021 0.1050 -5.707 0.1570 0.5640 169.994
108 ME! (feat. Brendon Urie of Panic! At The Disco) Lover Taylor Swift 2019-08-23 193000 77 0.610 0.03300 0.830 0.000000 0.1180 -4.105 0.0571 0.7280 182.162
83 Look What You Made Me Do reputation Taylor Swift 2017-11-10 211853 77 0.766 0.20400 0.709 0.000014 0.1260 -6.471 0.1230 0.5060 128.070
86 Getaway Car reputation Taylor Swift 2017-11-10 233626 76 0.562 0.00465 0.689 0.000002 0.0888 -6.745 0.1270 0.3510 172.054
150 You Belong With Me (Taylor’s Version) Fearless (Taylor's Version) Taylor Swift 2021-04-09 231124 76 0.632 0.06230 0.773 0.000000 0.0885 -4.856 0.0346 0.4740 130.033
100 Paper Rings Lover Taylor Swift 2019-08-23 222400 76 0.811 0.01290 0.719 0.000014 0.0742 -6.553 0.0497 0.8650 103.979
81 Don’t Blame Me reputation Taylor Swift 2017-11-10 236413 75 0.615 0.10600 0.534 0.000018 0.0607 -6.719 0.0386 0.1930 135.917
61 Style 1989 (Deluxe) Taylor Swift 2014-01-01 231000 75 0.588 0.00245 0.791 0.002580 0.1180 -5.595 0.0402 0.4870 94.933

Extract the top 5 singles

top_5 = populars.head()
name popularity
60 Blank Space 82
64 Shake It Off 80
95 Lover 80
82 Delicate 78
106 You Need To Calm Down 78
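Sorting the whole frame and then calling `head()` works fine. A shorter route, sketched here on a few rows copied from the sorted output above, is `DataFrame.nlargest`, which does both steps at once. The `songs` frame is only a small illustrative sample, not the full dataset.

```python
import pandas as pd

# Small sample taken from the sorted output above
songs = pd.DataFrame({
    "name": ["Style", "Blank Space", "Lover", "Shake It Off", "Delicate", "Getaway Car"],
    "popularity": [75, 82, 80, 80, 78, 76],
})

# nlargest picks the n rows with the highest values in the given column
top_3 = songs.nlargest(3, "popularity")[["name", "popularity"]]
print(top_3)
```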


# There is a danceability column; we keep only the most danceable songs (i.e., 0.75 or higher).
dance_on = tay.loc[(tay.danceability >= 0.75)]
name album danceability
56 Treacherous - Original Demo Recording Red (Deluxe Edition) 0.828
59 Welcome To New York 1989 (Deluxe) 0.789
60 Blank Space 1989 (Deluxe) 0.760
68 How You Get The Girl 1989 (Deluxe) 0.765
71 Clean 1989 (Deluxe) 0.815
76 I Wish You Would - Voice Memo 1989 (Deluxe) 0.781
82 Delicate reputation 0.750
83 Look What You Made Me Do reputation 0.766
85 Gorgeous reputation 0.800
96 The Man Lover 0.777
98 I Think He Knows Lover 0.897
100 Paper Rings Lover 0.811
101 Cornelia Street Lover 0.824


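# Keep songs with energy of at least 0.5, then sort the filtered rows from most to least energetic
energy = tay.loc[(tay.energy >= 0.5)]
energy = energy.sort_values(by='energy', ascending=False)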
name album energy
26 Haunted Speak Now (Deluxe Package) 0.944
11 I'm Only Me When I'm With You Taylor Swift 0.934
24 Better Than Revenge Speak Now (Deluxe Package) 0.917
152 Tell Me Why (Taylor’s Version) Fearless (Taylor's Version) 0.909
57 Red - Original Demo Recording Red (Deluxe Edition) 0.902
38 Red Red (Deluxe Edition) 0.896
65 I Wish You Would 1989 (Deluxe) 0.893
74 New Romantics 1989 (Deluxe) 0.889
1 Picture To Burn Taylor Swift 0.877
163 The Other Side Of The Door (Taylor’s Version) Fearless (Taylor's Version) 0.873
21 The Story Of Us Speak Now (Deluxe Package) 0.855
62 Out Of The Woods 1989 (Deluxe) 0.841
108 ME! (feat. Brendon Urie of Panic! At The Disco) Lover 0.830
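Since the goal is songs that are both danceable and energetic, the two conditions can also be combined into one mask. This is only a sketch on five rows copied from the outputs above; the thresholds are the same 0.75 and 0.5 used in this post, and the `songs` frame is an illustrative sample.

```python
import pandas as pd

# Sample rows copied from the outputs above
songs = pd.DataFrame({
    "name": ["Blank Space", "Delicate", "Haunted", "Paper Rings", "Lover"],
    "danceability": [0.760, 0.750, 0.434, 0.811, 0.359],
    "energy": [0.703, 0.404, 0.944, 0.719, 0.543],
})

# Combine both conditions with & (each condition needs its own parentheses)
mask = (songs["danceability"] >= 0.75) & (songs["energy"] >= 0.5)
dance_and_energy = songs.loc[mask].sort_values(by="energy", ascending=False)
print(dance_and_energy["name"].tolist())
```

In this sample only "Paper Rings" and "Blank Space" pass both filters; "Delicate" is danceable enough but falls below the energy threshold.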


For example, I like "Speak Now" 😃

tay.loc[tay.album == 'Speak Now (Deluxe Package)']
name album artist release_date length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
15 Mine - POP Mix Speak Now (Deluxe Package) Taylor Swift 2010-01-01 230546 45 0.696 0.004610 0.768 0.000001 0.1010 -3.863 0.0308 0.692 121.050
16 Sparks Fly Speak Now (Deluxe Package) Taylor Swift 2010-01-01 260946 50 0.608 0.038700 0.785 0.000000 0.1580 -2.976 0.0311 0.376 114.985
17 Back To December Speak Now (Deluxe Package) Taylor Swift 2010-01-01 293040 50 0.517 0.020200 0.606 0.000000 0.3240 -5.797 0.0289 0.296 141.929
18 Speak Now Speak Now (Deluxe Package) Taylor Swift 2010-01-01 240773 49 0.708 0.101000 0.601 0.000000 0.0979 -3.750 0.0306 0.742 118.962
19 Dear John Speak Now (Deluxe Package) Taylor Swift 2010-01-01 403887 48 0.583 0.183000 0.468 0.000002 0.1110 -5.378 0.0278 0.126 119.375
20 Mean Speak Now (Deluxe Package) Taylor Swift 2010-01-01 237746 48 0.570 0.445000 0.747 0.000000 0.2190 -3.978 0.0426 0.808 164.004
21 The Story Of Us Speak Now (Deluxe Package) Taylor Swift 2010-01-01 265636 48 0.575 0.000315 0.855 0.001610 0.0419 -4.827 0.0467 0.840 139.920
22 Never Grow Up Speak Now (Deluxe Package) Taylor Swift 2010-01-01 290480 44 0.715 0.829000 0.308 0.000000 0.1600 -8.829 0.0305 0.547 124.899
23 Enchanted Speak Now (Deluxe Package) Taylor Swift 2010-01-01 352200 51 0.535 0.071600 0.618 0.000388 0.1690 -3.913 0.0273 0.228 81.975
24 Better Than Revenge Speak Now (Deluxe Package) Taylor Swift 2010-01-01 217173 49 0.519 0.016700 0.917 0.000021 0.3590 -3.185 0.0887 0.652 145.882
25 Innocent Speak Now (Deluxe Package) Taylor Swift 2010-01-01 302266 44 0.553 0.202000 0.604 0.000000 0.1250 -5.295 0.0258 0.186 133.989
26 Haunted Speak Now (Deluxe Package) Taylor Swift 2010-01-01 242093 47 0.434 0.076900 0.944 0.000000 0.1510 -2.641 0.0581 0.361 162.020
27 Last Kiss Speak Now (Deluxe Package) Taylor Swift 2010-01-01 367146 47 0.359 0.581000 0.329 0.000037 0.0979 -9.531 0.0293 0.208 84.358
28 Long Live Speak Now (Deluxe Package) Taylor Swift 2010-01-01 317960 47 0.412 0.042600 0.682 0.000075 0.1060 -4.319 0.0339 0.146 203.959
29 Ours Speak Now (Deluxe Package) Taylor Swift 2010-01-01 238173 55 0.610 0.505000 0.556 0.000000 0.0851 -7.369 0.0285 0.192 159.838
30 If This Was A Movie Speak Now (Deluxe Package) Taylor Swift 2010-01-01 234546 52 0.515 0.154000 0.724 0.000004 0.2690 -3.498 0.0267 0.257 147.788
31 Superman Speak Now (Deluxe Package) Taylor Swift 2010-01-01 276266 47 0.582 0.018700 0.817 0.000002 0.1010 -3.718 0.0337 0.547 131.983
32 Back To December - Acoustic Speak Now (Deluxe Package) Taylor Swift 2010-01-01 292533 59 0.541 0.731000 0.451 0.000000 0.1970 -6.522 0.0270 0.333 141.713
33 Haunted - Acoustic Version Speak Now (Deluxe Package) Taylor Swift 2010-01-01 217626 48 0.574 0.841000 0.462 0.000000 0.2800 -5.124 0.0252 0.314 80.858
34 Mine Speak Now (Deluxe Package) Taylor Swift 2010-01-01 230773 64 0.621 0.003270 0.780 0.000005 0.1840 -2.934 0.0297 0.672 121.038
35 Back To December Speak Now (Deluxe Package) Taylor Swift 2010-01-01 293040 43 0.525 0.113000 0.676 0.000000 0.2940 -4.684 0.0294 0.281 141.950
36 The Story Of Us Speak Now (Deluxe Package) Taylor Swift 2010-01-01 266480 59 0.546 0.004870 0.809 0.000372 0.0437 -3.621 0.0410 0.649 139.910



import matplotlib.pyplot as plt

# Maximum track popularity per album, in release order
album = ["Taylor Swift","Fearless","Speak Now","Red","1989","Reputation","Lover","Folklore","Evermore"]
maxpopularity = [59,76,64,72,82,78,80,65,72]
newcolors = ['lightblue','gold','purple','red','tan','black','pink','grey','brown']
plt.figure(figsize=(8, 6))
plt.bar(album, maxpopularity, color=newcolors)
plt.title('Popularity of albums')
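The per-album maxima in the bar chart are typed in by hand. A sketch of how they could be computed with `groupby` is shown below, using a few rows copied from the outputs above; the `songs` frame is an illustrative sample, and on the real data the same call would run on `tay`.

```python
import pandas as pd

# Sample rows copied from the outputs above; the full frame has the same columns
songs = pd.DataFrame({
    "album": ["1989 (Deluxe)", "1989 (Deluxe)", "Lover", "Lover", "reputation"],
    "name": ["Blank Space", "Style", "Lover", "Paper Rings", "Delicate"],
    "popularity": [82, 75, 80, 76, 78],
})

# Highest popularity per album, sorted from most to least popular
max_popularity = songs.groupby("album")["popularity"].max().sort_values(ascending=False)
print(max_popularity)
```

On the full dataset, `max_popularity.plot.bar()` would then draw the chart directly, although the labels would be the dataset's own album names (for example `1989 (Deluxe)`) rather than the shortened ones used above.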



import seaborn as sns

sns.displot(x='popularity', data=tay, kde=True, color='#a70ad5')
plt.title('Popularity Distribution')
Text(0.5, 1.0, 'Popularity Distribution')



import matplotlib.pyplot as plt

# Pearson correlation between energy and danceability
energyy = tay.energy
dancee = tay.danceability
correlation = energyy.corr(dancee)
print(correlation)
ax1 = tay.plot.scatter(x='energy', y='danceability', c='red')
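To make it clearer what `Series.corr` returns, here is a minimal sketch that writes out the Pearson formula by hand and compares it with the pandas result. It uses only the first five rows of the dataset (copied from the preview near the top), so the number it prints is not the correlation for the whole dataset.

```python
import numpy as np
import pandas as pd

# First five rows of the dataset, copied from the preview above
energy = pd.Series([0.491, 0.877, 0.417, 0.777, 0.482])
dance = pd.Series([0.580, 0.658, 0.621, 0.576, 0.418])

# Pearson r = sum(dx * dy) / sqrt(sum(dx^2) * sum(dy^2)),
# where dx and dy are deviations from the mean
dx = energy - energy.mean()
dy = dance - dance.mean()
r_manual = (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

# Series.corr uses the Pearson method by default
r_pandas = energy.corr(dance)
print(round(r_manual, 6), round(r_pandas, 6))
```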


Correlation between Energy and Acousticness

plt.figure(figsize=(18, 6))
sns.regplot(x='acousticness', y='energy', data=tay)
plt.title('Acousticness vs Energy')
Text(0.5, 1.0, 'Acousticness vs Energy')


Correlation between Energy and Loudness

plt.figure(figsize=(18, 6))
sns.regplot(x='energy', y='loudness', data=tay)
plt.title('Energy vs Loudness')
Text(0.5, 1.0, 'Energy vs Loudness')



# Pass numeric_only=True so the text columns are skipped and pandas does not raise a FutureWarning
tay.corr(numeric_only=True)
length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
length 1.000000 0.011772 -0.301561 0.038749 -0.114792 -0.081288 -0.148412 0.044126 -0.414447 -0.420405 0.010425
popularity 0.011772 1.000000 0.072622 -0.117842 0.127495 0.035638 -0.406730 0.122576 -0.478262 0.034154 -0.015669
danceability -0.301561 0.072622 1.000000 -0.143085 0.062669 -0.051837 -0.015766 0.002587 0.183860 0.379786 -0.235370
acousticness 0.038749 -0.117842 -0.143085 1.000000 -0.710055 0.140655 -0.065387 -0.736624 0.143127 -0.231232 -0.134467
energy -0.114792 0.127495 0.062669 -0.710055 1.000000 0.000281 0.046364 0.784973 -0.179336 0.490371 0.209914
instrumentalness -0.081288 0.035638 -0.051837 0.140655 0.000281 1.000000 -0.059132 -0.084224 -0.029729 0.020076 0.043274
liveness -0.148412 -0.406730 -0.015766 -0.065387 0.046364 -0.059132 1.000000 0.016324 0.357924 -0.017264 0.034934
loudness 0.044126 0.122576 0.002587 -0.736624 0.784973 -0.084224 0.016324 1.000000 -0.409577 0.299926 0.171503
speechiness -0.414447 -0.478262 0.183860 0.143127 -0.179336 -0.029729 0.357924 -0.409577 1.000000 0.120352 -0.027812
valence -0.420405 0.034154 0.379786 -0.231232 0.490371 0.020076 -0.017264 0.299926 0.120352 1.000000 -0.006056
tempo 0.010425 -0.015669 -0.235370 -0.134467 0.209914 0.043274 0.034934 0.171503 -0.027812 -0.006056 1.000000
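A full correlation matrix is hard to scan by eye. The sketch below shows one way to list the strongest pairs, using four of the features with values copied from the matrix above. The helper names (`upper`, `pairs`, `strongest`) are just illustrative.

```python
import numpy as np
import pandas as pd

# Sub-matrix copied from the correlation table above
cols = ["acousticness", "energy", "loudness", "valence"]
corr = pd.DataFrame(
    [[1.000000, -0.710055, -0.736624, -0.231232],
     [-0.710055, 1.000000, 0.784973, 0.490371],
     [-0.736624, 0.784973, 1.000000, 0.299926],
     [-0.231232, 0.490371, 0.299926, 1.000000]],
    index=cols, columns=cols,
)

# Keep only the upper triangle (k=1 skips the diagonal) so each pair appears once
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
pairs = upper.stack().dropna()

# Order the pairs by absolute correlation, strongest first
strongest = pairs.sort_values(key=lambda s: s.abs(), ascending=False)
print(strongest.head(3))
```

In this sub-matrix the three strongest pairs are energy/loudness (positive), acousticness/loudness and acousticness/energy (both negative).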


import seaborn as sns
import matplotlib.pyplot as plt
<Axes: >



<seaborn.axisgrid.PairGrid at 0x29d8b437b50>


Random Forest Regression


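Each tree in a random forest is trained on a random subset of the training data, drawn through a process called bootstrap aggregating (bagging). The model fits each tree to one of these smaller datasets and then aggregates the predictions. Because the sampling is done with replacement, the same data instance can be used several times. As a result, the trees are not only trained on different datasets but also use different features to make their decisions.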
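The bagging idea can be sketched in a few lines. This is not how `RandomForestRegressor` is implemented internally; it is a simplified illustration on synthetic data (an assumption, not the Spotify dataset) that covers only the bootstrap-and-average part and leaves out the per-split feature sampling.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic toy data: the target depends mostly on the first feature
X = rng.uniform(0, 1, size=(200, 3))
y = 10 * X[:, 0] + rng.normal(0, 0.5, size=200)

trees = []
for seed in range(25):
    # Bootstrap sample: draw n row indices with replacement, so some rows repeat
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeRegressor(max_depth=4, random_state=seed)
    trees.append(tree.fit(X[idx], y[idx]))

# Aggregate: average the individual tree predictions
X_new = np.array([[0.2, 0.5, 0.5], [0.8, 0.5, 0.5]])
bagged = np.mean([t.predict(X_new) for t in trees], axis=0)
print(bagged)
```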



# Drop the non-numeric columns (album, artist, release_date, name)
dataset = tay.drop(['album','artist','release_date','name'], axis=1)
length popularity danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 232106 49 0.580 0.575 0.491 0.000000 0.1210 -6.462 0.0251 0.425 76.009
1 173066 54 0.658 0.173 0.877 0.000000 0.0962 -2.098 0.0323 0.821 105.586
2 203040 59 0.621 0.288 0.417 0.000000 0.1190 -6.941 0.0231 0.289 99.953
3 199200 49 0.576 0.051 0.777 0.000000 0.3200 -2.881 0.0324 0.428 115.028
4 239013 50 0.418 0.217 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
... ... ... ... ... ... ... ... ... ... ... ...
166 277591 74 0.660 0.162 0.817 0.000000 0.0667 -6.269 0.0521 0.714 135.942
167 244236 65 0.609 0.849 0.373 0.000000 0.0779 -8.819 0.0263 0.130 106.007
168 189495 67 0.588 0.225 0.608 0.000000 0.0920 -7.062 0.0365 0.508 90.201
169 208608 66 0.563 0.514 0.473 0.000012 0.1090 -11.548 0.0503 0.405 101.934
170 242157 64 0.624 0.334 0.624 0.000000 0.0995 -7.860 0.0539 0.527 80.132

171 rows × 11 columns

x = dataset.drop(['popularity'],axis=1)
length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
0 232106 0.580 0.575 0.491 0.000000 0.1210 -6.462 0.0251 0.425 76.009
1 173066 0.658 0.173 0.877 0.000000 0.0962 -2.098 0.0323 0.821 105.586
2 203040 0.621 0.288 0.417 0.000000 0.1190 -6.941 0.0231 0.289 99.953
3 199200 0.576 0.051 0.777 0.000000 0.3200 -2.881 0.0324 0.428 115.028
4 239013 0.418 0.217 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
... ... ... ... ... ... ... ... ... ... ...
166 277591 0.660 0.162 0.817 0.000000 0.0667 -6.269 0.0521 0.714 135.942
167 244236 0.609 0.849 0.373 0.000000 0.0779 -8.819 0.0263 0.130 106.007
168 189495 0.588 0.225 0.608 0.000000 0.0920 -7.062 0.0365 0.508 90.201
169 208608 0.563 0.514 0.473 0.000012 0.1090 -11.548 0.0503 0.405 101.934
170 242157 0.624 0.334 0.624 0.000000 0.0995 -7.860 0.0539 0.527 80.132

171 rows × 10 columns

y = dataset['popularity']
0 49
1 54
2 59
3 49
4 50
166 74
167 65
168 67
169 66
170 64
Name: popularity, Length: 171, dtype: int64

Split into training and test sets

from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(x,y, test_size=0.2,random_state=1)
length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
89 230373 0.719 0.032900 0.469 0.000000 0.1690 -8.792 0.0533 0.0851 120.085
88 211506 0.624 0.060400 0.691 0.000011 0.1380 -6.686 0.1960 0.2840 160.024
165 220839 0.599 0.816000 0.494 0.000000 0.1010 -7.610 0.0372 0.4400 142.893
110 293453 0.557 0.808000 0.496 0.000173 0.0772 -9.602 0.0563 0.2650 149.983
48 284866 0.624 0.632000 0.340 0.033700 0.0805 -12.411 0.0290 0.2610 129.987
... ... ... ... ... ... ... ... ... ... ...
133 215626 0.546 0.418000 0.613 0.000000 0.1030 -7.589 0.0264 0.5350 79.015
137 260440 0.515 0.855000 0.545 0.000020 0.0921 -9.277 0.0353 0.5350 88.856
72 245560 0.422 0.049300 0.692 0.000026 0.1770 -5.447 0.0549 0.1970 184.014
140 257773 0.535 0.876000 0.561 0.000136 0.1150 -11.609 0.0484 0.2870 96.103
37 295720 0.588 0.000197 0.825 0.001380 0.0884 -5.882 0.0328 0.3970 129.968

136 rows × 10 columns

length danceability acousticness energy instrumentalness liveness loudness speechiness valence tempo
92 235466 0.661 0.921000 0.151 0.000000 0.1300 -12.864 0.0354 0.230 94.922
113 231000 0.688 0.481000 0.653 0.004140 0.1060 -8.558 0.0403 0.701 147.991
19 403887 0.583 0.183000 0.468 0.000002 0.1110 -5.378 0.0278 0.126 119.375
69 250093 0.481 0.678000 0.435 0.000000 0.0928 -8.795 0.0321 0.107 143.950
53 286613 0.619 0.187000 0.506 0.000015 0.1010 -7.327 0.0315 0.274 126.030
161 237338 0.476 0.040600 0.564 0.000000 0.1020 -5.677 0.0269 0.167 143.929
108 193000 0.610 0.033000 0.830 0.000000 0.1180 -4.105 0.0571 0.728 182.162
14 179066 0.459 0.040200 0.753 0.000000 0.0863 -3.827 0.0537 0.483 199.997
99 234146 0.662 0.028000 0.747 0.006150 0.1380 -6.926 0.0736 0.487 150.088
107 223293 0.756 0.130000 0.449 0.000000 0.1140 -8.746 0.0344 0.399 111.011
11 213053 0.563 0.004520 0.934 0.000807 0.1030 -3.629 0.0646 0.518 143.964
4 239013 0.418 0.217000 0.482 0.000000 0.1230 -5.769 0.0266 0.261 175.558
117 208906 0.602 0.888000 0.494 0.000026 0.0902 -10.813 0.0277 0.374 94.955
42 232120 0.661 0.002150 0.729 0.001300 0.0477 -6.561 0.0376 0.668 103.987
122 237266 0.593 0.670000 0.700 0.000007 0.1160 -9.016 0.0492 0.451 141.898
125 234000 0.644 0.916000 0.284 0.000015 0.0909 -12.879 0.0821 0.328 150.072
147 235766 0.627 0.130000 0.792 0.000004 0.0845 -4.311 0.0310 0.415 119.054
35 293040 0.525 0.113000 0.676 0.000000 0.2940 -4.684 0.0294 0.281 141.950
81 236413 0.615 0.106000 0.534 0.000018 0.0607 -6.719 0.0386 0.193 135.917
31 276266 0.582 0.018700 0.817 0.000002 0.1010 -3.718 0.0337 0.547 131.983
51 220600 0.649 0.021300 0.777 0.000335 0.2100 -5.804 0.0406 0.587 126.018
75 216333 0.592 0.829000 0.128 0.000000 0.5270 -17.932 0.5890 0.150 78.828
78 208186 0.613 0.052700 0.764 0.000000 0.1970 -6.509 0.1360 0.417 160.015
73 267106 0.474 0.707000 0.480 0.000108 0.0903 -8.894 0.0622 0.319 170.109
40 219720 0.622 0.004540 0.469 0.000002 0.0335 -6.798 0.0363 0.679 77.019
84 227906 0.574 0.122000 0.610 0.000001 0.1300 -7.283 0.0732 0.374 74.957
47 202960 0.627 0.016200 0.816 0.002080 0.0965 -6.698 0.0774 0.648 157.043
29 238173 0.610 0.505000 0.556 0.000000 0.0851 -7.369 0.0285 0.192 159.838
16 260946 0.608 0.038700 0.785 0.000000 0.1580 -2.976 0.0311 0.376 114.985
105 200306 0.739 0.736000 0.320 0.000147 0.1110 -10.862 0.2390 0.351 79.970
85 209680 0.800 0.071300 0.535 0.000009 0.2130 -6.684 0.1350 0.451 92.027
154 243136 0.402 0.003300 0.732 0.000000 0.1080 -4.665 0.0484 0.472 161.032
157 279359 0.499 0.000191 0.815 0.000000 0.1810 -4.063 0.0341 0.344 95.999
5 207106 0.589 0.004910 0.805 0.000000 0.2400 -4.055 0.0293 0.591 112.982
94 178426 0.552 0.117000 0.702 0.000021 0.1050 -5.707 0.1570 0.564 169.994
89 71
88 68
165 63
110 68
48 59
133 65
137 65
72 65
140 63
37 62
Name: popularity, Length: 136, dtype: int64
92 67
113 63
19 48
69 60
53 58
161 61
108 77
14 48
99 70
107 74
11 50
4 50
117 62
42 64
122 61
125 60
147 74
35 43
81 75
31 47
51 58
75 0
78 74
73 62
40 65
84 66
47 61
29 55
16 50
105 68
85 72
154 74
157 61
5 47
94 77
Name: popularity, dtype: int64
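The 136 / 35 row counts follow from `test_size=0.2` applied to 171 rows: scikit-learn rounds the test share up and gives the rest to the training set. A quick arithmetic check:

```python
import math

n_samples = 171   # rows in the dataset
test_size = 0.2   # fraction passed to train_test_split

# The test share is rounded up; the remainder goes to training
n_test = math.ceil(test_size * n_samples)
n_train = n_samples - n_test
print(n_train, n_test)  # 136 35
```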


from sklearn.ensemble import RandomForestRegressor
regressor = RandomForestRegressor(n_estimators=100, random_state=27)
regressor.fit(x_train, y_train)




compare = pd.DataFrame(y_test)
compare.columns = ['actual']
92 67
113 63
19 48
69 60
53 58
161 61
108 77
14 48
99 70
107 74
11 50
4 50
117 62
42 64
122 61
125 60
147 74
35 43
81 75
31 47
51 58
75 0
78 74
73 62
40 65
84 66
47 61
29 55
16 50
105 68
85 72
154 74
157 61
5 47
94 77
y_pred = regressor.predict(x_test)
compare['predict'] = y_pred.round(0)
actual predict
92 67 60.0
113 63 66.0
19 48 55.0
69 60 63.0
53 58 61.0
161 61 53.0
108 77 59.0
14 48 60.0
99 70 66.0
107 74 61.0
11 50 55.0
4 50 54.0
117 62 63.0
42 64 68.0
122 61 65.0
125 60 68.0
147 74 57.0
35 43 55.0
81 75 68.0
31 47 56.0
51 58 64.0
75 0 43.0
78 74 69.0
73 62 67.0
40 65 65.0
84 66 68.0
47 61 67.0
29 55 62.0
16 50 57.0
105 68 68.0
85 72 72.0
154 74 60.0
157 61 58.0
5 47 56.0
94 77 67.0
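Reading the table row by row only gives a rough impression. One simple way to summarize it is the mean absolute error (MAE), sketched here on the first five rows of the comparison table, so the printed value describes those five rows only and not the full test set.

```python
import pandas as pd

# First five rows of the comparison table above
compare = pd.DataFrame(
    {"actual": [67, 63, 48, 60, 58], "predict": [60.0, 66.0, 55.0, 63.0, 61.0]},
    index=[92, 113, 19, 69, 53],
)

# Mean absolute error: the average size of the miss, in popularity points
compare["abs_error"] = (compare["actual"] - compare["predict"]).abs()
mae = compare["abs_error"].mean()
print(mae)
```

On the full test set, `sklearn.metrics.mean_absolute_error(y_test, y_pred)` would give the same kind of number. Note that row 75 (actual 0, predicted 43) is a large outlier and would weigh heavily on any such average.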
posted @ dogfaraway