Daily notes - 20191005
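1. When passing arguments to functions, a single asterisk before a parameter stands for an arbitrary number of positional arguments, and a double asterisk converts between a dict and keyword arguments. On a formal parameter (in the def), the star collects the surplus arguments into that parameter; on an actual argument (in the call), the star unpacks it into separate arguments.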
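A minimal sketch of both directions described above. The function names describe and add are made up for illustration only.

```python
def describe(first, *args, **kwargs):
    # In a definition, *args collects surplus positional arguments into a tuple
    # and **kwargs collects surplus keyword arguments into a dict.
    return first, args, kwargs


collected = describe(1, 2, 3, name="a", size=4)
print(collected)  # (1, (2, 3), {'name': 'a', 'size': 4})


def add(a, b, c):
    return a + b + c


values = [1, 2, 3]
options = {"a": 10, "b": 20, "c": 30}

# In a call, * unpacks a sequence into positional arguments
# and ** unpacks a dict into keyword arguments.
unpacked_list = add(*values)    # same as add(1, 2, 3)
unpacked_dict = add(**options)  # same as add(a=10, b=20, c=30)
print(unpacked_list, unpacked_dict)  # 6 60
```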
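2. LabelEncoder: encodes labels as integers in the range [0, n-1], where n is the number of distinct labels
from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
le.fit([1, 5, 67, 100])
le.transform([1, 1, 100, 67, 5])  # array([0, 0, 3, 2, 1])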
3. Find columns with at most one distinct value (nunique() ignores NaN by default)
one_value_cols = [col for col in train.columns if train[col].nunique() <= 1]
4. Line plot
import matplotlib.pyplot as plt
x = [1, 2, 1, 2, 3]
y = [1, 2, 1, 3]
plt.plot(range(len(x)), x, c='r')  # red line
plt.plot(range(len(y)), y, c='b')  # blue line
plt.show()
5. Histogram (distribution of values)
x = [1, 2, 1, 2, 3]
plt.hist(x)
plt.show()
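6. Count the values in a DataFrame column
df[col].value_counts(dropna=False, normalize=True)
Returns a Series. With dropna=False, NaN is counted as its own value; with normalize=True, the result holds proportions instead of raw counts.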
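A small sketch of the call above on made-up data. The column name color and its values are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing value (illustrative data only).
df = pd.DataFrame({"color": ["red", "blue", "red", np.nan]})

# Raw counts, keeping NaN as its own category.
counts = df["color"].value_counts(dropna=False)

# Same call with normalize=True: shares that sum to 1.
shares = df["color"].value_counts(dropna=False, normalize=True)

print(counts)
print(shares)
```

Both results are Series indexed by the distinct values, so a single entry can be read with counts["red"].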
20191006
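1. In feature engineering, new features built from statistics such as mean and std should be computed after the k-fold split, using only the training part of each fold. Computing them on the full data first leaks information from the validation rows into the features.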
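A rough sketch of the idea, using only the standard library: a per-group mean feature is computed for each row from the other folds only. The function name out_of_fold_group_mean, the interleaved fold assignment, and the fallback to the training mean for unseen groups are all assumptions made for this illustration. In practice a splitter such as sklearn's KFold would likely be used instead.

```python
from statistics import mean


def out_of_fold_group_mean(groups, values, n_folds=3):
    """For each row, average `values` of the same group using the other folds only."""
    n = len(values)
    encoded = [None] * n
    for fold in range(n_folds):
        # Simple interleaved split: row i belongs to fold i % n_folds.
        valid_idx = [i for i in range(n) if i % n_folds == fold]
        train_idx = [i for i in range(n) if i % n_folds != fold]

        # Statistics are built from the training rows only.
        by_group = {}
        for i in train_idx:
            by_group.setdefault(groups[i], []).append(values[i])
        fallback = mean(values[i] for i in train_idx)

        # The validation rows receive the statistic, but never contribute to it.
        for i in valid_idx:
            seen = by_group.get(groups[i])
            encoded[i] = mean(seen) if seen else fallback
    return encoded


groups = ["a", "a", "b", "b", "a", "b"]
values = [1.0, 3.0, 10.0, 20.0, 5.0, 30.0]
encoded = out_of_fold_group_mean(groups, values, n_folds=3)
print(encoded)  # [4.0, 1.0, 20.0, 20.0, 1.0, 20.0]
```

Row 0 gets 4.0 (the mean of the other "a" rows in its training folds), not 3.0, which would be the mean over all "a" rows including itself.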
2. Installing xgboost with GPU support requires Python >= 3.5
3. A powerful tool for hyperparameter tuning (hyperopt): https://www.jianshu.com/p/35eed1567463
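from hyperopt import fmin, tpe, space_eval
best = fmin(fn=objective,      # function to minimize
            space=space,       # dict describing the parameter search space
            algo=tpe.suggest,  # search algorithm (TPE)
            max_evals=27)      # maximum number of evaluations
best_params = space_eval(space, best)  # map the result back to actual parameter values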
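4. Several pairs of parentheses after a function name, as in f(a)(b), mean nested functions: the first call returns a function, which is then called with the next argument
def test5(x):
    print('test5_param =', x)
    def test6(x):
        print('test6_param =', x)
        return x * x
    return test6  # return the inner function itself
print(test5(1)(2))
output:
test5_param = 1
test6_param = 2
4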
5. ROC AUC score
sklearn.metrics.roc_auc_score(y_true, y_score, average='macro', sample_weight=None)
Returns the area under the ROC curve, computed from the true labels and the predicted scores.
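To get a feel for what the score means in the binary case, here is a pair-counting sketch in plain Python. It is not sklearn's implementation, and the name pairwise_auc is made up here. It relies on the fact that binary AUC equals the probability that a randomly chosen positive sample is scored above a randomly chosen negative one, with ties counted as one half.

```python
def pairwise_auc(y_true, y_score):
    """Binary AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half
    return wins / (len(pos) * len(neg))


auc = pairwise_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
print(auc)  # 0.75
```

Three of the four positive/negative pairs are ordered correctly (only 0.35 vs 0.4 is not), hence 0.75.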
6. str.format()
print("{:.2f}".format(3.1415926))  # 3.14
site = {"name": "菜鸟教程", "url": "www.runoob.com"}
print("Site name: {name}, URL: {url}".format(**site))  # Site name: 菜鸟教程, URL: www.runoob.com
An expert's notebook for the IEEE competition: https://www.kaggle.com/kabure/extensive-eda-and-modeling-xgb-hyperopt/notebook
To be continued