Post Archive "August 2018" - 静悟生慧

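Problems encountered during data import:

Summary: ## Importing data on the data platform: 1. Create the table, choosing the delimiter to match the format of the txt file: ',' or '\t'. 2. After converting the csv to txt, remove the header row. To convert the csv to txt you can simply change the file extension, which does not cause errors (avoid using "Save As" to turn the csv into a txt, which does). 3. Select the txt file to import; supports utf…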
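
The header removal and delimiter choice described above can also be scripted. The following is a minimal sketch, not the platform's own tooling: the function name `csv_to_headerless_txt` and the UTF-8 encoding are assumptions made for illustration.

```python
import csv


def csv_to_headerless_txt(src_path, dst_path, delimiter="\t", encoding="utf-8"):
    """Copy a CSV file to a text file, dropping the header row.

    Returns the number of data rows written.
    """
    rows_written = 0
    with open(src_path, newline="", encoding=encoding) as src, \
         open(dst_path, "w", newline="", encoding=encoding) as dst:
        reader = csv.reader(src)
        writer = csv.writer(dst, delimiter=delimiter, lineterminator="\n")
        next(reader, None)  # skip the header row, if there is one
        for row in reader:
            writer.writerow(row)
            rows_written += 1
    return rows_written
```

Pass `delimiter=","` to keep commas, so that the output matches whichever delimiter the table was created with.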

posted @ 2018-08-31 20:17 静悟生慧 Views(393) Comments(0) Recommended(0)

sklearn word2vec in practice

Summary: Source code: https://blog.csdn.net/github_38705794/article/details/75452729 I. Error when reproducing it: Traceback (most recent call last): File "D:\Program\python3\lib\site-pack…

posted @ 2018-08-31 20:07 静悟生慧 Views(634) Comments(0) Recommended(0)

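word2vector: an introduction

Summary: 1. What is word2vector? Let us start with a question. Take the sentence "the dog bark at the mailman". If each word is to be represented by a vector, the first idea that comes to mind is one-hot encoding. Specifically, "the" can be represented as [1,0,0,0,0], "dog" can be represented as […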
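
The one-hot idea can be sketched in a few lines of plain Python. This is only an illustration: the helper name `one_hot_vocabulary` is made up, and ordering the vocabulary by first appearance is an assumption chosen so that "the" comes out as [1,0,0,0,0] as in the summary.

```python
def one_hot_vocabulary(sentence):
    """Map each distinct word to a one-hot list, in order of first appearance."""
    vocab = []
    for word in sentence.split():
        if word not in vocab:
            vocab.append(word)
    size = len(vocab)
    return {word: [1 if i == j else 0 for j in range(size)]
            for i, word in enumerate(vocab)}


vectors = one_hot_vocabulary("the dog bark at the mailman")
print(len(vectors))    # 5 distinct words
print(vectors["the"])  # [1, 0, 0, 0, 0]
```

Every vector has the same length as the vocabulary and exactly one non-zero entry.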

posted @ 2018-08-31 17:22 静悟生慧 Views(4328) Comments(0) Recommended(0)

Zero-shot learning (ZSL)

Summary: https://zhuanlan.zhihu.com/p/41846072 https://zhuanlan.zhihu.com/p/38418698 https://zhuanlan.zhihu.com/p/41854739

posted @ 2018-08-31 16:37 静悟生慧 Views(1188) Comments(0) Recommended(0)

pandas drop_duplicates

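Summary: Function: DataFrame.drop_duplicates(subset=None, keep='first', inplace=False) Parameters: the drop_duplicates method operates on DataFrame data and removes rows that are duplicated within the specified columns. It returns a DataFrame. Addendum: Panda…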
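
A small sketch of how `subset` and `keep` interact; the column names and values are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "user": ["a", "a", "b", "b"],
    "city": ["x", "x", "y", "z"],
    "score": [1, 2, 3, 4],
})

# Rows count as duplicates when they agree on every column listed in `subset`.
first_kept = df.drop_duplicates(subset=["user", "city"], keep="first")
last_kept = df.drop_duplicates(subset=["user", "city"], keep="last")

print(first_kept["score"].tolist())  # [1, 3, 4]
print(last_kept["score"].tolist())   # [2, 3, 4]
print(len(df))                       # 4, because inplace=False leaves df unchanged
```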

posted @ 2018-08-30 11:10 静悟生慧 Views(4289) Comments(0) Recommended(0)

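The so-called A/B test

Summary: A/B testing means deploying two solutions and observing the results, then deciding which one to adopt based on the outcome and a set of result metrics. A gray release moves part of the traffic to the new solution to see how it performs, whether it has bugs, and what problems come up. If everything is OK, all traffic is switched to the new solution. An A/B test, as the name suggests, is a comparison of solution A and solution B. Two solutions are designed for the same goal; some users get solution A,…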
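
One common way to split users between the two solutions is deterministic hashing, so that the same user always sees the same variant. The sketch below is an assumption about how such a split could be done, not something the post prescribes; `assign_variant`, the experiment name, and the `share_a` parameter are all hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment="demo-experiment", share_a=0.5):
    """Deterministically place a user in variant "A" or "B"."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    bucket = int(digest[:8], 16) / 0x100000000  # value in [0, 1)
    return "A" if bucket < share_a else "B"


print(assign_variant("user-42") == assign_variant("user-42"))  # True
```

Lowering `share_a` toward 0 (or raising it toward 1) gives the partial rollout that a gray release relies on.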

posted @ 2018-08-29 22:01 静悟生慧 Views(607) Comments(0) Recommended(0)

Two ways to save a model in sklearn

Summary: I. sklearn provides joblib, an efficient model persistence module, for saving a model to disk. from sklearn.externals import joblib # lr is a LogisticRegression model joblib.dump(lr, 'lr.model') lr = joblib.load(…
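
A self-contained round trip of the joblib approach might look like the sketch below. Note that `sklearn.externals.joblib` was deprecated and later removed from scikit-learn, so the sketch imports the standalone `joblib` package instead; the training data and the temporary path are made up.

```python
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

# Tiny made-up training set: one feature, two classes.
X = [[0.0], [1.0], [2.0], [3.0]]
y = [0, 0, 1, 1]
lr = LogisticRegression().fit(X, y)

# Save the fitted model to disk, then load it back.
path = os.path.join(tempfile.mkdtemp(), "lr.model")
joblib.dump(lr, path)
restored = joblib.load(path)

# The restored model should predict exactly like the original.
print(list(restored.predict(X)) == list(lr.predict(X)))  # True
```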

posted @ 2018-08-28 15:31 静悟生慧 Views(28911) Comments(1) Recommended(1)

Python sklearn: evaluating classification performance

Summary: https://blog.csdn.net/sinat_26917383/article/details/75199996

posted @ 2018-08-28 11:11 静悟生慧 Views(599) Comments(0) Recommended(0)

xgboost: custom objective and evaluation functions

Summary: https://zhpmatrix.github.io/2017/06/29/custom-xgboost/ https://www.cnblogs.com/silence-gtx/p/5812012.html https://blog.csdn.net/hfzd24/article/details …

posted @ 2018-08-27 15:14 静悟生慧 Views(3902) Comments(0) Recommended(0)

Feature combination & feature crossing

Summary: https://segmentfault.com/a/1190000014799038 https://www.jianshu.com/p/fc96675b6f8e https://blog.csdn.net/gaoyueace/article/details/78689737 Using sklearn to…

posted @ 2018-08-23 16:50 静悟生慧 Views(1838) Comments(0) Recommended(0)

The plot_importance function in the xgboost module for sklearn (feature importance)

Summary: # -*- coding: utf-8 -*- """ ############################################################################### # Author: wanglei5205 # Email: wanglei5205@126.com # Code: http://github.com/wanglei5205 # Blog: http://cn...

posted @ 2018-08-22 20:49 静悟生慧 Views(7568) Comments(0) Recommended(1)

Blogs

Summary: Recommended blogs: https://hankin2015.github.io/2222/11/10/22221110DataProcess_HJ/ http://wepon.me/

posted @ 2018-08-22 12:42 静悟生慧 Views(166) Comments(0) Recommended(0)

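Python list insertion and concatenation

Summary: 1. The "+" operator can do this; the output is: [1, 2, 3, 8, 'google', 'com'] 2. Use the extend method; the output is the same. 3. Use slicing; the output is the same. PS: len(l1) is the position in l1 at which l2 is inserted. For example, the output is: Another example, the output is: Summary: the first method is the easiest to follow, since it is just operator overloading; the second…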
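
The three approaches can be sketched as follows. The summary does not show the input lists, so `l1 = [1, 2, 3]` and `l2 = [8, 'google', 'com']` are assumptions chosen to reproduce the output quoted above.

```python
l1 = [1, 2, 3]
l2 = [8, "google", "com"]

# 1. "+" builds a new list and leaves both operands unchanged.
joined = l1 + l2

# 2. extend() appends the items of l2 in place (here to a copy of l1).
extended = list(l1)
extended.extend(l2)

# 3. Slice assignment at position len(l1) inserts l2 at the end.
sliced = list(l1)
sliced[len(sliced):len(sliced)] = l2

# Slice assignment at any other position inserts there instead.
inserted = list(l1)
inserted[1:1] = l2

print(joined)    # [1, 2, 3, 8, 'google', 'com']
print(inserted)  # [1, 8, 'google', 'com', 2, 3]
```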

posted @ 2018-08-20 16:55 静悟生慧 Views(25385) Comments(0) Recommended(0)

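Converting line endings between Windows and Linux

Summary: Uploading a script on the data development platform failed with: "Save failed: the file encoding format is incorrect; please change the file's line endings to Unix line terminators!" Fix: on a DOS system, use a text editor's Save As and select Unix line endings. Explanation: a Windows file's line ending is [CR][LF]; a Linux or Unix file's line ending is [LF]. Some cases require conversion, such as…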
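
As an alternative to Save As in an editor, the conversion can be scripted. A minimal sketch, with hypothetical helper names, that works on bytes so that no text decoding interferes:

```python
def crlf_to_lf(data: bytes) -> bytes:
    """Convert Windows [CR][LF] line endings to Unix [LF]."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path: str) -> None:
    """Rewrite the file at `path` in place with Unix line endings."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(crlf_to_lf(data))


print(crlf_to_lf(b"echo hi\r\nexit 0\r\n"))  # b'echo hi\nexit 0\n'
```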

posted @ 2018-08-17 12:05 静悟生慧 Views(4384) Comments(0) Recommended(0)

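Multiclass and multilabel classification with scikit-learn

Summary: Multilabel classification format: in a multilabel classification problem, a sample may belong to several classes at once, for example a news item that covers several topics. In that case the dependent variable y has to be expressed as a matrix. Multiclass classification, by contrast, means that y can take more than two possible values but each sample belongs to exactly one class. It is strictly different from multilabel classification. All scikit-learn classifiers support multiclass classification by default…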
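
To make the "y as a matrix" point concrete, here is a small sketch using scikit-learn's `MultiLabelBinarizer`; the news topics are invented for illustration.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Hypothetical topics for three news items; the first item has two topics.
topics = [["politics", "sports"], ["tech"], ["politics"]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(topics)

print(list(mlb.classes_))  # ['politics', 'sports', 'tech']
print(Y.tolist())          # [[1, 1, 0], [0, 0, 1], [1, 0, 0]]
```

Each row is one sample and each column one label, so a row may contain several ones. A multiclass target would instead be a single column with one class per sample.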

posted @ 2018-08-17 11:14 静悟生慧 Views(9073) Comments(2) Recommended(1)

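Python: filling missing feature values

Summary: Simple handling of missing values in Python data preprocessing: https://blog.csdn.net/Amy_mm/article/details/79799629 That post is a fairly thorough summary; thanks to its author. When training a model we inevitably run into features that contain null values. Below are several ways to fill them. 1. Fill with a fixed value. For a feature…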
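
Filling with a fixed value, the first method listed, can be sketched with `DataFrame.fillna`. The column names and the fill values are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["x", None, "z"]})

# One fixed value per column; the original frame is left untouched.
filled = df.fillna({"age": 0.0, "city": "unknown"})

print(filled["age"].tolist())    # [20.0, 0.0, 40.0]
print(filled["city"].tolist())   # ['x', 'unknown', 'z']
print(int(df["age"].isna().sum()))  # 1, df itself still has the gap
```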

posted @ 2018-08-16 16:43 静悟生慧 Views(17275) Comments(0) Recommended(0)

Multi-output regression

Summary: Scikit-Learn also has a general class, MultiOutputRegressor, which can be used to use a single-output regression model and fit one regressor separatel…
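
A minimal sketch of `MultiOutputRegressor` wrapping a single-output model. The choice of `LinearRegression` as the base estimator and the synthetic data are assumptions for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.multioutput import MultiOutputRegressor

X = np.array([[0.0], [1.0], [2.0], [3.0]])
# Two targets per sample: y1 = 2x and y2 = -x + 1 (made-up data).
Y = np.column_stack([2 * X[:, 0], -X[:, 0] + 1])

model = MultiOutputRegressor(LinearRegression()).fit(X, Y)

print(len(model.estimators_))        # 2, one fitted regressor per target
print(model.predict([[4.0]]).shape)  # (1, 2)
```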

posted @ 2018-08-16 10:34 静悟生慧 Views(8121) Comments(0) Recommended(0)

Python DataFrame: getting the row count, column count, index, and the value at a given row and column

Summary: df=DataFrame([{'A':'11','B':'12'},{'A':'111','B':'121'},{'A':'1111','B':'1211'}]) From: https://blog.csdn.net/u012189747/article/details/78203364?locatio …
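
Using the same DataFrame as in the summary, the lookups named in the title can be sketched like this:

```python
import pandas as pd

df = pd.DataFrame([{"A": "11", "B": "12"},
                   {"A": "111", "B": "121"},
                   {"A": "1111", "B": "1211"}])

n_rows, n_cols = df.shape
print(n_rows, n_cols)    # 3 2
print(list(df.index))    # [0, 1, 2]
print(list(df.columns))  # ['A', 'B']
print(df.iloc[1, 0])     # 111  (second row, first column, by position)
print(df.at[2, "B"])     # 1211 (row label 2, column "B")
```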

posted @ 2018-08-15 22:09 静悟生慧 Views(61714) Comments(0) Recommended(1)

Xgboost: saving and loading a model

Summary: https://blog.csdn.net/u012884015/article/details/78653178

posted @ 2018-08-14 20:48 静悟生慧 Views(13893) Comments(0) Recommended(0)

Worth a careful read

Summary: https://blog.csdn.net/sinat_32502811/article/details/80878146 It contains the git address of the expert weapon. https://blog.csdn.net/francis1019/article/details/81253401 Includes the author's source code: http…

posted @ 2018-08-14 18:30 静悟生慧

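Freeing memory held by pandas

Summary: df = pd.read_csv('....') When a loop processes several files, memory usage becomes severe. If the DataFrames do not depend on one another, you can simply del df to free the memory.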
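
A sketch of the loop pattern described above. The `gc.collect()` call is an addition not mentioned in the summary, and the in-memory CSV sources stand in for real file paths. `del df` only drops the reference, so how much memory is actually returned depends on whether anything else still refers to the data.

```python
import gc
import io

import pandas as pd


def process_sources(sources):
    """Read each CSV source in turn and return its row count."""
    row_counts = []
    for source in sources:
        df = pd.read_csv(source)
        row_counts.append(len(df))
        # Drop the only reference before the next file is read.
        del df
        gc.collect()  # optional: ask the collector to run right away
    return row_counts


# In-memory stand-ins for files on disk.
sources = [io.StringIO("a,b\n1,2\n3,4\n"), io.StringIO("a,b\n5,6\n")]
print(process_sources(sources))  # [2, 1]
```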

posted @ 2018-08-14 11:33 静悟生慧 Views(3388) Comments(0) Recommended(1)

SSE, MSE, RMSE, and R-square metrics explained

Summary: SSE (sum of squared errors): The sum of squares due to error. MSE: Mean squared error. RMSE: Root mean squared error. R-square (coefficient of determination): Coefficient of determ…
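
The four metrics can be computed directly from their usual definitions. This is a plain-Python sketch with made-up numbers; it divides SSE by the number of samples n to get MSE, which is one common convention (some texts divide by the degrees of freedom instead).

```python
import math


def regression_metrics(y_true, y_pred):
    """Return SSE, MSE, RMSE and R-square for paired observations."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    sse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    sst = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    mse = sse / n
    return {"SSE": sse, "MSE": mse, "RMSE": math.sqrt(mse), "R2": 1 - sse / sst}


metrics = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 2.0])
print(metrics["SSE"], metrics["MSE"], metrics["RMSE"])  # 4.0 1.0 1.0
print(round(metrics["R2"], 6))                          # 0.2
```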

posted @ 2018-08-13 11:51 静悟生慧 Views(6267) Comments(0) Recommended(0)

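Basic regression methods in sklearn

Summary: https://blog.csdn.net/u010900574/article/details/52666291 The author's summary is very good and the methods are practical. Some Python dependency libraries: https://www.lfd.uci.edu/~gohlke/pythonlibs/ Principles of lightgbm and a brief guide to using it: including…

posted @ 2018-08-12 22:16 静悟生慧 Views(1047) Comments(0) Recommended(0)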

The difference between iloc[[i]] and loc[[i]]

Summary:
In [2]: df
Out[2]:
          A         B
0  1.068932 -0.794307
2 -0.470056  1.192211
4 -0.284561  0.756029
6  1.037563 -0.267820
8 -0.538478 -0.800654
In [5]: df.iloc[[2]]…
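
The DataFrame in the summary has index labels 0, 2, 4, 6, 8, which makes the difference easy to see. The sketch below rebuilds it from the printed values: `iloc` selects by position, `loc` by label.

```python
import pandas as pd

df = pd.DataFrame(
    {"A": [1.068932, -0.470056, -0.284561, 1.037563, -0.538478],
     "B": [-0.794307, 1.192211, 0.756029, -0.267820, -0.800654]},
    index=[0, 2, 4, 6, 8],
)

by_position = df.iloc[[2]]  # third row, whatever its label is
by_label = df.loc[[2]]      # the row whose index label is 2

print(by_position.index.tolist())  # [4]
print(by_label.index.tolist())     # [2]
```

The double brackets make both calls return a one-row DataFrame rather than a Series.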

posted @ 2018-08-12 20:09 静悟生慧 Views(463) Comments(0) Recommended(0)

Problem solved: Pandas and scikit-learn: KeyError: […] not in index

Summary: https://stackoverflow.com/questions/51091132/pandas-and-scikit-learn-keyerror-not-in-index The problem is the way you are trying to index the X using…

posted @ 2018-08-10 11:23 静悟生慧 Views(12083) Comments(0) Recommended(0)

Common pandas DataFrame data-processing operations

Summary: Xgboost parameter tuning: https://wuhuhu800.github.io/2018/02/28/XGboost_param_share/ https://blog.csdn.net/hx2017/article/details/78064362 Handling null values in a pandas DataFrame: h…

posted @ 2018-08-10 10:43 静悟生慧 Views(493) Comments(0) Recommended(0)

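Understanding hosts file configuration

Summary: https://blog.csdn.net/CJF_iceKing/article/details/7702694 The hosts file is located in the "C:\Windows\System32\drivers\etc" directory and is used to translate names into IP addresses. When a website is accessed by domain name in a browser, the hosts file is checked first to see whether…

posted @ 2018-08-03 16:45 静悟生慧 Views(6048) Comments(0) Recommended(0)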

Installing Spring and setting up a Spring development environment

Summary: https://blog.csdn.net/csdnsjg/article/details/80152815 https://jingyan.baidu.com/article/219f4bf798e0cfde442d3831.html Plugins such as spring and maven need, in eclips…

posted @ 2018-08-01 21:27 静悟生慧 Views(579) Comments(0) Recommended(0)
