Machine Learning - Post Category - 昕友软件开发

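Master Index (Summary of Development Highlights)

Summary: A categorized index of my posts and notes from the past few years. #Java Multithreaded Development Series: Java Multithreaded Development Series - Basics; Java Multithreaded Development Series - Inter-thread Cooperation; Java Multithreaded Development Series - Thread-safe Design; Java Multithreaded Development Series - Thread Liveness Failures; Java Multithreaded Development Series - Thread Management; Composing Asynchronous Programs with CompletableFuture; Threads in Swing and ... (Read more)

posted @ 2020-04-08 15:59 昕友软件开发 Views(508) Comments(0) Recommended(0)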

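[Binary Classification] Code Archive for a Precision Bank Marketing Solution

Summary: First submission. I did hardly any feature engineering, so the score is not yet where I want it. 0.9157894736842105 Accuracy : 0.9158 AUC Score (Test): 0.932477 Process analysis: from numpy import int64 from sklearn import metrics from skl ... (Read more)

posted @ 2019-11-06 17:32 昕友软件开发 Views(1080) Comments(0) Recommended(0)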
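The summary above reports an accuracy and a test AUC. As a minimal sketch of how those two numbers are usually obtained with `sklearn.metrics`, assuming hypothetical labels and predicted probabilities (the post's own data and model are not shown in the excerpt):

```python
from sklearn import metrics

# Hypothetical ground truth and predicted probabilities (not the post's data).
y_true = [0, 0, 1, 1]
y_prob = [0.1, 0.4, 0.35, 0.8]

# Accuracy needs hard labels, so threshold the probabilities at 0.5.
y_pred = [int(p >= 0.5) for p in y_prob]
accuracy = metrics.accuracy_score(y_true, y_pred)

# AUC is computed from the raw scores, not from the thresholded labels.
auc = metrics.roc_auc_score(y_true, y_prob)

print(f"Accuracy : {accuracy:.4g}")
print(f"AUC Score (Test): {auc:f}")
```

Because AUC ranks scores instead of thresholding them, it can differ noticeably from accuracy on imbalanced marketing data.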

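Machine Learning Project Checklist

Summary: There are eight steps in total: 1. Frame the Problem and Look at the Big Picture 2. Get the Data (note: automate data acquisition as much as possible so that you can easily get fresh data) 3. Explore the Data: name and type: categorical, int / ... (Read more)

posted @ 2019-10-24 16:00 昕友软件开发 Views(385) Comments(0) Recommended(0)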

Persisting a Machine Learning Model (with joblib)

Summary: import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from sklearn.preprocessing import PolynomialFeatures from sklearn.externals import joblib X_train = ... (Read more)

posted @ 2019-10-23 15:44 昕友软件开发 Views(2545) Comments(0) Recommended(0)
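The excerpt above imports `joblib` from `sklearn.externals`, which worked in 2019 but was removed in scikit-learn 0.23; the standalone `joblib` package is the replacement. A minimal sketch of the dump/load round trip, assuming hypothetical training data and a hypothetical file name (the post's `X_train` is cut off in the excerpt):

```python
import os
import tempfile

import joblib  # standalone package; replaces sklearn.externals.joblib
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical training data (not the post's values): y = x^2.
X_train = np.array([[1.0], [2.0], [3.0], [4.0], [5.0]])
y_train = X_train.ravel() ** 2

# Bundle the feature expansion and the regressor so both are persisted together.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X_train, y_train)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "poly_model.joblib")  # hypothetical name
    joblib.dump(model, model_path)      # serialize the fitted pipeline
    restored = joblib.load(model_path)  # load it back as a new object

X_new = np.array([[6.0]])
print(model.predict(X_new), restored.predict(X_new))
```

Persisting the whole pipeline avoids having to re-create the `PolynomialFeatures` step by hand at prediction time.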

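Dimensionality Reduction with PCA via Eigenvectors of the Covariance Matrix

Summary: Keep two feature dimensions so the result is easy to plot. (Read more)

posted @ 2019-10-23 11:37 昕友软件开发 Views(321) Comments(0) Recommended(0)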
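The title above names the technique: PCA through the eigenvectors of the covariance matrix, reduced to two dimensions. A minimal NumPy sketch of that idea, assuming synthetic data (the post's dataset is not shown in the excerpt):

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 200 samples, 4 features with different spreads.
X = rng.normal(size=(200, 4)) @ np.diag([3.0, 2.0, 1.0, 0.5])

# 1. Center the data and build the 4 x 4 covariance matrix.
X_centered = X - X.mean(axis=0)
cov = np.cov(X_centered, rowvar=False)

# 2. Eigen-decompose; eigh suits symmetric matrices and returns
#    eigenvalues in ascending order, so re-sort them descending.
eigenvalues, eigenvectors = np.linalg.eigh(cov)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# 3. Keep the top-2 eigenvectors and project the data onto them.
components = eigenvectors[:, :2]
X_2d = X_centered @ components

explained = eigenvalues[:2].sum() / eigenvalues.sum()
print(X_2d.shape, f"explained variance ratio: {explained:.3f}")
```

The variance of each projected column equals the corresponding eigenvalue, which is why the eigenvalues can be read as "variance explained".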

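Choosing k for k-means with the Elbow Method

Summary: X is: As K increases, the value on the vertical axis falls and eventually levels off; the k at the inflection point (the "elbow") can reasonably be taken as the best number of clusters. (Read more)

posted @ 2019-10-23 11:07 昕友软件开发 Views(2643) Comments(0) Recommended(0)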
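The elbow heuristic described above can be sketched as follows. This is an illustration under assumptions: the data is synthetic (three well-separated blobs), and the vertical axis is taken to be `KMeans.inertia_` (within-cluster sum of squares), which may differ from the distortion measure the post plots.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical data: three blobs of 50 points each.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(50, 2)) for c in centers])

k_values = list(range(1, 7))
inertias = []
for k in k_values:
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)  # within-cluster sum of squared distances

# How much each extra cluster reduces the inertia; the elbow is where
# this reduction suddenly becomes small.
drops = [inertias[i] - inertias[i + 1] for i in range(len(inertias) - 1)]

for k, inertia in zip(k_values, inertias):
    print(k, round(inertia, 1))
```

For this synthetic data the curve flattens after k = 3, matching the number of blobs; on real data the elbow is often less sharp and needs judgment.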

Fine-tuning a Model with Grid Search Using GridSearchCV

Summary: After tuning: Best score: 0.983 Best parameters set: clf__C: 10 clf__penalty: 'l2' vect__max_df: 0.5 vect__max_features: None vect__ngram_range: (1, 2) vect__st ... (Read more)

posted @ 2019-10-22 11:52 昕友软件开发 Views(1079) Comments(0) Recommended(0)
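The `vect__` and `clf__` prefixes in the summary above are how `GridSearchCV` addresses the parameters of named pipeline steps. A minimal sketch, assuming the steps are a `TfidfVectorizer` and a `LogisticRegression` (the excerpt shows only the step names) and using a tiny made-up corpus and a reduced grid:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

# Hypothetical toy corpus: 1 = spam-like, 0 = ordinary message.
texts = [
    "win cash prize now", "free prize claim now", "cheap loans win cash",
    "claim free cash today", "win free loans now", "cash prize free claim",
    "meeting at noon today", "lunch with the team", "project report due friday",
    "see you at the meeting", "team lunch on friday", "report for the project",
]
labels = [1] * 6 + [0] * 6

pipeline = Pipeline([
    ("vect", TfidfVectorizer()),
    ("clf", LogisticRegression(max_iter=1000)),
])

# "<step name>__<parameter>" routes each value to the matching pipeline step.
param_grid = {
    "vect__ngram_range": [(1, 1), (1, 2)],
    "clf__C": [0.1, 1, 10],
}

search = GridSearchCV(pipeline, param_grid, cv=3, scoring="accuracy")
search.fit(texts, labels)

print(f"Best score: {search.best_score_:.3f}")
print("Best parameters set:")
for name in sorted(param_grid):
    print(f"  {name}: {search.best_params_[name]!r}")
```

Every combination in the grid is cross-validated, so the cost grows multiplicatively with each added parameter.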

Evaluating Classification Performance: Accuracy, Precision, Recall, F1, F2

Summary: AdaBoost
              precision    recall  f1-score   support
           0       0.83      0.85      0.84       634
           1       0.84      0.82      0.83       616
    accuracy                           0.83      1250
   macro avg       0.83      0.83      0.83      1250
weighted avg       0.83 ... (Read more)

posted @ 2019-10-22 11:16 昕友软件开发 Views(2015) Comments(0) Recommended(0)
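The metrics named in the title above all derive from the same three counts. A minimal pure-Python sketch, assuming made-up labels (not the AdaBoost results in the summary); `precision_recall_fbeta` is a helper written for this illustration, not a library function:

```python
def precision_recall_fbeta(y_true, y_pred, beta=1.0):
    """Return (precision, recall, F-beta) for the positive class 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F-beta weights recall beta times as much as precision.
    denom = beta ** 2 * precision + recall
    fbeta = (1 + beta ** 2) * precision * recall / denom if denom else 0.0
    return precision, recall, fbeta


# Hypothetical labels: 4 positives, 6 negatives.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision, recall, f1 = precision_recall_fbeta(y_true, y_pred, beta=1.0)
_, _, f2 = precision_recall_fbeta(y_true, y_pred, beta=2.0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} F1={f1:.2f} F2={f2:.2f}")
```

Here recall is lower than precision, so F2 (which favors recall) comes out below F1.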

Confusion Matrix for Binary Classification

Summary: from sklearn.metrics import confusion_matrix import matplotlib.pyplot as plt y_test = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1] y_pred = [0, 1, 0, 0, 0, 0, 0, 1, ... (Read more)

posted @ 2019-10-22 11:07 昕友软件开发 Views(1686) Comments(0) Recommended(0)
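Since `y_pred` is cut off in the excerpt above, the following sketch uses its own hypothetical labels to show how `confusion_matrix` lays out a binary result and how to unpack the four cells:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical labels (the post's y_pred is truncated, so these differ).
y_true = [0, 0, 0, 1, 1, 1, 1, 0]
y_pred = [0, 1, 0, 1, 0, 1, 1, 0]

# Rows are true classes, columns are predicted classes:
# [[TN, FP],
#  [FN, TP]]
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
```

The row-major order of `ravel()` is what gives the `tn, fp, fn, tp` unpacking for labels 0 and 1.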

Polynomial Regression

Summary: import numpy as np import matplotlib.pyplot as plt from sklearn.linear_model import LinearRegression from sklearn.preprocessing import PolynomialFeatu ... (Read more)

posted @ 2019-10-21 17:14 昕友软件开发 Views(332) Comments(0) Recommended(0)
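The imports in the excerpt above suggest the usual recipe: expand the input with `PolynomialFeatures`, then fit an ordinary `LinearRegression` on the expanded columns. A minimal sketch under that assumption, with made-up noise-free data from a known quadratic:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical data generated from y = 2x^2 - 3x + 1.
X = np.arange(6, dtype=float).reshape(-1, 1)
y = 2 * X.ravel() ** 2 - 3 * X.ravel() + 1

# degree=2 adds the columns [1, x, x^2]; the linear model then fits
# one coefficient per column, which makes the overall fit quadratic.
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(X, y)

prediction = model.predict(np.array([[6.0]]))[0]
print(f"prediction at x=6: {prediction:.3f}")
```

The model stays linear in its coefficients; only the features are nonlinear, which is why plain least squares still applies.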

A Counterexample to Using Transforms to Improve Single-Variable Regression Accuracy

Summary: The result is: Actual weights: [66, 87, 68, 74] Predicted weights: [62.4 76.8 66. 72.6] Predicted weights by StandardScaler: [69.4 76.8 59.2 59.2] Predicted weight ... (Read more)

posted @ 2019-10-21 11:44 昕友软件开发 Views(241) Comments(0) Recommended(0)

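Using a Log Transform to Improve Single-Variable Regression Accuracy

Summary: Ridge Test score: 0.622 Test score: 0.875 A log transform is generally used when the gaps between continuous values keep growing. (Read more)

posted @ 2019-10-18 16:02 昕友软件开发 Views(405) Comments(0) Recommended(0)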
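To illustrate why a log transform helps when the gaps between values keep growing, here is a minimal sketch. It swaps in plain least squares via `np.polyfit` for the post's Ridge model and uses noise-free exponential data, both of which are assumptions made for a deterministic example:

```python
import numpy as np


def r2_score(y_true, y_pred):
    """Coefficient of determination, computed by hand."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


# Hypothetical target that grows exponentially with x.
x = np.linspace(0.0, 10.0, 50)
y = np.exp(0.5 * x)

# Straight line fit on the raw target.
slope_raw, intercept_raw = np.polyfit(x, y, deg=1)
pred_raw = slope_raw * x + intercept_raw

# Straight line fit on log(y), then map predictions back with exp.
slope_log, intercept_log = np.polyfit(x, np.log(y), deg=1)
pred_log = np.exp(slope_log * x + intercept_log)

r2_raw = r2_score(y, pred_raw)
r2_log = r2_score(y, pred_log)
print(f"raw fit R^2: {r2_raw:.3f}, log fit R^2: {r2_log:.3f}")
```

Both scores are measured on the original scale of `y`, so the comparison is like for like.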

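K-Means Clustering and the Mini Batch K-Means Algorithm for Large Datasets

Summary: Walkthrough: for large datasets you can also use the MiniBatchKMeans algorithm that scikit-learn provides. The rough idea is to sample the data so that each update does not use the full dataset, which costs some accuracy. MiniBatchKMeans inherits from KMeans, because MiniBatchKMeans essentially still uses KMea ... (Read more)

posted @ 2019-10-17 11:17 昕友软件开发 Views(2435) Comments(0) Recommended(0)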
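The trade-off described above (sampled updates in exchange for some accuracy) can be seen by fitting both estimators on the same data. A minimal sketch, assuming synthetic blobs and an arbitrary `batch_size`:

```python
import numpy as np
from sklearn.cluster import KMeans, MiniBatchKMeans

rng = np.random.default_rng(0)
# Hypothetical data: three blobs of 1000 points each.
centers = np.array([[0.0, 0.0], [10.0, 10.0], [-10.0, 10.0]])
X = np.vstack([c + rng.normal(size=(1000, 2)) for c in centers])

# Full-batch k-means: every iteration uses all samples.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

# Mini-batch k-means: every update uses a random batch of 256 samples.
mbk = MiniBatchKMeans(
    n_clusters=3, batch_size=256, n_init=3, random_state=0
).fit(X)

print(f"KMeans inertia:          {km.inertia_:.1f}")
print(f"MiniBatchKMeans inertia: {mbk.inertia_:.1f}")
```

On data this small the speed advantage is negligible; the mini-batch variant is aimed at datasets where a full pass per iteration is too slow.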

Preparing for PCA with Histograms

Summary: import graphviz import mglearn from mpl_toolkits.mplot3d import Axes3D from sklearn.datasets import load_breast_cancer, make_blobs from sklearn.ensemb ... (Read more)

posted @ 2019-10-16 17:17 昕友软件开发 Views(364) Comments(0) Recommended(0)

Displaying Support Vectors of a Gaussian-Kernel SVM in sklearn

Summary: import graphviz import mglearn from mpl_toolkits.mplot3d import Axes3D from sklearn.datasets import load_breast_cancer, make_blobs from sklearn.ensemb ... (Read more)

posted @ 2019-10-16 11:45 昕友软件开发 Views(779) Comments(0) Recommended(0)
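The excerpt above shows only imports, so this is a sketch of the idea in the title rather than the post's code: fit an `SVC` with the RBF (Gaussian) kernel and read the support vectors from the fitted model. The data and the `C` and `gamma` values are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Hypothetical two-class data: one blob around (-2, -2), one around (2, 2).
X = np.vstack([
    rng.normal(loc=-2.0, size=(20, 2)),
    rng.normal(loc=2.0, size=(20, 2)),
])
y = np.array([0] * 20 + [1] * 20)

svm = SVC(kernel="rbf", C=10, gamma=0.1).fit(X, y)

# support_ holds the row indices of the support vectors in X;
# support_vectors_ holds the points themselves.
support_vectors = svm.support_vectors_
support_labels = y[svm.support_]

print(f"{len(support_vectors)} support vectors out of {len(X)} samples")
print("per class:", svm.n_support_)
```

For plotting, the `support_vectors` array can be drawn with larger markers on top of the regular scatter plot.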

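Decision Tree and Random Forest Classification

Summary: Decision tree: with the default depth the tree grows too deep and overfits; training accuracy is 1. Accuracy on training set: 1.000 Accuracy on test set: 0.937 Setting the depth to 4: tree = DecisionTreeClassifier(max_depth=4,random_state=0) ... (Read more)

posted @ 2019-10-15 16:57 昕友软件开发 Views(709) Comments(0) Recommended(0)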
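The comparison in the summary above (unlimited depth versus `max_depth=4`) can be sketched as follows. The breast cancer dataset and the split settings are assumptions; the excerpt does not say which data produced its scores.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

cancer = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    cancer.data, cancer.target, stratify=cancer.target, random_state=42
)

# Default settings: the tree grows until every leaf is pure,
# so it memorizes the training set.
tree_full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# Limiting the depth is a simple form of pre-pruning.
tree_pruned = DecisionTreeClassifier(max_depth=4, random_state=0)
tree_pruned.fit(X_train, y_train)

for name, tree in [("full", tree_full), ("max_depth=4", tree_pruned)]:
    print(f"{name}: depth={tree.get_depth()} "
          f"train={tree.score(X_train, y_train):.3f} "
          f"test={tree.score(X_test, y_test):.3f}")
```

A training score of exactly 1.0 together with a lower test score is the overfitting signature the summary points out.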

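Linear Regression Curves and Diagnosing Overfitting

Summary: Result: w[0]: 0.393906 b: -0.031804 Result 2: Training set score: 0.95 Test set score: 0.61 This shows overfitting: the Boston housing features differ greatly from one another, so ordinary least squares is a poor fit. "Regularization" is needed as an explicit constraint, using ridge regression to avoid overfit ... (Read more)

posted @ 2019-10-15 11:30 昕友软件开发 Views(1094) Comments(0) Recommended(0)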
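The summary above contrasts ordinary least squares with ridge regression. The Boston housing dataset it refers to is no longer shipped with recent scikit-learn releases, so this sketch substitutes synthetic data with many features and few samples; the `alpha` value is likewise an assumption.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
# Hypothetical data: 60 samples, 40 features, only 5 of which matter.
X = rng.normal(size=(60, 40))
true_coef = np.zeros(40)
true_coef[:5] = [3.0, -2.0, 1.5, 1.0, -1.0]
y = X @ true_coef + rng.normal(scale=2.0, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

ols = LinearRegression().fit(X_train, y_train)
# alpha controls the strength of the L2 penalty on the coefficients.
ridge = Ridge(alpha=10.0).fit(X_train, y_train)

for name, model in [("OLS", ols), ("Ridge", ridge)]:
    print(f"{name}: train={model.score(X_train, y_train):.2f} "
          f"test={model.score(X_test, y_test):.2f} "
          f"|w|={np.linalg.norm(model.coef_):.2f}")
```

Ridge always ends up with a coefficient norm no larger than OLS and a training score no higher; whether the test score improves depends on the data and on `alpha`.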

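Regression Curve for the wave Dataset

Summary: Regression curve output for the wave dataset: matplotlib.pyplot.plot() parameters explained: plots lines or markers on the axes. The arguments are variable-length, allowing multiple x, y pairs with an optional format string. For example, each of the following is legal: plot(x, y) # plot x, y using the default line style and color plot(x, y, 'b ... (Read more)

posted @ 2019-10-15 09:37 昕友软件开发 Views(706) Comments(0) Recommended(0)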
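The call forms described in the summary above can be tried out as follows. This is a sketch under assumptions: the line being plotted is made up (not a fit on the wave dataset), and the format strings `"bo"`, `"r-"`, and `"g--"` are chosen for illustration because the excerpt's own example is cut off.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import matplotlib.pyplot as plt
import numpy as np

x = np.linspace(-3, 3, 20)
y = 0.4 * x  # hypothetical line

fig, ax = plt.subplots()

# One x, y pair with the default line style and color.
default_lines = ax.plot(x, y)

# One x, y pair with a format string: blue circle markers.
marker_lines = ax.plot(x, y + 1, "bo")

# Several x, y, format groups in a single call.
multi_lines = ax.plot(x, y, "r-", x, y - 1, "g--")

print(len(default_lines), len(marker_lines), len(multi_lines))
plt.close(fig)
```

Each call returns a list of `Line2D` objects, one per x, y pair, which is handy for building legends later.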

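Solving the iris 4-Class Problem with KNN & Test Accuracy

Summary: Output: Prediction X_new:[0] prediction X_new belong to ['setosa'] test score1:0.97 test score2:0.97 Test accuracy: the number of neighbors set for KNN affects test accuracy. As the example shows, 6 is the best value. KNN's strengths are that it is simple and highly interpretable, ... (Read more)

posted @ 2019-10-14 17:25 昕友软件开发 Views(682) Comments(0) Recommended(0)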
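The effect of the neighbor count mentioned above can be checked with a short loop. A minimal sketch, assuming the standard iris data from scikit-learn (four features, three species) and a `random_state=0` split, which may not match the post's exact setup:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, random_state=0
)

# Test accuracy for each neighbor count from 1 to 10.
scores = {}
for n_neighbors in range(1, 11):
    knn = KNeighborsClassifier(n_neighbors=n_neighbors)
    knn.fit(X_train, y_train)
    scores[n_neighbors] = knn.score(X_test, y_test)

best_k = max(scores, key=scores.get)
for k, score in scores.items():
    print(f"n_neighbors={k}: test score {score:.2f}")
print("best n_neighbors on this split:", best_k)
```

Picking `n_neighbors` by test score, as done here for brevity, gives an optimistic estimate; cross-validation on the training set is the more careful choice.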

How to Draw a Scatter Plot Matrix (scatter_matrix) with pandas

Summary: Uses sklearn's iris samples as the dataset. (Read more)

posted @ 2019-10-14 16:46 昕友软件开发 Views(5954) Comments(0) Recommended(1)
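A minimal sketch of the approach named above: load iris from sklearn, wrap it in a DataFrame, and pass it to `pandas.plotting.scatter_matrix`. The styling arguments and the output file name are assumptions chosen for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so no display is needed

import pandas as pd
from pandas.plotting import scatter_matrix
from sklearn.datasets import load_iris

iris = load_iris()
iris_df = pd.DataFrame(iris.data, columns=iris.feature_names)

# One scatter plot per feature pair, histograms on the diagonal;
# c=iris.target colors each point by its species.
axes = scatter_matrix(
    iris_df,
    c=iris.target,
    figsize=(8, 8),
    marker="o",
    hist_kwds={"bins": 20},
    alpha=0.8,
)

print(axes.shape)
# To keep the figure: axes[0, 0].figure.savefig("iris_scatter_matrix.png")
```

With four features the result is a 4 x 4 grid of axes, which makes pairwise class separation easy to inspect by eye.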
