Post archive for March 9, 2021 - 大数据程序员

March 9, 2021

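Summary: K-means clustering can be implemented in Python in two ways: writing the algorithm by hand, or calling an existing library implementation. Both approaches are implemented in code below. 1. Data preparation: the file testSet.txt contains the following data: 1.658985 4.285136 -3.453687 3.424321 4.838138 -1.151 Read more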
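The hand-written variant mentioned in this summary can be sketched roughly as follows. This is a minimal illustration in plain Python, not the post's own code: the function name `kmeans`, the toy points, and the choice of the first k points as initial centroids are all assumptions made for the example.

```python
import math

def kmeans(points, k, max_iter=100):
    """Cluster points with Lloyd's algorithm (hypothetical helper)."""
    # Assumption: seed the centroids with the first k points for determinism.
    centroids = [tuple(p) for p in points[:k]]
    assignments = [None] * len(points)
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        new_assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_assignments == assignments:
            break  # no point changed cluster, so the result has converged
        assignments = new_assignments
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, assignments

# Hypothetical toy data: two well-separated groups.
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (5.2, 4.8), (0.8, 1.2), (4.8, 5.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The library route would typically replace the loop with a ready-made estimator; the excerpt does not say which library the post uses.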

posted @ 2021-03-09 21:29 大数据程序员 Views (1086) Comments (0) Recommendations (0)

KNN (K-Nearest Neighbors Classification)

Summary: 1. Data preparation: the file datingTestSet.txt contains the following data: 40920 8.326976 0.953952 3 14488 7.153469 1.673904 2 26052 1.441871 0.805124 1 75136 13.147394 0.428964 1 38344 1.669 Read more
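As a rough sketch of how k-nearest-neighbors classification could work on rows like these, the example below uses only the four complete rows visible in the excerpt, assuming each row holds three features followed by a class label. The helper names `min_max_scaler` and `knn_classify` are hypothetical, and whether the post normalizes its features is also an assumption; scaling is included here because the first column is several orders of magnitude larger than the others.

```python
import math
from collections import Counter

# The four complete rows visible in the excerpt: three features, then a label.
rows = [
    (40920, 8.326976, 0.953952, 3),
    (14488, 7.153469, 1.673904, 2),
    (26052, 1.441871, 0.805124, 1),
    (75136, 13.147394, 0.428964, 1),
]
features = [r[:3] for r in rows]
labels = [r[3] for r in rows]

def min_max_scaler(data):
    """Return a function that rescales each column of `data` to [0, 1]."""
    lows = [min(col) for col in zip(*data)]
    highs = [max(col) for col in zip(*data)]

    def scale(vec):
        return [(v - lo) / (hi - lo) if hi > lo else 0.0
                for v, lo, hi in zip(vec, lows, highs)]
    return scale

def knn_classify(query, data, data_labels, k):
    """Majority vote among the k training rows closest to `query`."""
    scale = min_max_scaler(data)
    q = scale(query)
    nearest = sorted(
        range(len(data)), key=lambda i: math.dist(q, scale(data[i]))
    )[:k]
    votes = Counter(data_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]

# With k=1, a query equal to a training row receives that row's label.
prediction = knn_classify((14488, 7.153469, 1.673904), features, labels, k=1)
print(prediction)
```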

posted @ 2021-03-09 21:28 大数据程序员 Views (327) Comments (0) Recommendations (0)

Computing wordcount in Python

Summary: 1. Data preparation: the file words contains the following data: hello spark hello python hello scala hello spark hello python 2. The Python implementation is as follows: from pyspark import SparkConf, SparkContext i Read more
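The post's PySpark code is cut off in this excerpt, so the block below is not a reconstruction of it. It is a plain-Python stand-in that mirrors the usual word-count pipeline (split lines into words, pair each word with 1, sum per word) so the expected counts for the sample data are easy to check. Treating the sample as a single line is an assumption, since the excerpt does not preserve line breaks.

```python
from collections import Counter

# Sample text from the excerpt; the original line breaks are not shown,
# so storing it as one line is an assumption.
lines = ["hello spark hello python hello scala hello spark hello python"]

# Split step (what flatMap would do in Spark): one flat list of words.
words = [word for line in lines for word in line.split()]

# Count step (what map + reduceByKey would do): add 1 per occurrence.
counts = Counter()
for word in words:
    counts[word] += 1

for word, n in sorted(counts.items()):
    print(word, n)
```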

posted @ 2021-03-09 21:27 大数据程序员 Views (79) Comments (0) Recommendations (0)

Linear Regression

Summary: 1. Data preparation: the file lpsa.data contains the following data: -0.4307829,-1.63735562648104 -2.00621178480549 -1.86242597251066 -1.02470580167082 -0.522940888712441 -0.863171185425945 - Read more
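The row shown appears to follow a "label, comma, space-separated features" layout. Assuming that reading is right, a line could be parsed along these lines before being fed to a regression model. The function name `parse_point` is hypothetical, and the sample line is deliberately shortened because the excerpt's row is cut off.

```python
def parse_point(line):
    """Split 'label,f1 f2 ...' into (label, [features])."""
    label_text, feature_text = line.split(",", 1)
    return float(label_text), [float(x) for x in feature_text.split()]

# Shortened sample in the same layout; the full row is truncated in the excerpt.
label, features = parse_point("-0.4307829,-1.63735562648104 -2.00621178480549")
print(label, features)
```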

posted @ 2021-03-09 21:22 大数据程序员 Views (246) Comments (0) Recommendations (0)

Bayes' Theorem + Spam Classification

Summary: Predicting spam using the principles of the Bayes algorithm. 1. Data preparation: the file sms_spam.txt contains the following: type,text ham,you are having a good week. Just checking in 00 00 00 0089 0089 00890089 0089 0089 0089 008 Read more
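To illustrate the idea, here is a minimal sketch of a multinomial naive Bayes classifier with add-one (Laplace) smoothing. It is not the post's implementation: the function names `train` and `predict` and the four toy messages are hypothetical, chosen only to match the type,text layout of sms_spam.txt.

```python
import math
from collections import Counter, defaultdict

def train(samples):
    """Count documents per class and words per class from (label, text) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    for label, text in samples:
        class_counts[label] += 1
        word_counts[label].update(text.lower().split())
    return class_counts, word_counts

def predict(text, class_counts, word_counts):
    """Return the class with the highest log posterior."""
    vocab = {w for counts in word_counts.values() for w in counts}
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        total_words = sum(word_counts[label].values())
        score = math.log(n_docs / total_docs)  # log prior P(class)
        for word in text.lower().split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            p = (word_counts[label][word] + 1) / (total_words + len(vocab))
            score += math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy messages in the same type,text layout as sms_spam.txt.
samples = [
    ("ham", "are you having a good week"),
    ("ham", "see you at lunch"),
    ("spam", "win a free prize now"),
    ("spam", "free cash prize call now"),
]
model = train(samples)
result = predict("free prize", *model)
print(result)
```

Working in log space avoids underflow when many small word probabilities are multiplied together.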

posted @ 2021-03-09 20:36 大数据程序员 Views (3425) Comments (0) Recommendations (0)

Summary: 1 Read more

posted @ 2021-03-09 20:31 大数据程序员 Views (53) Comments (0) Recommendations (0)

no python interpreter configured for the project

Summary: Reposted from https://www.cnblogs.com/qy1234/p/8520691.html Read more

posted @ 2021-03-09 20:19 大数据程序员 Views (166) Comments (0) Recommendations (0)

Environment Setup Required for PySpark Development

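Summary: 1) Install a Python environment. There are two ways to install Python: a native install or an Anaconda install. In either case the install path must not contain Chinese characters or spaces. 2) On Windows, SPARK_HOME must be configured. 3) Install the py4j module in Python. There are two ways to install it; the second one, (2), is recommended. (1) Use pip install py4j: go into Read more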
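The three requirements listed in this summary can be checked with a short script. The sketch below is an assumption about how one might do that, not something taken from the post: the function name `check_pyspark_env` and its messages are hypothetical, and it only reports problems without fixing them.

```python
import importlib.util
import os
import sys

def check_pyspark_env(environ=None, python_path=None):
    """Return a list of detected problems; an empty list means all checks passed."""
    environ = os.environ if environ is None else environ
    python_path = python_path or sys.executable or ""
    problems = []
    # 1) The Python install path should contain no spaces or non-ASCII characters.
    if " " in python_path:
        problems.append("Python path contains a space")
    if not python_path.isascii():
        problems.append("Python path contains non-ASCII characters")
    # 2) SPARK_HOME must be configured.
    if not environ.get("SPARK_HOME"):
        problems.append("SPARK_HOME is not set")
    # 3) The py4j module must be importable.
    if importlib.util.find_spec("py4j") is None:
        problems.append("py4j is not installed")
    return problems

# Check the current interpreter and environment.
for problem in check_pyspark_env():
    print("WARNING:", problem)
```

Passing `environ` and `python_path` explicitly makes it possible to try the checks against a made-up configuration without touching the real one.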

posted @ 2021-03-09 10:27 大数据程序员 Views (502) Comments (0) Recommendations (0)

About the pip upgrade issue: You are using pip version 10.0.1, however version 21.0.1 is available

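Summary: Installing py4j with pip: go to the Scripts directory of the Anaconda installation, open cmd, and run pip install py4j. The following problem appears during installation: You are using pip version 10.0.1, however version 21.0.1 is available Perfect Read more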

posted @ 2021-03-09 09:54 大数据程序员 Views (2628) Comments (0) Recommendations (0)
