PEAR2020

January 4, 2021

Summary: <Java> 1. pom <dependency> <groupId>com.datastax.cassandra</groupId> <artifactId>cassandra-driver-core</artifactId> <version>3.1.1</version> </dependen…

posted @ 2021-01-04 15:34 PEAR2020 Views(160) Comments(4) Recommended(0)

December 31, 2020

Where did rpm install it?

Summary: rpm -qa|grep cassandra >>>cassandra-3.11.9-1.noarch rpm -ql cassandra-3.11.9-1.noarch…

posted @ 2020-12-31 16:13 PEAR2020 Views(394) Comments(0) Recommended(0)

Cassandra (1)

Summary: Does a composite index take effect? The where clause must always include the primary index column a; otherwise the index is not used. create keyspace patient with replication = {'class':'SimpleStrategy','replication_factor': 1} create table…

posted @ 2020-12-31 13:38 PEAR2020 Views(74) Comments(0) Recommended(0)

Git: Setting up a private GitLab server and pushing to it with git

Summary: Quick git configuration and git clone: git config --global user.name "wenyan" git config --global user.email "sabertobihwy@gmail.com" git config --global --list ssh-keygen…

posted @ 2020-12-31 08:44 PEAR2020 Views(284) Comments(0) Recommended(0)

December 27, 2020

Spark: Two ways to define a UDF

Summary: For details, see https://www.cnblogs.com/itboys/p/9347403.html 1) If you use spark.sql(""), which hands the query to Hive internally, you can only use spark.udf.register("",). For example: import org.apache.spark.sql.functio…

posted @ 2020-12-27 21:25 PEAR2020 Views(1022) Comments(0) Recommended(0)

December 24, 2020

Hive-to-Hive data migration

Summary: The steps follow https://www.it610.com/article/1292557527262765056.htm In the source Hive: 1) If the files are small: export table dm_events.dm_usereventfinal to '/tmp/hive-export/dm' 2) If the files are large…

posted @ 2020-12-24 17:43 PEAR2020 Views(594) Comments(0) Recommended(0)

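Offline data analysis: user interest orientation analysis (2-3): building KMeans / random forest models with pyspark for classification prediction

Summary: I. Install the packages: settings -> interpreter -> + joblib (for saving and loading models) + matplotlib + numpy + pyspark + scikit-learn. II. First confirm that PyCharm can connect to Hive through spark.sql; see https://www.cnblogs.co…

posted @ 2020-12-24 14:38 PEAR2020 Views(537) Comments(0) Recommended(0)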

Python: connecting to Hive with Spark SQL

Summary: For reference, see https://blog.csdn.net/m0_46651978/article/details/111618085#comments_14329527 I. First, the single-node method on Linux: 1. Stop Spark first: sbin/stop-all.sh 2. Take Hive's hive-s…

posted @ 2020-12-24 12:38 PEAR2020 Views(1146) Comments(0) Recommended(0)

December 22, 2020

Hive interview question: counting the users who logged in on 3 consecutive days within the last 7 days

Summary: Source data: val df = Seq( ("2020-09-21",1), ("2020-09-20",1), ("2020-09-19",1), ("2020-09-17",1), ("2020-09-16",1), ("2020-09-15",1), ("2020-09-20",2), ("20…

posted @ 2020-12-22 09:37 PEAR2020 Views(1578) Comments(0) Recommended(0)
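The consecutive-login question above is commonly solved with the "date minus row number" trick: sort each user's distinct login dates, subtract the row index from each date, and consecutive dates collapse onto the same anchor. Below is a minimal sketch of that idea in plain Python, not the post's own Spark/Hive solution. The function name `users_with_streak`, the reference date `today`, and the second user's rows are assumptions made for illustration (the summary is cut off before user 2's data is complete); user 1's rows are the ones shown in the summary.

```python
from collections import defaultdict
from datetime import date, timedelta


def users_with_streak(logins, today, window_days=7, streak=3):
    """Return the users who logged in on at least `streak` consecutive days
    within the last `window_days` days ending at `today` (inclusive)."""
    window_start = today - timedelta(days=window_days - 1)

    # Collect each user's distinct login dates inside the window.
    days_by_user = defaultdict(set)
    for day_str, user in logins:
        day = date.fromisoformat(day_str)
        if window_start <= day <= today:
            days_by_user[user].add(day)

    result = set()
    for user, days in days_by_user.items():
        # Consecutive dates share the same (date - rank) anchor,
        # so the size of each anchor group is the length of a streak.
        streak_lengths = defaultdict(int)
        for rank, day in enumerate(sorted(days)):
            streak_lengths[day - timedelta(days=rank)] += 1
        if max(streak_lengths.values()) >= streak:
            result.add(user)
    return result


# User 1's rows come from the summary above; user 2's rows are hypothetical.
sample_logins = [
    ("2020-09-21", 1), ("2020-09-20", 1), ("2020-09-19", 1),
    ("2020-09-17", 1), ("2020-09-16", 1), ("2020-09-15", 1),
    ("2020-09-20", 2), ("2020-09-18", 2),
]

# Assumed reference date: the latest date in the sample.
print(len(users_with_streak(sample_logins, date(2020, 9, 21))))  # prints 1
```

In SQL the same grouping is typically written with row_number() over (partition by user order by date) and date_sub, then a group by on the resulting anchor with having count(*) >= 3.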

December 21, 2020

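Offline data analysis: user interest orientation analysis (2-2): offline/real-time project architecture | project workflow | data warehouse construction (advanced)

Summary: I. Offline vs. real-time streaming frameworks. For the data-cleaning process with Spark, see the log analysis post: https://www.cnblogs.com/sabertobih/p/14070357.html Real-time streaming and offline processing differ in the time lag between data processing runs, not in the tools used, so Kafka and Spark Streaming can also be used for offline batch processing. Offline…

posted @ 2020-12-21 22:09 PEAR2020 Views(579) Comments(0) Recommended(1)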
