muyue123

January 6, 2021

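Summary: spark.sql("select /*+ mapjoin(b) */ * from t1 a left join t2 b on a.id=b.id").explain() 1: when a table has an alias, the hint must name the alias, not the table; 2: the hint keyword (mapjoin in this example) is recognized in either upper or lower case.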
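To make the effect of the hint more concrete, here is a minimal sketch in plain Python of what a map-side (broadcast hash) left join does conceptually: the small hinted table is loaded into an in-memory hash map, and the large table is streamed past it. The function name `map_left_join` and the sample rows are assumptions made for illustration; this is not Spark's implementation.

```python
# Conceptual sketch of a map-side (broadcast hash) left join in plain Python.
# Table contents and the function name are hypothetical, not Spark internals.
from collections import defaultdict


def map_left_join(left_rows, right_rows, key):
    """Left-join two lists of dicts by hashing the small (right) side."""
    # Build phase: index the small, hinted table in memory.
    index = defaultdict(list)
    for row in right_rows:
        index[row[key]].append(row)

    # Probe phase: stream the large table and look up matches by key.
    joined = []
    for left in left_rows:
        matches = index.get(left[key])
        if matches:
            for right in matches:
                joined.append((left, right))
        else:
            joined.append((left, None))  # a left join keeps unmatched rows
    return joined


t1 = [{"id": 1, "v": "x"}, {"id": 2, "v": "y"}, {"id": 3, "v": "z"}]
t2 = [{"id": 1, "w": "p"}, {"id": 3, "w": "q"}]
result = map_left_join(t1, t2, "id")
for left, right in result:
    print(left, right)
```

Because only the small side is held in memory, neither table has to be sorted or repartitioned by key, which is the usual motivation for such a hint.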

posted @ 2021-01-06 11:31 muyue123

December 9, 2020

trancare: Python operations

Summary: Download and install the package: sudo pip-3.6 install sqlalchemy

posted @ 2020-12-09 17:31 muyue123

December 7, 2020

Implementing analytic functions

Summary: SELECT region, country, category, max(multiIf(rownumber = 1, app, '')) AS col1, max(multiIf(rownumber = 2, app, '')) AS col2, max(multiIf(rownumber = ...
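The query above pivots ranked rows into columns: each `max(multiIf(rownumber = N, app, ''))` picks the app at rank N within a group. The sketch below reproduces that pattern with Python's built-in `sqlite3`, using `CASE WHEN` as a stand-in for `multiIf`. The table name `ranked`, the sample data, and the `GROUP BY` clause are assumptions, since the original query is truncated.

```python
import sqlite3

# In-memory database; the table name and rows are hypothetical sample data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ranked ("
    "region TEXT, country TEXT, category TEXT, app TEXT, rownumber INTEGER)"
)
conn.executemany(
    "INSERT INTO ranked VALUES (?, ?, ?, ?, ?)",
    [
        ("EU", "DE", "games", "app_a", 1),
        ("EU", "DE", "games", "app_b", 2),
        ("EU", "FR", "games", "app_c", 1),
    ],
)

# CASE WHEN ... ELSE '' END plays the role of multiIf(cond, app, '').
# max() then keeps the non-empty value inside each group.
pivot_rows = conn.execute(
    """
    SELECT region, country, category,
           max(CASE WHEN rownumber = 1 THEN app ELSE '' END) AS col1,
           max(CASE WHEN rownumber = 2 THEN app ELSE '' END) AS col2
    FROM ranked
    GROUP BY region, country, category
    ORDER BY region, country, category
    """
).fetchall()

for row in pivot_rows:
    print(row)
```

Groups with fewer ranked rows than pivot columns simply get an empty string in the missing columns.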

posted @ 2020-12-07 19:14 muyue123

November 27, 2020

Reading and writing MySQL with Spark

Summary: /** Fetch the data of a MySQL table. @param sqlContext @param tableName name of the MySQL table to read @param proPath path to the configuration file @return the DataFrame of the MySQL table */ def readMysqlTable(sqlContex ...

posted @ 2020-11-27 11:30 muyue123

November 25, 2020

spark-submit parameters explained

Summary: Common optional parameters: --master MASTER_URL, which can be spark://host:port, mesos://host:port, yarn, yarn-cluster, yarn-client, or local; --deploy-mode DEPLOY_MODE, where the Driver program runs ...
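As a small illustration of how these two options fit into a command line, the sketch below assembles a `spark-submit` invocation as an argument list. The helper `build_spark_submit`, the script name `my_job.py`, and the `--date` application argument are hypothetical; only `--master` and `--deploy-mode` come from the summary above.

```python
import shlex


def build_spark_submit(app, master="yarn", deploy_mode="cluster", app_args=()):
    """Assemble a spark-submit argument list (hypothetical helper)."""
    # Options for spark-submit itself must come before the application file;
    # everything after the application file is passed to the application.
    cmd = ["spark-submit", "--master", master, "--deploy-mode", deploy_mode, app]
    cmd.extend(app_args)
    return cmd


cmd = build_spark_submit(
    "my_job.py",  # hypothetical application script
    master="yarn",
    deploy_mode="cluster",
    app_args=["--date", "2020-11-25"],  # hypothetical application argument
)
command_line = shlex.join(cmd)
print(command_line)
```

Keeping the command as a list avoids quoting problems if it is later passed to `subprocess.run`.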

posted @ 2020-11-25 15:58 muyue123

Iteration and generators

Summary: '''The essence of list iteration: the __iter__() method returns an iterator, and the next method can then be called on that iterator''' arr = [1,2,3,4,5] arr_iterator = arr.__iter__() print(arr_iterator.__next__()) print(next(arr_iter ...
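The idea in the summary can be sketched end to end as follows: a `for` loop is roughly a call to `iter()` followed by repeated `next()` calls until `StopIteration`, and a generator function produces an object that follows the same protocol. The generator `countdown` is an illustrative name, not taken from the original post.

```python
arr = [1, 2, 3, 4, 5]

# iter(arr) calls arr.__iter__(); next(it) calls it.__next__().
arr_iterator = iter(arr)
first = next(arr_iterator)
second = next(arr_iterator)
print(first, second)

# Roughly what a for loop does with the remaining items.
remaining = []
while True:
    try:
        remaining.append(next(arr_iterator))
    except StopIteration:
        break
print(remaining)


def countdown(n):
    """Generator: each yield hands back one value and pauses the function."""
    while n > 0:
        yield n
        n -= 1


gen = countdown(3)
values = list(gen)            # consumes the generator
exhausted = next(gen, None)   # a default avoids StopIteration once exhausted
print(values, exhausted)
```

A generator is its own iterator, so once it is consumed it cannot be restarted; call the generator function again to get a fresh one.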

posted @ 2020-11-25 11:09 muyue123

November 18, 2020

Regular expressions

Summary: Remove "-" and runs of whitespace: select regexp_replace('a-b cd','-|\\s+','')
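The same pattern can be tried outside SQL with Python's `re` module, as a quick way to check what it matches. Note that the doubled backslash in the SQL string literal becomes a single `\s` in a Python raw string; the input with two spaces is an assumed test value.

```python
import re

# Alternation: either a literal "-" or one or more whitespace characters.
pattern = r"-|\s+"

cleaned = re.sub(pattern, "", "a-b  cd")
print(cleaned)
```

Because `\s+` is greedy, a run of several spaces is removed in a single match rather than one space at a time.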

posted @ 2020-11-18 16:38 muyue123

Viewing YARN logs

Summary: List the applications with yarn application -list, then fetch the logs of one of them with yarn logs -applicationId application_1493700892407_0007

posted @ 2020-11-18 14:00 muyue123

November 16, 2020

Examples of using a UDF and the in operation in filter

Summary: df = spark.sql( "select 100 as c1,Array(struct(1,2)) as a" " union all" " select 50 as c1,Array(struct(3,4)) as a" ) def test_udf(c): return c+1 spark ...

posted @ 2020-11-16 20:08 muyue123

October 26, 2020

Complex data types

Summary: The inline function: SELECT inline(array(struct(1, 'a'), struct(2, 'b'))) returns two rows: (1, a) and (2, b).
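For readers who think in Python, the behavior of `inline` can be approximated as "turn each struct in the array into its own row, with one column per field". The sketch below models structs as tuples; the function name `inline_rows` is an assumption for illustration and is not a Spark or Hive API.

```python
def inline_rows(array_of_structs):
    """Yield one row (tuple of columns) per struct in the array."""
    for struct in array_of_structs:
        yield tuple(struct)


# Mirrors array(struct(1, 'a'), struct(2, 'b')) from the query above.
rows = list(inline_rows([(1, "a"), (2, "b")]))
for col1, col2 in rows:
    print(col1, col2)
```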

posted @ 2020-10-26 10:31 muyue123
