随笔分类 - Hive
摘要:select day ,case when month(day) != month(date_add(day,7)) then 'y' else 'n' end as is_last_week -- 是否月份最后一周 from ( select '2021-06-06' as day union a
阅读全文
摘要:命令格式 WITH cte_name AS ( cte_query ) [,cte_name2 AS ( cte_query2 ) ,……]
阅读全文
摘要:这个方案是之前一个前同事问我时想出来的,但当时没进行实践,只是觉得可行。 今天终于也遇到这个场景,所以就进行测试。 两个库进行测试,还有其它的方案,比如spark,python,落地文件再对比差异等,各有优劣,比如spark需要集群的支持,并且速度也不是很快。ptyhon对大数据量的支持不是很好,落
阅读全文
摘要:with data as ( select t1.* ,row_number() over(partition by id order by end_date desc) as rn from ( select 1 as id,20 as status,'2020-03-28' as start_d
阅读全文
摘要:select GET_JSON_OBJECT(name_tmp,'$.val') as val ,GET_JSON_OBJECT(name_tmp,'$.area') as area ,GET_JSON_OBJECT(name_tmp,'$.setVal') as setVal ,GET_JSON_
阅读全文
摘要:with data as ( select '2020-04-15' as day,'收入' as type,'test1' as name,100 as amt, 5000 as bal union all select '2020-04-16' as day,'支出' as type,'test
阅读全文
摘要:-- oracle select substr(md5(key_id),1,1) as flag ,count(1) as cnt from ( select 1 as key_id union all select 2 as key_id union all select 3 as key_id
阅读全文
摘要:select name ,greatest(num1,num2,num3,num4) as max_num -- 最大值 ,least(num1,num2,num3,num4) as min_num -- 最小值 from ( select 'a' as name,1 as num1, 3 as num2, 2 as num3,4 as num4 ) t1 ;
阅读全文
摘要:ref: https://blog.csdn.net/qq_31573519/article/details/89054136
阅读全文
摘要:-- 将字段去重并合并 select concat_ws(',',collect_set(cast(id as string))) as wm from ( select 1 as id union all select 1 as id union all select 1 as id union all select 2 as id union a...
阅读全文
摘要:with tmp1 as ( select t1.day ,t1.value ,row_number() over(order by t1.day) as rn from ( select '2015-01-01' as day, 1 as value union all select '201...
阅读全文
摘要:### 读取hive的表结构,生成带comment的视图建表语句 # 读取配置文件中的表并进行遍历 grep -v '^#' tablesFile|while read tableName do status=1 viewName=$(echo ${tableName}|sed "s/^dwd_/dwd_cms_out_l_/i"|sed "s/^dws_/dws_cms_o...
阅读全文
摘要:定位: HIVE:长时间的批处理查询分析 impala:实时交互式SQL查询 impala优缺点优点: 1. 生成执行计划树,不用多次启动job造成多余开销,并且减少中间结果数据写入磁盘,执行速度快 2. 不占用yarn的资源 3. 缺点: 1. 不支持Date类型 2. 与HIVE数据不同步,需要
阅读全文
摘要:return code 2 为SQL报错。 return code 1 一般为权限问题。 具体要看源码。
阅读全文
摘要:-- 重点,目标表无重复数据 从结果可以看出,在无重复数据的情况下,parquet的压缩无用武之地,占用空间比textfile还大,ORC是压缩最强的文件模式。
阅读全文
摘要:-- hive中解析json数组 select t1.status ,substr(ss.col,1,10) as col ,t3.evcId ,t3.evcLicense ,t3.evcAddress ,t3.modelName from ( select get_json_object(json
阅读全文
摘要:-- LEAD(col,n,DEFAULT) 用于统计窗口内往下第n行值 -- 第一个参数为列名,第二个参数为往下第n行(可选,默认为1),第三个参数为默认值(当往下第n行为NULL时候,取默认值,如不指定,则为NULL) -- LAG(col,n,DEFAULT) 用于统计窗口内往上第n行值 -- 第一个参数为列名,第二个参数为往上第n行(可选,默认为1),第三个参数为默认值(当往上第n行...
阅读全文
摘要:ref: https://blog.csdn.net/u010670689/article/details/72885944
阅读全文