transform的作用
hive的TRANSFORM关键字提供了在SQL中调用自写脚本的功能,适合实现HIVE中没有的功能而又不会写UDF的情况
具体实现
需求:将timetamp类型的数据转换成weekday
(1)编写python脚本
vi /root/weekday.py
#!/bin/python import sys import datetime for line in sys.stdin: line = line.strip() movie,rate,ts,uid = line.split('\t') weekday = datetime.datetime.fromtimestamp(float(ts)).isoweekday() print '\t'.join([movie,rate,str(weekday),uid])
(2)将python脚本加入到hive的classpath
ADD file /root/weekday.py;
(3)测试transform方式解析数据
SELECT TRANSFORM(movie,rate,ts,uid) USING 'python weekday.py' AS (movieid,rate,weekday,userid) FROM t_movierate LIMIT 10 ;
(4)定义一张表用来存储最终结果
CREATE TABLE IF NOT EXISTS t_rate_weekday( movieid int, rate int, weekday int, userid int ) ROW format delimited fields terminated BY '\t' ;
(5)加载数据
INSERT INTO t_rate_weekday SELECT TRANSFORM(movie,rate,ts,uid) USING 'python weekday.py' AS (movieid,rate,weekday,userid) FROM t_movierate ;
注意:如果脚本不是Python,而是awk、sed等系统内置命令,可以直接使用,而不用add file。