一道hive面试题:explode map字段
需要找到每个学生最好的课程和成绩,最差的课程和成绩,以及各科的平均分
文本数据如下:
name scores
张三 语文:78,数学:90,英语:82,历史:77,政治:80,物理:88,化学:79,地理:92,生物:86 李四 语文:68,数学:84,英语:79,历史:65,政治:66,物理:78,化学:72,地理:83,生物:75 王五 语文:88,数学:69,英语:86,历史:82,政治:70,物理:68,化学:89,地理:95,生物:93 朱六 语文:73,数学:76,英语:73,历史:76,政治:75,物理:85,化学:86,地理:85,生物:90 钱二 语文:68,数学:91,英语:92,历史:97,政治:83,物理:88,化学:77,地理:72,生物:70 段七 语文:86,数学:80,英语:85,历史:87,政治:88,物理:98,化学:69,地理:87,生物:74
建表语句:
create external table score_test( name string, scores map<String, int> ) row format delimited fields terminated by '\t' collection items terminated by ',' map keys terminated by ':' lines terminated by '\n'
通过hadoop命令将文本上传到hive表对应的目录下。
sql:
select a.name, max(if(a.key1 = 1, a.subject, '')) as bad_subject, max(if(a.key1 = 1, a.score, 0)) as bad_score, max(if(a.key2 = 1, a.subject, '')) as good_subject, max(if(a.key2 = 1, a.score, 0)) as good_score, avg(a.score) as avg_score from ( select name, subject, score, row_number() over(partition by name order by score) as key1, row_number() over(partition by name order by score desc) as key2 from score_test LATERAL VIEW explode(scores) adTable AS subject, score ) as a group by a.name having bad_score > 0 and good_score > 0
结果:
+---------+--------------+------------+---------------+-------------+-------------+ | name | bad_subject | bad_score | good_subject | good_score | avg_score | +---------+--------------+------------+---------------+-------------+-------------+ | 张三 | 历史 | 77 | 地理 | 92 | 83.56 | | 朱六 | 语文 | 73 | 生物 | 90 | 79.89 | | 李四 | 历史 | 65 | 数学 | 84 | 74.44 | | 段七 | 化学 | 69 | 物理 | 98 | 83.78 | | 王五 | 物理 | 68 | 地理 | 95 | 82.22 | | 钱二 | 语文 | 68 | 历史 | 97 | 82.00 | +---------+--------------+------------+---------------+-------------+-------------+