20 Hive Practice Exercises with Solutions, Plus Common Development Pitfalls and Details

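Common development pitfalls and details

distinct removes duplicates.

Placed right after select, it deduplicates on all selected columns.

Placed inside an aggregate function, it deduplicates only the column passed to that function.

In Hive, a where condition cannot compare against a subquery with an inequality operator (at least in older versions); a join achieves the same result.

select … from table list: multiple table names are separated by ','.

When joining two tables, older Hive versions do not support non-equality conditions between the two tables' columns in the on clause.

having is used together with group by.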
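
The two uses of distinct can be sketched as follows. This is a minimal illustration that assumes the emp table defined in Part 1 of this post; the alias job_cnt is made up for the example.

```sql
-- distinct right after select: deduplicates on the (JOB, DEPTNO) combination
select  distinct
        JOB
        ,DEPTNO
from emp;

-- distinct inside an aggregate: each JOB value is counted once per department
select  DEPTNO
        ,count(distinct JOB) as job_cnt
from emp
group by DEPTNO;
```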

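Part 1: Load the following data into Hive tables.

Employee table emp:

Fields: employee ID, employee name, job, manager (EMPNO of the direct superior), hire date, salary, bonus, department number
Column names: EMPNO,ENAME,JOB,MGR,HIREDATE,SAL,BONUS,DEPTNO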

create table emp(
    EMPNO int
    ,ENAME string
    ,JOB string
    ,MGR int
    ,HIREDATE string
    ,SAL int
    ,BONUS int
    ,DEPTNO int
) 
row format delimited
fields terminated by ',';


// Data
7369,SMITH,CLERK,7902,1980-12-17,800,null,20
7499,ALLEN,SALESMAN,7698,1981-02-20,1600,300,30
7521,WARD,SALESMAN,7698,1981-02-22,1250,500,30
7566,JONES,MANAGER,7839,1981-04-02,2975,null,20
7654,MARTIN,SALESMAN,7698,1981-09-28,1250,1400,30
7698,BLAKE,MANAGER,7839,1981-05-01,2850,null,30
7782,CLARK,MANAGER,7839,1981-06-09,2450,null,10
7788,SCOTT,ANALYST,7566,1987-04-19,3000,null,20
7839,KING,PRESIDENT,null,1981-11-17,5000,null,10
7844,TURNER,SALESMAN,7698,1981-09-08,1500,0,30
7876,ADAMS,CLERK,7788,1987-05-23,1100,null,20
7900,JAMES,CLERK,7698,1981-12-03,950,null,30
7902,FORD,ANALYST,7566,1981-12-03,3000,null,20
7934,MILLER,CLERK,7782,1982-01-23,1300,null,10

Department table dept:

Fields: department number, department name, department location
Column names: DEPTNO,DEPTNAME,DEPTADDR

create table dept(
    DEPTNO int
    ,DEPTNAME string
    ,DEPTADDR string
) 
row format delimited
fields terminated by ',';


// Data
10,ACCOUNTING,NEW YORK
10,ACCOUNTING,shanghai
20,RESEARCH,DALLAS
30,SALES,CHICAGO
40,OPERATIONS,BOSTON
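
The post does not show the load step itself. One possible way to do it, assuming the two data sets above were saved as local text files (the paths /tmp/emp.txt and /tmp/dept.txt are hypothetical), is sketched below.

```sql
-- '/tmp/emp.txt' and '/tmp/dept.txt' are hypothetical local paths
load data local inpath '/tmp/emp.txt' into table emp;
load data local inpath '/tmp/dept.txt' into table dept;

-- Quick check that the rows were parsed as expected
select * from emp limit 5;
```

The literal text null in an int column such as BONUS is expected to be read as NULL, since it cannot be parsed as an integer; the nvl(BONUS,0) calls in the queries below rely on that.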

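Part 2: Solve the following requirements with HQL:

1. List all departments that have at least one employee.

distinct removes duplicates. Placed right after select, it deduplicates on all selected columns; placed inside an aggregate function, it deduplicates only the column passed to that function.

-- Note the count(distinct ...)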
select   t1.deptno
        ,t1.DEPTNAME
        ,t1.DEPTADDR
        ,t2.cnt
from dept t1
join(
    select  deptno
            ,count(distinct EMPNO) as cnt
    from emp
    group by deptno
) t2 on t1.deptno = t2.deptno;
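2. List all employees whose pay is higher than that of "SMITH".

Here, pay means salary plus bonus; the queries compute it as 12*SAL+nvl(BONUS,0).

In Hive, a where condition cannot compare against a subquery with an inequality operator (at least in older versions); a join achieves the same result.

select … from table list: multiple table names are separated by ','.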

-- MySQL-style version (ifnull is MySQL's counterpart of nvl)
select  *
from emp
where 12*SAL+ifnull(BONUS,0) > (
    select  12*SAL+ifnull(BONUS,0) as sal_bonus
    from emp
    where ENAME = 'SMITH'
);

-- In Hive, a where condition cannot compare against a subquery with an inequality; use a join instead
select  t1.EMPNO
        ,t1.ENAME
        ,t1.sal_bonus
from (
    select  EMPNO
            ,ENAME
            ,12*SAL+nvl(BONUS,0) as sal_bonus
            ,1 as tmp_id
    from emp
) t1 
join (
    select  12*SAL+nvl(BONUS,0) as sal_bonus
            ,1 as tmp_id
    from emp
    where ENAME = 'SMITH'
) t2 on t1.tmp_id = t2.tmp_id
where t1.sal_bonus > t2.sal_bonus;
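
The helper column tmp_id exists only to give the join an equality key. As an alternative sketch, assuming a Hive version that supports cross join and a configuration that does not reject Cartesian products (strict mode may block them), the same comparison can be written without it:

```sql
-- The SMITH subquery returns one row, so the cross join does not multiply the rows of emp
select  t1.EMPNO
        ,t1.ENAME
        ,12*t1.SAL+nvl(t1.BONUS,0) as sal_bonus
from emp t1
cross join (
    select  12*SAL+nvl(BONUS,0) as sal_bonus
    from emp
    where ENAME = 'SMITH'
) t2
where 12*t1.SAL+nvl(t1.BONUS,0) > t2.sal_bonus;
```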
3. List the names of all employees and the names of their direct superiors.
select  t1.ENAME
        ,t2.ENAME as leader_name
from emp t1
left join emp t2  -- left join keeps KING, who has no superior
on t1.MGR = t2.EMPNO;
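4. List all employees who were hired earlier than their direct superior.

When joining two tables, older Hive versions do not support non-equality conditions between the two tables' columns in the on clause, so the date comparison goes in where.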

select  t1.ENAME
        ,t1.HIREDATE
        ,t2.ENAME as leader_name
        ,t2.HIREDATE as leader_hiredate
from emp t1
left join emp t2 
on t1.MGR = t2.EMPNO
where t1.HIREDATE < t2.HIREDATE;
5. List department names together with the employees of those departments, including departments that have no employees.
select  distinct
        t1.DEPTNO
        ,t1.DEPTNAME
        ,t2.EMPNO
        ,t2.ENAME
from dept t1
left join emp t2
on t1.DEPTNO = t2.DEPTNO;
6. List the names of all "CLERK" employees and their department names.
select  t1.ENAME
        ,t2.DEPTNAME
from (
    select  ENAME
            ,DEPTNO
    from emp
    where JOB = 'CLERK'
) t1 join (
    select  distinct
            DEPTNO
            ,DEPTNAME
    from dept
) t2 on t1.DEPTNO = t2.DEPTNO;
7. List the jobs whose minimum salary is greater than 1500.
select  t1.JOB
        ,t1.min_sal
from (
    select  JOB
            ,min(SAL) as min_sal
    from emp
    group by JOB
) t1 where t1.min_sal > 1500;

select  JOB
        ,min(SAL) as min_sal
from emp
group by JOB
having min_sal > 1500;
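8. List the names of employees who work in the "SALES" department, assuming the department number of SALES is not known.

Because the department number is assumed unknown, look it up first.

-- The subquery in the where filter can return more than one row, so = cannot be used in Hive.

-- Use in instead.

-- Rejected version (uses =):
-- select  EMPNO
--         ,ENAME
-- from emp
-- where DEPTNO = (
--     select  DEPTNO
--     from dept
--     where DEPTNAME = 'SALES'
-- );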

select  EMPNO
        ,ENAME
from emp
where DEPTNO in (
    select  DEPTNO
    from dept
    where DEPTNAME = 'SALES'
);
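
Another common way to express this kind of in filter in Hive is left semi join. The sketch below is an alternative, not the author's solution; the aliases e and d are made up for the example. With left semi join, the right-hand table may only be referenced in the on clause, which is why the DEPTNAME filter sits there.

```sql
select  e.EMPNO
        ,e.ENAME
from emp e
left semi join dept d
on e.DEPTNO = d.DEPTNO and d.DEPTNAME = 'SALES';
```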
9. List all employees whose pay is higher than the company's average pay.
select  t1.EMPNO
        ,t1.ENAME
        ,t1.sal_bonus
from (
    select  EMPNO
            ,ENAME
            ,12*SAL+nvl(BONUS,0) as sal_bonus
            ,1 as tmp_id
    from emp
) t1 join(
    select  round(avg(12*SAL+nvl(BONUS,0)),2) as avg_sal_bonus
            ,1 as tmp_id
    from emp
) t2 on t1.tmp_id = t2.tmp_id
where t1.sal_bonus > t2.avg_sal_bonus;
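10. List all employees who do the same job as "SCOTT".

exists() in a where clause returns TRUE if the subquery finds rows and FALSE otherwise.

That is, the condition holds when the select statement inside ( ) returns at least one row.

-- Approach 1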
select  EMPNO
        ,t1.ENAME
        ,JOB
from emp t1
where t1.ENAME != 'SCOTT'
and JOB in (
    select  JOB
    from emp
    where ENAME = 'SCOTT'
);
-- Approach 2
select  EMPNO
        ,t1.ENAME
        ,JOB
from emp t1
where t1.ENAME != 'SCOTT'
and exists (
    select  JOB
    from emp t2
    where ENAME = 'SCOTT' and t1.JOB=t2.JOB
);
11. List the names and salaries of all employees whose salary equals the salary of some employee in department 30.
select  t1.ENAME
        ,t1.SAL
from emp t1
where t1.DEPTNO != 30
and t1.SAL in (
    select  SAL
    from emp
    where DEPTNO = 30
);
12. List the names and pay of employees whose pay is higher than that of every employee in department 30.
select  t1.ENAME
        ,t1.sal_bonus
        ,t2.max_sal_bonus
from (
    select  t1.ENAME
            ,12*t1.SAL+nvl(t1.BONUS,0) as sal_bonus
    from emp t1
    where t1.DEPTNO != 30
) t1 join(
    select  max(12*SAL+nvl(BONUS,0)) as max_sal_bonus     
    from emp
    where DEPTNO = 30
) t2 on 1=1
where t1.sal_bonus > t2.max_sal_bonus ;
13. List the number of employees, the average pay, and the average length of service (in days) for each department.
select  DEPTNO
        ,count(distinct EMPNO) as cnt
        ,round(avg(12*SAL+nvl(BONUS,0)),2) as avg_sal_bonus
        ,round(avg(datediff(current_date(),HIREDATE)),2) as avg_work_days
from emp
group by DEPTNO;
14. List the name, department name, and salary of every employee.
select  t1.ENAME
        ,t1.SAL
        ,t2.DEPTNAME
        ,t2.DEPTADDR
from emp t1
join dept t2 
on t1.DEPTNO = t2.DEPTNO;
15. List the details of every department together with its headcount.
select  t2.DEPTNO
        ,t2.DEPTNAME
        ,t2.DEPTADDR
        ,t1.cnt
from (
    select  DEPTNO
            ,count(distinct EMPNO) as cnt
    from emp
    group by DEPTNO
) t1 right join dept t2
on t1.DEPTNO = t2.DEPTNO;
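
In the query above, a department with no employees (40, OPERATIONS) gets a NULL headcount. If 0 is preferred, one possible variant, written here with dept on the left side of a left join, wraps the count in nvl:

```sql
select  t2.DEPTNO
        ,t2.DEPTNAME
        ,t2.DEPTADDR
        ,nvl(t1.cnt,0) as cnt  -- 0 instead of NULL for departments without employees
from dept t2
left join (
    select  DEPTNO
            ,count(distinct EMPNO) as cnt
    from emp
    group by DEPTNO
) t1 on t1.DEPTNO = t2.DEPTNO;
```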
16. List the minimum salary for each job.
select  JOB
        ,min(SAL) as min_sal
from emp
group by JOB;
17. List the minimum pay of the MANAGER in each department.
select  t1.DEPTNO
        ,min(12*SAL+nvl(BONUS,0)) as min_sal_bonus
from (
    select  DEPTNO
            ,SAL
            ,BONUS
    from emp
    where JOB = 'MANAGER'
) t1 group by t1.DEPTNO;
18. List the annual pay of all employees, sorted from lowest to highest.
select  EMPNO
        ,ENAME
        ,12*SAL + nvl(BONUS,0) as year_sal
from emp
order by year_sal;
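19. List the names and salaries of the two highest-paid employees in each department.

having is used together with group by, so the filter on rn goes in the where clause of an outer query.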

select  t1.DEPTNO
        ,t1.ENAME
        ,t1.SAL
        ,t1.rn
from (
    select  DEPTNO
            ,ENAME
            ,SAL
            ,row_number() over (partition by DEPTNO ORDER by SAL DESC) as rn
    from emp
) t1 where t1.rn <= 2;
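
row_number() always returns exactly two rows per department, even when salaries tie. If tied salaries should share a rank, dense_rank() is one option. The sketch below is a variant of the query above; the alias rk is made up for the example.

```sql
select  t1.DEPTNO
        ,t1.ENAME
        ,t1.SAL
        ,t1.rk
from (
    select  DEPTNO
            ,ENAME
            ,SAL
            ,dense_rank() over (partition by DEPTNO order by SAL desc) as rk
    from emp
) t1 where t1.rk <= 2;
```

With the sample data, department 20 then returns three rows: SCOTT and FORD tie at 3000 with rank 1, and JONES at 2975 has rank 2.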
20. List how many days each employee had been employed as of 2018-12-12.
select  EMPNO
        ,ENAME
        ,datediff('2018-12-12',HIREDATE) as days
from emp;
Posted 2022-02-21 21:12 by 赤兔胭脂小吕布