Hive Notes 6

Chapter 6: Queries

Query statement syntax:

SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[ORDER BY col_list]
[CLUSTER BY col_list
| [DISTRIBUTE BY col_list] [SORT BY col_list]
]
[LIMIT number]

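6.1 Basic Queries

6.1.1 Full-Table and Specific-Column Queries
1. Full-table query
select * from emp;
2. Query specific columns
select empno, ename from emp;
Notes:
SQL keywords and identifiers are case-insensitive.
A SQL statement can be written on one line or across several lines.
Keywords cannot be abbreviated or split across lines.
Each clause is generally written on its own line.
Use indentation to improve readability.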

6.1.2 Column Aliases

6.1.5 LIMIT Clause

A typical query returns many rows; the LIMIT clause restricts the number of rows returned.
select * from emp limit 5;

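6.2 WHERE Clause

1. The WHERE clause filters out rows that do not satisfy the condition.
2. The WHERE clause immediately follows the FROM clause.
3. Example: find all employees whose salary is greater than 1000.
select * from emp where sal > 1000;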

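6.2.1 Comparison Operators (BETWEEN / IN / IS NULL)

2) Hands-on examples
(1) Find all employees whose salary equals 5000
select * from emp where sal = 5000;
(2) Find all employees whose comm is NULL
select * from emp where comm is null;
(3) Find employees whose salary is between 500 and 1000 (both bounds inclusive)
select * from emp where sal between 500 and 1000;
(4) Find employees whose salary is 1500 or 5000
hive (default)> select * from emp where sal IN (1500, 5000);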
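
Two details of the operators above are easy to get wrong: NULL has to be tested with IS NULL rather than =, and BETWEEN includes both endpoints. The following is a minimal sketch, assuming the emp table used throughout these notes with its sal and comm columns.

```sql
-- Assumes the emp table from these notes (columns sal, comm).

-- Returns no rows: comparing a value to NULL with = yields NULL, never true.
select * from emp where comm = NULL;

-- Correct way to find rows whose comm is missing.
select * from emp where comm is null;

-- BETWEEN is inclusive on both ends, so these two queries are equivalent.
select * from emp where sal between 500 and 1000;
select * from emp where sal >= 500 and sal <= 1000;
```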

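6.2.2 LIKE and RLIKE

1) Use the LIKE operator to select values that match a pattern.
2) The pattern can contain literal characters or digits, plus two wildcards:
% matches zero or more characters (any number of characters).
_ matches exactly one character.
3) RLIKE is a Hive extension of this feature: it specifies the match condition with Java regular expressions, which are more powerful.
4) Hands-on examples
(1) Find employees whose salary starts with 2
hive (default)> select * from emp where sal LIKE '2%';
(2) Find employees whose salary has 2 as its second character
hive (default)> select * from emp where sal LIKE '_2%';
(3) Find employees whose salary contains a 2
hive (default)> select * from emp where sal RLIKE '[2]';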
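
To make the relationship between the two operators concrete, the sketch below writes the same three filters once with LIKE and once with RLIKE. It assumes the emp table from these notes and uses its string column ename; the letter A is only an illustrative value. LIKE has to match the whole value, while RLIKE succeeds if the regular expression matches any substring, which is why the RLIKE versions need anchors such as ^ but no wildcards.

```sql
-- Assumes the emp table from these notes (string column ename).

-- Name starts with A.
select * from emp where ename like 'A%';
select * from emp where ename rlike '^A';

-- Second character is A.
select * from emp where ename like '_A%';
select * from emp where ename rlike '^.A';

-- Contains an A anywhere.
select * from emp where ename like '%A%';
select * from emp where ename rlike 'A';
```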

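Hands-on examples (logical operators AND / OR / NOT)
(1) Find employees whose salary is greater than 1000 and whose department is 30
hive (default)> select * from emp where sal>1000 and deptno=30;
(2) Find employees whose salary is greater than 1000 or whose department is 30
hive (default)> select * from emp where sal>1000 or deptno=30;
(3) Find employees who are in neither department 20 nor department 30
hive (default)> select * from emp where deptno not IN(30, 20);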
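
When AND and OR are mixed in one condition, operator precedence decides the result. The sketch below assumes the emp table from these notes (columns sal, deptno, job); the value 'CLERK' is an assumed job title used only for illustration.

```sql
-- Assumes the emp table from these notes; 'CLERK' is an illustrative value.

-- AND binds more tightly than OR, so this condition is read as:
--   sal > 1000 or (deptno = 30 and job = 'CLERK')
select * from emp where sal > 1000 or deptno = 30 and job = 'CLERK';

-- Parentheses make a different grouping explicit.
select * from emp where (sal > 1000 or deptno = 30) and job = 'CLERK';

-- NOT IN is the negation of IN, so these two filters are equivalent.
select * from emp where deptno not in (30, 20);
select * from emp where not (deptno = 30 or deptno = 20);
```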

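6.3 Grouping

6.3.1 GROUP BY Clause
The GROUP BY clause is usually used together with aggregate functions: it groups the result by one or more columns and then applies the aggregate to each group.
Hands-on examples:
(1) Compute the average salary of each department in the emp table
hive (default)> select t.deptno, avg(t.sal) avg_sal from emp t group by t.deptno;
(2) Compute the highest salary for each job within each department in emp
hive (default)> select t.deptno, t.job, max(t.sal) max_sal from emp t group by
t.deptno, t.job;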
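
A rule that follows from grouping: every column in the select list must either appear in GROUP BY or be wrapped in an aggregate function. A short sketch, assuming the emp table from these notes:

```sql
-- Assumes the emp table from these notes (columns deptno, ename, sal).

-- Valid: deptno is a grouping column, the other expressions are aggregates.
select deptno, count(*) cnt, avg(sal) avg_sal
from emp
group by deptno;

-- Invalid: ename is neither grouped nor aggregated, so Hive rejects the query.
-- select deptno, ename, avg(sal) from emp group by deptno;
```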

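6.3.2 HAVING Clause
1. Differences between HAVING and WHERE
(1) WHERE operates on the columns of the table and filters rows before grouping; HAVING operates on the columns of the query result and filters groups after aggregation.
(2) Aggregate (group) functions cannot appear after WHERE, but they can appear after HAVING.
(3) HAVING is used only in GROUP BY aggregation statements.
2. Hands-on examples
(1) Find the departments whose average salary is greater than 2000
First, compute the average salary of each department:
hive (default)> select deptno, avg(sal) from emp group by deptno;
Then keep only the departments whose average salary is greater than 2000:
hive (default)> select deptno, avg(sal) avg_sal from emp group by deptno having
avg_sal > 2000;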
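
The difference between the two filters is easiest to see when both appear in one statement. The sketch below assumes the emp table from these notes; the threshold 1000 in the WHERE clause is an illustrative value. The second query shows that a HAVING filter can also be expressed as a WHERE on a subquery.

```sql
-- Assumes the emp table from these notes (columns deptno, sal).

-- WHERE removes individual rows before grouping;
-- HAVING removes whole groups after aggregation.
select deptno, avg(sal) avg_sal
from emp
where sal > 1000
group by deptno
having avg(sal) > 2000;

-- Same result as "group by deptno having avg_sal > 2000" (no WHERE filter):
-- the aggregate is computed in a subquery and filtered with WHERE outside it.
select deptno, avg_sal
from (select deptno, avg(sal) avg_sal from emp group by deptno) t
where avg_sal > 2000;
```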

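6.4 JOIN

6.4.1 Equi-Join
Hive supports the usual SQL JOIN statements. Older Hive releases accept only equality conditions in the ON clause; support for non-equality join conditions was added in Hive 2.2.0.
Hands-on example
(1) Join the employee table and the department table on equal department numbers, and query the employee number, employee name, department number, and department name:
hive (default)> select e.empno, e.ename, d.deptno, d.dname from emp e join dept d
on e.deptno = d.deptno;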

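6.4.3 Inner Join
Inner join: a row is kept only when both joined tables contain data that matches the join condition.
hive (default)> select e.empno, e.ename, d.deptno from emp e join dept d
on e.deptno = d.deptno;
6.4.4 Left Outer Join
Left outer join: all records from the table on the left of the JOIN operator that satisfy the WHERE clause are returned; where the right table has no match, its columns are NULL.
hive (default)> select e.empno, e.ename, d.deptno from emp e left join dept d
on e.deptno = d.deptno;
6.4.5 Right Outer Join
Right outer join: all records from the table on the right of the JOIN operator that satisfy the WHERE clause are returned; where the left table has no match, its columns are NULL.
hive (default)> select e.empno, e.ename, d.deptno from emp e right join dept d
on e.deptno = d.deptno;
6.4.6 Full Outer Join
Full outer join: all records from both tables that satisfy the WHERE clause are returned. Where either table has no matching value for the join column, NULL is used in its place.
hive (default)> select e.empno, e.ename, d.deptno from emp e full join dept d
on e.deptno = d.deptno;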
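
The four join types can be compared side by side on a tiny data set. This is a sketch under stated assumptions: emp_demo and dept_demo are hypothetical temporary tables invented for illustration (they are not the emp and dept tables of these notes), and it relies on temporary tables and INSERT ... VALUES, which need Hive 0.14 or later. Row order in the results is not guaranteed.

```sql
-- Hypothetical demo tables; names and rows are invented for illustration.
create temporary table emp_demo (empno int, ename string, deptno int);
create temporary table dept_demo (deptno int, dname string);

insert into table emp_demo values (1, 'ANN', 10), (2, 'BOB', 50);
insert into table dept_demo values (10, 'ACCOUNTING'), (40, 'OPERATIONS');

-- Inner join: only deptno 10 exists on both sides, so only ANN is returned.
select e.ename, d.dname from emp_demo e join dept_demo d
on e.deptno = d.deptno;

-- Left outer join: ANN and BOB are returned; BOB's dname is NULL.
select e.ename, d.dname from emp_demo e left join dept_demo d
on e.deptno = d.deptno;

-- Right outer join: ACCOUNTING and OPERATIONS are returned;
-- the OPERATIONS row has a NULL ename.
select e.ename, d.dname from emp_demo e right join dept_demo d
on e.deptno = d.deptno;

-- Full outer join: three rows (ANN/ACCOUNTING, BOB/NULL, NULL/OPERATIONS).
select e.ename, d.dname from emp_demo e full join dept_demo d
on e.deptno = d.deptno;
```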

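6.4.7 Multi-Table Joins
Note: joining n tables requires at least n-1 join conditions. For example, joining three tables requires at least two join conditions.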
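
The n-1 rule can be sketched with three tables and two ON clauses. All three tables below are hypothetical: the location table, the loc column, and every name ending in _demo3 are assumptions made for this illustration and do not come from these notes. The tables are left empty, so the query returns no rows; the point is only the shape of the statement.

```sql
-- Hypothetical demo tables; the loc column and location_demo3 are assumptions.
create temporary table emp_demo3 (ename string, deptno int);
create temporary table dept_demo3 (deptno int, dname string, loc int);
create temporary table location_demo3 (loc int, loc_name string);

-- Three tables, therefore two join conditions (one per ON clause).
select e.ename, d.dname, l.loc_name
from emp_demo3 e
join dept_demo3 d on e.deptno = d.deptno
join location_demo3 l on d.loc = l.loc;
```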