hive基本查询命令
CREATE TABLE employees(
name STRING,
salary FLOAT,
subordinatates ARRAY<STRING>
deductions MAP<STRING,FLOAT>
address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>)
PARTITIONED BY (country STRING,state STRING);#按照country和state分区
SELECT查询:
SELECT 字段名
FROM 后接 (1.操作表,2.子查询)
表别名:
SELECT e.name e.salary FROM employees e;
字段值计算:
算术操作符
函数:数学计算函数、聚集函数、表生成函数
字段别名 SELECT ***** as ****
LIMIT:
复杂的情况CASE:
SELECT name,salary
CASE
WHEN salary<50000.0 THEN 'low'
WHEN salary>=50000.0 AND salary<70000.0 THEN 'middle'
WHEN salary>=70000.0 AND salary<100000.0 THEN 'high'
ELSE 'very high'
END AS 'bracket' FROM employees
WHERE条件过滤
SELECT * FROM employees WHERE country='US' AND state=‘CA’
GROUP BY分组统计
SELECT year(ymd),avg(price_close) FROM stocks
WHERE exchange='NASDAQ' AND symbol='AAPL'
GROUP BY year(ymd);
HAVING:用于约束结果集,只给出符合HAVING条件的结果
JOIN 关联:
Hive只支持等值JOIN,默认是inner JOIN
SELECT a.* FROM a JOIN b ON (a.id=b.id AND a.department = b.department) hive支持√
SELECT a.* FROM a JOIN b ON (a.id<>b.id) hive不支持×
Hive Join的LEFT SEMI JOIN
SELECT a.* FROM a WHERE a.key IN (SELECT b.key FROM b WHERE b.value>100 )HIve中没有
<=>
SELECT a.* FROM a LEFT SEMI JOIN b ON (a.key=b.key AND b.value>100 )
ORDER BY/SORT BY 排序
抽样(数据量大时,只取一部分来看)
1.BUCKET SAMPLING
TABLESAMPLE(BUCKET x OUT OF y [ON colname])
2.BLOCK SAMPLING
SELECT * FROM source TABLESAMPLE(0.1 PERCENT)s;
SELECT * FROM source TABLESAMPLE(100M)s;
SELECT * FROM source TABLESAMPLE(10 ROWS) ;