hive基本查询命令

CREATE TABLE employees(

name STRING,

salary FLOAT,

subordinatates ARRAY<STRING>

deductions MAP<STRING,FLOAT>

address STRUCT<street:STRING,city:STRING,state:STRING,zip:INT>)

PARTITIONED BY (country STRING,state STRING);#按照country和state分区

SELECT查询:

 SELECT 字段名

    FROM  后接 (1.操作表,2.子查询)

表别名:

SELECT e.name e.salary FROM employees e;

字段值计算:

算术操作符

函数:数学计算函数、聚集函数、表生成函数

字段别名 SELECT ***** as ****

LIMIT:

复杂的情况CASE

SELECT name,salary

CASE

WHEN salary<50000.0 THEN 'low'

WHEN salary>=50000.0 AND salary<70000.0 THEN 'middle'

WHEN salary>=70000.0 AND salary<100000.0 THEN 'high'

ELSE 'very high'

END AS 'bracket' FROM employees

 WHERE条件过滤

SELECT * FROM employees WHERE country='US' AND state=‘CA’

 GROUP BY分组统计

SELECT year(ymd),avg(price_close) FROM stocks 

WHERE exchange='NASDAQ' AND symbol='AAPL'

GROUP BY year(ymd);

 HAVING:用于约束结果集,只给出符合HAVING条件的结果

JOIN 关联:

Hive只支持等值JOIN,默认是inner JOIN

SELECT a.* FROM a JOIN b ON (a.id=b.id AND a.department = b.department)    hive支持√

SELECT a.* FROM a JOIN b ON (a.id<>b.id)            hive不支持×

Hive Join的LEFT SEMI JOIN 

SELECT a.* FROM a WHERE a.key IN (SELECT b.key FROM b WHERE b.value>100 )HIve中没有

<=>

SELECT a.* FROM a LEFT SEMI JOIN b ON  (a.key=b.key AND b.value>100 )

 ORDER BY/SORT BY 排序

抽样(数据量大时,只取一部分来看)

1.BUCKET SAMPLING

TABLESAMPLE(BUCKET x OUT OF y [ON colname])

 

2.BLOCK SAMPLING

SELECT * FROM source TABLESAMPLE(0.1 PERCENT)s;

SELECT * FROM source TABLESAMPLE(100M)s;

SELECT * FROM source TABLESAMPLE(10 ROWS) ;

 

posted @ 2020-08-04 10:00  猫七的blog  阅读(14)  评论(0编辑  收藏  举报