HIVE "Approaching Big Data: Advanced Hive" --- imooc (慕课网) --- summary notes -- to be updated over time

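Advanced Hive study notes
1. Loading data with load
   1. Syntax: with local, filepath is a path on the local filesystem; without it, filepath is read from HDFS by default. partition stores the data into the given partition of a partitioned table.
       load data [local] inpath 'filepath' [overwrite] into table tablename [partition (partcol1=val1, partcol2=val2 ...)]
   Examples:
       1) Load data1.txt into table data2. If the field delimiter in data1.txt does not match the delimiter declared for data2, the columns read back as NULL.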
           load data local inpath '/usr/data/data1.txt' into table data2;
       2) Load every file under the /usr/data/ directory into data2, overwriting the data already in the table
           load data local inpath '/usr/data/' overwrite into table data2;
       3) Load data from HDFS into data2 (the file is moved into the table's directory, not copied)
           load data inpath '/data/data1.txt' into table data2;
       4) Load data1.txt into a partitioned table, targeting the partition (gender='M')
           load data local inpath '/usr/data/data1.txt' into table partition_data2 partition (gender='M');
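The NULL result mentioned in example 1) can be illustrated with a small Python model. This is only a simplified sketch of how a delimited text row is read back against a table schema, not Hive's actual SerDe code; the function name parse_row and the three-column schema are assumptions made for the illustration.

```python
def parse_row(line, delimiter, column_types):
    """Split one text row and cast each field; anything unreadable becomes None (NULL)."""
    fields = line.split(delimiter)
    row = []
    for i, cast in enumerate(column_types):
        if i >= len(fields):
            # Fewer fields than columns: the missing columns are NULL.
            row.append(None)
            continue
        try:
            row.append(cast(fields[i]))
        except ValueError:
            # The field cannot be converted to the column type: NULL.
            row.append(None)
    return row


schema = [int, str, str]   # hypothetical columns: id, name, gender
line = "1,Tom,M"           # the file is comma-delimited

print(parse_row(line, ",", schema))    # [1, 'Tom', 'M']
print(parse_row(line, "\t", schema))   # [None, None, None]
```

With the wrong delimiter the whole line lands in the first column, fails the int conversion, and leaves the remaining columns empty, which is why every column shows up as NULL.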
2. Using sqoop
   1. Set the environment variables sqoop needs
       export HADOOP_COMMON_HOME=/usr/hadoop
       export HADOOP_MAPRED_HOME=/usr/hadoop
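   2. Importing and exporting data with sqoop (the JDBC URL normally ends with /<database name>, which is omitted here)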
       ./sqoop import --connect jdbc:mysql://master:3306 --username root --password root --table emp --columns 'id, name, gender' -m 1 --target-dir '/hive'


3. Querying Hive tables
   1. Employee id, name, gender, monthly salary
       select id, name, gender, sal from emp;
   2. Employee id, name, gender, monthly salary, annual salary
       select id, name, gender, sal, sal*12 from emp;
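   3. Employee id, name, gender, monthly salary, annual salary, bonus, annual income -- comm may be NULL, and NULL in arithmetic yields NULL, so wrap it as nvl(comm,0)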
       select id, name, gender, sal, sal*12, comm, sal*12+nvl(comm,0) from emp;
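Why nvl(comm,0) is needed can be sketched in Python, with None standing in for SQL NULL. The helpers nvl and add below are illustrative stand-ins written for this note, not Hive code.

```python
def nvl(value, default):
    """Return default when value is None (SQL NULL), otherwise value."""
    return default if value is None else value


def add(a, b):
    """SQL-style addition: the result is NULL if either operand is NULL."""
    if a is None or b is None:
        return None
    return a + b


sal, comm = 1000, None   # an employee without a bonus

print(add(sal * 12, comm))           # None
print(add(sal * 12, nvl(comm, 0)))   # 12000
```

Without nvl, the annual income of every employee whose bonus is NULL would itself be NULL.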
   4. Find the employees whose bonus is NULL
       select * from emp where comm is null;
   5. Remove duplicate records -- distinct
       select distinct id from emp;
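   6. Some Hive queries run as MapReduce jobs; to speed up simple queries (plain select, filter, limit), let Hive fetch the rows directly without MapReduce
       set hive.fetch.task.conversion=more;
   7. Query the employees of department 10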
       select * from emp where id = 10;
   8. Combine several conditions with and / or
       //and: both conditions must hold
       select * from emp where id = 10 and sal < 2000;
       //or: it is enough for either condition to hold
       select * from emp where id = 10 or sal < 2000;
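   9. Fuzzy (pattern) matching with like
       1) Find names that contain s; % matches any sequence of characters (including none), _ matches exactly one character
           select * from emp where name like "%s%";
       2) If the data itself contains characters such as _ or %, which are wildcards in a like pattern, escape them with \\
           select * from emp where name like "%\\_%";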
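The wildcard and escape rules above can be modeled by translating a like pattern into a regular expression. The sketch below is a Python approximation written for these notes (like_to_regex and like are hypothetical helper names), not Hive's implementation. Note that the Python literal "%\\_%" holds the three-part pattern %\_% that the matcher sees.

```python
import re


def like_to_regex(pattern, escape="\\"):
    """Translate a SQL like pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # Escaped character: match it literally, even if it is % or _.
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any sequence of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    return re.compile("".join(parts), re.DOTALL)


def like(value, pattern):
    """True when the whole value matches the like pattern."""
    return like_to_regex(pattern).fullmatch(value) is not None


print(like("jones", "%s%"))    # True: contains s
print(like("smith", "%s"))     # False: "%s" only matches names ending in s
print(like("a_b", "%\\_%"))    # True: contains a literal underscore
print(like("ab", "%\\_%"))     # False: no underscore in the value
```

The second call shows why a pattern needs % on both sides to express "contains".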
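   10. Sorting -- ascending (asc) by default; add desc for descending order
       select * from emp order by id; --append desc for descending order
       //order by can be followed by a column, an expression, an alias, or a column position (positions require set hive.groupby.orderby.position.alias=true)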
       select sal, sal*12 year from emp order by sal, sal*12, year, 1
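       //when the sorted column contains NULL:
       //by default NULLs come first in ascending order and last in descending order; convert NULL to 0 or 0.0 for display
       select nvl(sal,0), * from emp order by sal;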
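The default placement of NULLs can be mimicked in Python by treating None as smaller than every other value. This is a sketch of the ordering rule only; order_by is a hypothetical helper and the salary list is made-up sample data.

```python
def order_by(values, descending=False):
    """Sort with None (NULL) treated as smaller than any non-NULL value."""
    def key(v):
        # (False, 0) for NULL sorts before (True, v) for any real value.
        return (v is not None, v if v is not None else 0)
    return sorted(values, key=key, reverse=descending)


sals = [3000, None, 1500, 800]   # made-up salaries, one of them NULL

print(order_by(sals))                    # [None, 800, 1500, 3000]
print(order_by(sals, descending=True))   # [3000, 1500, 800, None]
```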
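4. Hive functions
   1. Math functions (they can be applied to columns): round, ceil, floor
       1) round(number, decimal places) rounds half up, e.g. round(11.932,2) gives 11.93
           select round(11.932) --rounds to the nearest whole number by default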
           select round(11.932,2)
       2) ceil rounds up --12
           select ceil(11.633)
       3) floor rounds down --11
           select floor(11.623)
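The three rounding functions can be reproduced in Python for a quick check of the values quoted above. The helper half_up_round is an illustrative name; it uses the decimal module because Python's built-in round rounds halves to the nearest even number, which differs from the half-up rule.

```python
import math
from decimal import Decimal, ROUND_HALF_UP


def half_up_round(x, places=0):
    """Round x half up to the given number of decimal places."""
    quantum = Decimal(10) ** -places   # 1 for places=0, 0.01 for places=2
    return Decimal(str(x)).quantize(quantum, rounding=ROUND_HALF_UP)


print(half_up_round(11.932))      # 12
print(half_up_round(11.932, 2))   # 11.93
print(math.ceil(11.633))          # 12
print(math.floor(11.623))         # 11

# Half-up versus Python's built-in round on an exact half:
print(half_up_round(2.5))         # 3
print(round(2.5))                 # 2
```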
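   2. String functions (double quotes and single quotes are interchangeable for string literals)
       1) lower converts a string to lowercase, upper converts it to uppercase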
           select lower("helloworld"), upper("helloworld");
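       2) length returns the number of characters, not bytes: a Chinese character takes several bytes (three in UTF-8) but counts as one character, so length("你好") is 2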
           select length("helloworld"), length("你好");
       3) concat concatenates strings
           select concat("hello", "world");
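       4) substr extracts a substring (positions start at 1)
           select substr("helloworld",2); --from the 2nd character to the end
           select substr("helloworld",2,3); --3 characters starting at the 2nd character
       5) trim removes leading and trailing spaces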
           select trim(" hello world ");
       6) lpad pads on the left, rpad pads on the right, up to the given length
           select lpad("hello",10,"*"), rpad("hello",10,"*");
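The string functions above can be approximated in Python to see what each call returns. The helpers substr, lpad and rpad are simplified stand-ins written for these notes: they assume a positive start position and a non-empty pad string, and do not cover every edge case of the Hive functions.

```python
def substr(s, start, length=None):
    """1-based substring; start is assumed to be positive in this sketch."""
    begin = start - 1
    return s[begin:] if length is None else s[begin:begin + length]


def lpad(s, n, pad):
    """Pad s on the left with pad up to n characters (truncate if s is longer)."""
    if len(s) >= n:
        return s[:n]
    return (pad * n)[: n - len(s)] + s


def rpad(s, n, pad):
    """Pad s on the right with pad up to n characters (truncate if s is longer)."""
    if len(s) >= n:
        return s[:n]
    return s + (pad * n)[: n - len(s)]


print(substr("helloworld", 2))       # elloworld
print(substr("helloworld", 2, 3))    # ell
print(" hello world ".strip(" "))    # hello world
print(lpad("hello", 10, "*"))        # *****hello
print(rpad("hello", 10, "*"))        # hello*****

# Characters versus bytes for Chinese text:
print(len("你好"))                    # 2
print(len("你好".encode("utf-8")))    # 6
```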