Hive Installation
1. Download: http://mirrors.hust.edu.cn/apache/
Environment variables: append the following to /etc/profile (vi /etc/profile):
# set hive environment
export HIVE_HOME=/opt/apache-hive-3.1.1
export PATH=$PATH:$HIVE_HOME/bin
2. Configuration (conf)
After unpacking, go into Hive's conf directory and create a configuration file that sets, among other things, the database used for Hive's metadata store:
vi hive-site.xml
<configuration>
<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://192.168.1.101:3306/hivedb?createDatabaseIfNotExist=true&amp;useSSL=false</value> <!-- inside XML, the & in the URL must be escaped as &amp; -->
<description>JDBC connect string for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>
<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>
<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>123@abc.com</value>
<description>password to use against metastore database</description>
</property>
<property>
<name>hive.metastore.warehouse.dir</name>
<value>/hive/warehouse</value>
<description>hive default warehouse, change it if necessary</description>
</property>
</configuration>
Since MySQL is used as the metadata store, put the MySQL JDBC connector jar into Hive's lib directory: mysql-connector-java-5.1.47.jar
3. Initialize the metastore schema
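Before running schematool, the MySQL account referenced in hive-site.xml must already exist and have privileges on the metastore database. A minimal sketch, assuming the hive / 123@abc.com credentials and the hivedb database name from the configuration above (run as a MySQL admin user; tighten the '%' host wildcard for production):

```sql
-- Create the metastore user and grant it rights on the metastore database.
-- The hivedb database itself can be left to createDatabaseIfNotExist=true.
CREATE USER 'hive'@'%' IDENTIFIED BY '123@abc.com';
GRANT ALL PRIVILEGES ON hivedb.* TO 'hive'@'%';
FLUSH PRIVILEGES;
```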
schematool -dbType mysql -initSchema
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/opt/apache-hive-3.1.1/lib/log4j-slf4j-impl-2.10.0.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/opt/hadoop/hadoop-3.2.0/share/hadoop/common/lib/slf4j-log4j12-1.7.25.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Metastore connection URL:        jdbc:mysql://192.168.1.101:3306/hivedb?createDatabaseIfNotExist=true
Metastore Connection Driver :    com.mysql.jdbc.Driver
Metastore connection User:       hive
Starting metastore schema initialization to 3.1.0
Initialization script hive-schema-3.1.0.mysql.sql
Initialization script completed
schemaTool completed
4. Basic Hive usage
Start the CLI with hive --service cli, or simply hive.
4.1. Create a database
hive> create database busdata;
OK
Time taken: 0.735 seconds
hive> show databases;
OK
busdata
default
Time taken: 0.161 seconds, Fetched: 2 row(s)
hive> select current_database();
OK
default
Time taken: 2.362 seconds, Fetched: 1 row(s)
hive> use busdata;
OK
4.2. Create a table
hive> create table userinfo(userid int, createTime timestamp, email string, expiredate date, name string, password string, salt string, state tinyint, tel string, username string) row format delimited fields terminated by ",";
OK
Time taken: 1.109 seconds
hive> show tables;
OK
userinfo
Time taken: 0.04 seconds, Fetched: 1 row(s)
Hive provides the TIMESTAMP column type, whose text representation is yyyy-MM-dd HH:mm:ss. When LOAD is used to load a text file into the table, only values matching this format are parsed correctly; anything else is read as NULL.
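The expected layout can be illustrated with GNU date (an assumption for illustration only; Hive itself does not invoke date — this merely shows what a conforming value looks like):

```shell
# Print a known-good value in Hive's expected TIMESTAMP text layout
# (yyyy-MM-dd HH:mm:ss); GNU date's -d flag is assumed to be available.
date -d "2018-10-04 22:22:41" "+%Y-%m-%d %H:%M:%S"
# prints 2018-10-04 22:22:41
```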
4.3. Load local data (Hive copies the local file, like cp)
hive> load data local inpath "/home/hadoop/userinfo.txt" into table userinfo;
Loading data to table busdata.userinfo
OK
Time taken: 0.562 seconds
Loading data already on HDFS — note that Hive moves (mv) the source file into the warehouse rather than copying it:
hive> load data inpath "/mysql/syslog2/part-m-00000" into table syslog;
Loading data to table busdata.syslog
OK
Time taken: 0.431 seconds
Loading multiple files at once:
hive> load data local inpath "/home/hadoop/sampler/*.csv" into table weather_data;
Loading data to table busdata.weather_data
OK
Time taken: 0.523 seconds
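The batch load above only works if weather_data was created with a row format matching the comma-separated CSV files; a hypothetical sketch (the column names and types are placeholders for illustration, not from the original):

```sql
-- Hypothetical schema for the weather CSVs; columns are illustrative only.
CREATE TABLE weather_data (
  station     STRING,
  obs_date    DATE,
  temperature DOUBLE
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';
```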
4.4. Query the data
hive> select userid,username,name,createtime from userinfo;
OK
1	admin	管理员	2018-10-04 22:22:41
2	xu.dm	管理员	2018-10-04 22:22:00
3	myuser	java编程	2018-10-05 17:16:19
4	user123	abc测	2018-10-12 18:22:42
5	user1	mytest	2018-10-18 00:36:39
6	user2	abc测	2018-10-18 22:37:24
... ...
Time taken: 0.337 seconds, Fetched: 22 row(s)
4.5. Inspect the table structure
hive> desc userinfo;
OK
userid      	int
createtime  	timestamp
email       	string
expiredate  	date
name        	string
password    	string
salt        	string
state       	tinyint
tel         	string
username    	string
Time taken: 0.112 seconds, Fetched: 10 row(s)
hive> desc extended userinfo;
OK
userid      	int
createtime  	timestamp
email       	string
expiredate  	date
name        	string
password    	string
salt        	string
state       	tinyint
tel         	string
username    	string
Detailed Table Information	Table(tableName:userinfo, dbName:busdata, owner:hadoop, createTime:1551432622, lastAccessTime:0, retention:0,
sd:StorageDescriptor(cols:[FieldSchema(name:userid, type:int, comment:null),
FieldSchema(name:createtime, type:timestamp, comment:null), FieldSchema(name:email, type:string, comment:null), FieldSchema(name:expiredate, type:date, comment:null),
FieldSchema(name:name, type:string, comment:null), FieldSchema(name:password, type:string, comment:null), FieldSchema(name:salt, type:string, comment:null),
FieldSchema(name:state, type:tinyint, comment:null), FieldSchema(name:tel, type:string, comment:null), FieldSchema(name:username, type:string, comment:null)],
location:hdfs://bigdata-senior01.home.com:9000/hive/warehouse/busdata.db/userinfo,
inputFormat:org.apache.hadoop.mapred.TextInputFormat,
outputFormat:org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat, compressed:false, numBuckets:-1,
serdeInfo:SerDeInfo(name:null, serializationLib:org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe,
parameters:{serialization.format=,, field.delim=,}), bucketCols:[], sortCols:[], parameters:{}, skewedInfo:SkewedInfo(skewedColNames:[], skewedColValues:[],
skewedColValueLocationMaps:{}), storedAsSubDirectories:false), partitionKeys:[], parameters:{totalSize=2642, numRows=0, rawDataSize=0, numFiles=1,
transient_lastDdlTime=1551432786, bucketing_version=2}, viewOriginalText:null, viewExpandedText:null, tableType:MANAGED_TABLE, rewriteEnabled:false, catName:hive, ownerType:USER)
Time taken: 0.138 seconds, Fetched: 12 row(s)
hive> desc formatted userinfo;
OK
# col_name	data_type	comment
userid      	int
createtime  	timestamp
email       	string
expiredate  	date
name        	string
password    	string
salt        	string
state       	tinyint
tel         	string
username    	string

# Detailed Table Information
Database:           	busdata
OwnerType:          	USER
Owner:              	hadoop
CreateTime:         	Fri Mar 01 17:30:22 CST 2019
LastAccessTime:     	UNKNOWN
Retention:          	0
Location:           	hdfs://bigdata-senior01.home.com:9000/hive/warehouse/busdata.db/userinfo
Table Type:         	MANAGED_TABLE
Table Parameters:
	bucketing_version   	2
	numFiles            	1
	numRows             	0
	rawDataSize         	0
	totalSize           	2642
	transient_lastDdlTime	1551432786

# Storage Information
SerDe Library:      	org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat:        	org.apache.hadoop.mapred.TextInputFormat
OutputFormat:       	org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Compressed:         	No
Num Buckets:        	-1
Bucket Columns:     	[]
Sort Columns:       	[]
Storage Desc Params:
	field.delim         	,
	serialization.format	,
Time taken: 0.125 seconds, Fetched: 40 row(s)