013_Hive

01: Hive Basic Concepts

Hive converts HQL into MapReduce programs:

Hive's data is stored on HDFS

The underlying implementation of Hive's data analysis is MapReduce

The execution programs run on YARN
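
For example, an aggregate query like the one below is compiled into a MapReduce job and submitted to YARN (a sketch using the student table created later in these notes):

select count(*) from student;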

Hive architecture

Hive's architecture consists of the client (CLI/JDBC), the Metastore (metadata, backed by MySQL in this setup), and the Driver (parser, compiler, optimizer, executor), with HDFS for storage and MapReduce for computation.

Hive is commonly used for data analysis, where real-time response is not required

Comparing Hive with databases

Query language

Data storage location

Data updates

Indexes

Execution

Execution latency

02 Hive Installation Preparation

Versions and new features

Preparation

Start Hadoop

# Start the DataNode and NameNode on hadoop102
start-dfs.sh

# Start the ResourceManager on hadoop103
start-yarn.sh
Ports of the various Hadoop-ecosystem web UIs and services
1. HDFS UI: 50070
2. YARN management UI: 8088
3. HistoryServer management UI: 19888
4. ZooKeeper service port: 2181
5. MySQL service port: 3306
6. HiveServer2 port: 10000
7. Kafka service port: 9092
8. Azkaban UI: 8443
9. HBase UI: 16010, 60010
10. Spark UI: 8080
11. Spark URL: 7077

Installation

tar -zxvf apache-hive-1.2.1-bin.tar.gz -C /home/soft/
mv apache-hive-1.2.1-bin/  hive/

Configuration file  # cat hive-env.sh

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

# Set Hive and Hadoop environment variables here. These variables can be used
# to control the execution of Hive. It should be used by admins to configure
# the Hive installation (so that users do not have to set environment variables
# or set command line parameters to get correct behavior).
#
# The hive service being invoked (CLI/HWI etc.) is available via the environment
# variable SERVICE


# Hive Client memory usage can be an issue if a large number of clients
# are running at the same time. The flags below have been useful in 
# reducing memory usage:
#
# if [ "$SERVICE" = "cli" ]; then
#   if [ -z "$DEBUG" ]; then
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:+UseParNewGC -XX:-UseGCOverheadLimit"
#   else
#     export HADOOP_OPTS="$HADOOP_OPTS -XX:NewRatio=12 -Xms10m -XX:MaxHeapFreeRatio=40 -XX:MinHeapFreeRatio=15 -XX:-UseGCOverheadLimit"
#   fi
# fi

# The heap size of the jvm stared by hive shell script can be controlled via:
#
# export HADOOP_HEAPSIZE=1024
#
# Larger heap size may be required when running queries over large number of files or partitions. 
# By default hive shell scripts use a heap size of 256 (MB).  Larger heap size would also be 
# appropriate for hive server (hwi etc).


# Set HADOOP_HOME to point to a specific hadoop install directory
# Hadoop installation directory
 HADOOP_HOME=/home/soft/hadoop-2.7.2

# Hive Configuration Directory can be controlled by:
# Hive configuration directory
export HIVE_CONF_DIR=/home/soft/hive/conf

# Folder containing extra ibraries required for hive compilation/execution can be controlled by:
# export HIVE_AUX_JARS_PATH=

Use hadoop commands to create the Hive warehouse directories on HDFS and grant group write permission

bin/hadoop fs -mkdir /tmp

bin/hadoop fs -chmod g+w /tmp

bin/hadoop fs -mkdir -p /user/hive/warehouse

bin/hadoop fs -chmod g+w /user/hive/warehouse

Start Hive

bin/hive   # launches the hive CLI

List databases

show databases;

Switch to the default database

use default;

Show the tables in default

show tables;

Create a table

create table student(id int, name string);

create table student(id int, name string) row format delimited fields terminated by '\t';

Describe the table structure

desc student;

Insert data

insert into student values(001,'caoxiaoyang');

On insert, the data is actually written to Hadoop's HDFS, and the statement runs as a MapReduce job

Query the data in the table

select * from student;

Exit Hive

quit;

03 Importing a Local File into Hive

Create the table

create table student(id int, name string) ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t';
load data local inpath '/home/soft/datas/student.txt' into table student;

04 Basic Database Usage

Install the MySQL server

# First check whether MySQL is already installed
rpm -qa|grep mysql
# Uninstall the existing package
rpm -e --nodeps mysql-libs-5.1.73-7.el6.x86_64
# Unzip mysql-libs.zip
unzip mysql-libs.zip
ls
mysql-libs.zip
mysql-libs
# Enter the mysql-libs directory and list its files
cd /home/soft/tar/mysql-libs
[root@hadoop102 mysql-libs]# ll
total 76052
-rw-r--r--. 1 root root 18509960 Mar 26  2015 MySQL-client-5.6.24-1.el6.x86_64.rpm
drwxr-xr-x. 4 root root     4096 Oct 23  2013 mysql-connector-java-5.1.27
-rw-r--r--. 1 root root  3575135 Dec  1  2013 mysql-connector-java-5.1.27.tar.gz
-rw-r--r--. 1 root root 55782196 Mar 26  2015 MySQL-server-5.6.24-1.el6.x86_64.rpm
# Install the MySQL server
rpm -ivh MySQL-server-5.6.24-1.el6.x86_64.rpm
# View the random password generated during installation
cat /root/.mysql_secret
# Check MySQL status after installation
service mysql status
[root@hadoop102 mysql-libs]# service mysql status
MySQL is not running                                       [FAILED]
# Start the MySQL server
service mysql start
[root@hadoop102 mysql-libs]# service mysql status
MySQL running (1822)

Install the MySQL client

# Install the MySQL client
rpm -ivh MySQL-client-5.6.24-1.el6.x86_64.rpm
# Connect to MySQL
mysql -uroot -p (random password from cat /root/.mysql_secret)
# Change the password
SET PASSWORD=PASSWORD('123456');
# Exit MySQL
exit

Configure host access in the MySQL user table

# Enter MySQL
mysql -uroot -p123456
# Show the databases
show databases;
# Use the mysql database
use mysql;
# Show all tables in the mysql database
show tables;
# Show the structure of the user table
desc user;
# Query the user table
select User, Host, Password from user;
# Update the user table: set host to %
update user set host='%' where host='localhost';
# Delete the other host entries for root
mysql>delete from user where Host='hadoop102';
mysql>delete from user where Host='127.0.0.1';
mysql>delete from user where Host='::1';
# Flush privileges
mysql>flush privileges;
# Exit
quit;

Configure the Hive metastore to use MySQL

# Extract the JDBC driver
tar -zxvf mysql-connector-java-5.1.27.tar.gz
# Copy the driver into Hive's lib directory
cp mysql-connector-java-5.1.27-bin.jar  /home/soft/hive/lib/
# Create hive-site.xml under /home/soft/hive/conf/
touch hive-site.xml

# Edit hive-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hadoop102:3306/metastore?createDatabaseIfNotExist=true</value>
        <description>JDBC connect string for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
        <description>Driver class name for a JDBC metastore</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>root</value>
        <description>username to use against metastore database</description>
    </property>

    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>123456</value>
        <description>password to use against metastore database</description>
    </property>
</configuration>
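
After this configuration, running Hive once initializes the metastore in MySQL (the JDBC URL above uses createDatabaseIfNotExist=true); a quick way to verify (a sketch):

# run a hive command once so the metastore gets created, then check MySQL
bin/hive -e 'show databases;'
mysql -uroot -p123456 -e 'show databases;'   # a metastore database should now be listed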

mysql -uroot -p123456

Host configuration in the MySQL user table

use mysql;

select host,user,password from user;

+------+------+-------------------------------------------+
| host | user | password |
+------+------+-------------------------------------------+
| % | root | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |
+------+------+-------------------------------------------+
1 row in set (0.00 sec)

Configure Hive metadata in MySQL (important): once configured, the metadata lives in the metastore database in MySQL.

HiveJDBC access

# Start the hiveserver2 service
bin/hiveserver2

# In another terminal, start the beeline client
bin/beeline

!connect jdbc:hive2://hadoop102:10000
# Enter root as the user name; press Enter for an empty password

service mysql status   # check whether MySQL is running on this machine

bin/hive -e 'select * from student;'   # run a query directly without entering the hive CLI

bin/hive -f ./hsql.sql   # run a query from an external sql file

View the HDFS file system from within hive

dfs -lsr /user;

05 Some Basic Commands

View the local file system from within hive.

!ls /opt;

View the history of commands run in hive

[root@node01 ~]# cat .hivehistory 

Configuration to display the current database and column headers in query results

<!-- Display the current database and the query result headers -->
<property>
	<name>hive.cli.print.header</name>
	<value>true</value>
</property>

<property>
	<name>hive.cli.print.current.db</name>
	<value>true</value>
</property>

View all hive configuration settings

set;  -- list all hive configuration settings

<property>
	<name>hive.metastore.warehouse.dir</name>
	<value>/user/hive/warehouse</value>
	<description>location of default database for the warehouse</description>
</property>

Change hive's default logging configuration

mv hive-log4j.properties.template hive-log4j.properties

vi hive-log4j.properties

hive.log.threshold=ALL 
hive.root.logger=INFO,DRFA
hive.log.dir=/home/soft/hive/logs
hive.log.file=hive.log

Configuration precedence

Config files < command-line parameters (-hiveconf) < in-session declarations (set); later ones override earlier ones
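
A quick sketch of the last two levels (mapred.reduce.tasks is just an example parameter):

# command-line parameter, valid only for this hive session
bin/hive -hiveconf mapred.reduce.tasks=10
# inside the hive CLI: view, then set, a parameter
set mapred.reduce.tasks;
set mapred.reduce.tasks=100;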

06 Data Types

Primitive data types

Collection data types
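
A sketch of a table that uses the three collection types, array, map, and struct (the table and field names are made up for illustration):

create table test(
    name string,
    friends array<string>,
    children map<string, int>,
    address struct<street:string, city:string>
)
row format delimited fields terminated by ','
collection items terminated by '_'
map keys terminated by ':'
lines terminated by '\n';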

07 DDL Data Definition

Create a database

# Check the HDFS safe-mode state
hdfs dfsadmin -safemode get

bin/hiveserver2


bin/beeline

!connect jdbc:hive2://hadoop102:10000
# Enter root as the user name; press Enter for an empty password

Create a database

create database db_hive;

Create the database only if it does not already exist

create database if not exists db_hive;

Create a database at a specified location

create database if not exists db_hive2
location '/db_hive2.db';  -- / is the HDFS root directory; browse it at http://192.168.10.120:50070

Alter a database

Only the database properties (dbproperties) can be changed, not the database name or its location.
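
For example, a property can be set like this (the key and value are made-up examples):

alter database db_hive set dbproperties('createtime'='20170830');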

desc database db_hive;

The change can also be checked in the MySQL metastore; use extended to see the details

desc database extended db_hive;

Query databases

show databases;

show databases like 'db*';

Show database information.

desc database db_hive;

use db_hive;

show tables;

Drop a database

Drop an empty database

drop database db_hive;

drop database if exists db_hive;

show databases;

Drop a database that still contains data (cascade)

drop database if exists db_hive cascade;

Create tables

Internal (managed) tables

Create-table statement

create table if not exists student2(
    id int,
    name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/user/hive/warehouse/db_hive.db/student2';

# Delete the directory/file on HDFS
hadoop fs -rm /user/hive/warehouse/student2
# Delete the directory on HDFS recursively (-r option)
hadoop fs -rm -r /user
/* location specifies where the data lives on HDFS */

# Migrate data: effectively creates a backup table
create table if not exists student
as select id, name from student2;

# Create the table only, without copying data.
create table if not exists student4 like student;

# Show table details
desc formatted student2;

# Drop the table and its data
drop table student;

# Upload the local data file to HDFS
hadoop fs -put ./student.txt /student

External tables

External tables: several teams can operate on the same table at once, mainly for querying; dropping an external table removes only the metadata, not the data.

Create an external table.

hadoop fs -mkdir /student
hadoop fs -put  /home/soft/student.txt /student
create table student6(
    id int,
    name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/student';
drop table student6;


-- external
hadoop fs -mkdir /student
hadoop fs -put  /home/soft/student.txt /student
create external table student6(
    id int,
    name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/student';

Converting between managed and external tables
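
A sketch of the conversion via table properties ('TRUE'/'FALSE' must be uppercase; student2 is the managed table created above):

-- convert a managed table to an external table
alter table student2 set tblproperties('EXTERNAL'='TRUE');
-- convert it back to a managed table
alter table student2 set tblproperties('EXTERNAL'='FALSE');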

Partitioned tables

Create a partitioned table.

-- partitioned by (month string)
-- create a single-level partitioned table
create table dept_partition(
    deptno int, 
    dname string, 
    loc string
 ) partitioned by (month string)
 row format delimited fields terminated by '\t';
 # Load data into the partitioned table
load data local inpath '/home/soft/datas/dept.txt'
into table default.dept_partition partition(month='201709');

# Query the data in a specific partition
select * from dept_partition where month = '201709';

# Union query across partitions
select * from dept_partition where month = '201709'
union
select * from dept_partition where month = '201708';

# Add partitions
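# a sketch; the month values below are made-up examples
# when adding several partitions, separate them with spaces
alter table dept_partition add partition(month='201706') partition(month='201705');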


# Drop partitions
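# a sketch; the month values below are made-up examples
# when dropping several partitions, separate them with commas
alter table dept_partition drop partition(month='201705'), partition(month='201706');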


# Create a two-level partitioned table
create table dept_partition2(
    deptno int, 
    dname string, 
    loc string
 ) partitioned by (month string,day string)
 row format delimited fields terminated by '\t';
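 # Loading into a two-level partition (a sketch; the day value is a made-up example)
 load data local inpath '/home/soft/datas/dept.txt'
 into table dept_partition2 partition(month='201709', day='13');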
 
 # Notes on partitioned tables: if data is uploaded straight into an HDFS partition
 # directory, HDFS has the data but the hive table shows none until the partition
 # metadata is repaired (msck repair table <table>) or the partition is added manually
 
 # hive partitioning only changes how the data is organized (as directories) on HDFS
 

08 DML Data Manipulation

Data import

Method 1: importing data into Hive

Load a local file into hive

-- load a local file into hive
load data local inpath '/home/soft/student.txt' into table student;
load data local inpath '/home/soft/student.txt' into table default.student;
load data local inpath '<local file path>' into table <database>.<table>;  -- local means upload from the Linux file system

-- load an hdfs file into hive
hadoop fs -put /home/soft/student.txt  /  -- upload the data to hdfs

create table student(id string, name string) row format delimited fields terminated by '\t';

load data inpath '/student.txt' into table student;

------------------------------------------------------------------------------------------------------
hadoop fs -mkdir /student  -- create the /student directory on hdfs; mind the directory permissions
hadoop fs -put  /home/soft/student.txt /student -- upload the local /home/soft/student.txt file to the hdfs /student directory

create table student6(
    id int,
    name string
)
row format delimited fields terminated by '\t'
stored as textfile
location '/student';


--   overwrite: loading replaces the data already in the table
load data local inpath '/home/soft/student.txt' overwrite into table student;

Method 2: importing data into Hive

Importing data with insert

-- create a partitioned table
-- partitioned
create table student7(
     id int, name string
) partitioned by (month string)
row format delimited fields terminated by '\t';

insert into table student7 partition(month ='201709') values(100,'张三');
# Executing this statement starts a map (MapReduce) job
/**
0: jdbc:hive2://hadoop102:10000> insert into table student7 partition(month ='201709') values(100,'张三');
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1649984531147_0001
INFO  : The url to track the job: http://hadoop103:8088/proxy/application_1649984531147_0001/
INFO  : Starting Job = job_1649984531147_0001, Tracking URL = http://hadoop103:8088/proxy/application_1649984531147_0001/
INFO  : Kill Command = /home/soft/hadoop-2.7.2/bin/hadoop job  -kill job_1649984531147_0001
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2022-04-15 02:12:56,326 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-04-15 02:13:01,466 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 2.11 sec
INFO  : MapReduce Total cumulative CPU time: 2 seconds 110 msec
INFO  : Ended Job = job_1649984531147_0001
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201709/.hive-staging_hive_2022-04-15_02-12-49_377_6928154102701663989-1/-ext-10000 from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201709/.hive-staging_hive_2022-04-15_02-12-49_377_6928154102701663989-1/-ext-10002
INFO  : Loading data to table db_hive.student7 partition (month=201709) from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201709/.hive-staging_hive_2022-04-15_02-12-49_377_6928154102701663989-1/-ext-10000
INFO  : Partition db_hive.student7{month=201709} stats: [numFiles=1, numRows=1, totalSize=7, rawDataSize=6]
No rows affected (13.349 seconds)
*/

# Import data (back up one partition into another)
insert overwrite table student7 partition(month='201708') select id,name from student7 where month='201709';
/**
Execution log
0: jdbc:hive2://hadoop102:10000> insert overwrite table student7 partition(month='201708') select id,name from student7 where month='201709';

INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1649984531147_0003
INFO  : The url to track the job: http://hadoop103:8088/proxy/application_1649984531147_0003/
INFO  : Starting Job = job_1649984531147_0003, Tracking URL = http://hadoop103:8088/proxy/application_1649984531147_0003/
INFO  : Kill Command = /home/soft/hadoop-2.7.2/bin/hadoop job  -kill job_1649984531147_0003
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2022-04-15 02:16:54,995 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-04-15 02:16:59,092 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.0 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 0 msec
INFO  : Ended Job = job_1649984531147_0003
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201708/.hive-staging_hive_2022-04-15_02-16-48_516_7282749969946638844-1/-ext-10000 from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201708/.hive-staging_hive_2022-04-15_02-16-48_516_7282749969946638844-1/-ext-10002
INFO  : Loading data to table db_hive.student7 partition (month=201708) from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student7/month=201708/.hive-staging_hive_2022-04-15_02-16-48_516_7282749969946638844-1/-ext-10000
INFO  : Partition db_hive.student7{month=201708} stats: [numFiles=1, numRows=1, totalSize=6, rawDataSize=5]
No rows affected (11.8 seconds)
0: jdbc:hive2://hadoop102:10000> 
*/
# Multi-insert: from comes first, and one table scan feeds several inserts
# (the two target partitions should differ; 201707/201706 below are made-up examples)
from student7
insert overwrite table student7 partition(month='201707')
select id,name where month = '201709'
insert overwrite table student7 partition(month='201706')
select id,name where month = '201709';

Method 3: importing data into Hive

create table if not exists student8
as select id,name from student;
/**
0: jdbc:hive2://hadoop102:10000> create table if not exists student8
0: jdbc:hive2://hadoop102:10000> as select id,name from student;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1649984531147_0005
INFO  : The url to track the job: http://hadoop103:8088/proxy/application_1649984531147_0005/
INFO  : Starting Job = job_1649984531147_0005, Tracking URL = http://hadoop103:8088/proxy/application_1649984531147_0005/
INFO  : Kill Command = /home/soft/hadoop-2.7.2/bin/hadoop job  -kill job_1649984531147_0005
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2022-04-15 02:35:27,125 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-04-15 02:35:31,205 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.89 sec
INFO  : MapReduce Total cumulative CPU time: 890 msec
INFO  : Ended Job = job_1649984531147_0005
INFO  : Stage-4 is selected by condition resolver.
INFO  : Stage-3 is filtered out by condition resolver.
INFO  : Stage-5 is filtered out by condition resolver.
INFO  : Moving data to: hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/.hive-staging_hive_2022-04-15_02-35-23_664_6886647803983637797-6/-ext-10001 from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/.hive-staging_hive_2022-04-15_02-35-23_664_6886647803983637797-6/-ext-10003
INFO  : Moving data to: hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/student8 from hdfs://hadoop102:9000/user/hive/warehouse/db_hive.db/.hive-staging_hive_2022-04-15_02-35-23_664_6886647803983637797-6/-ext-10001
INFO  : Table db_hive.student8 stats: [numFiles=1, numRows=0, totalSize=0, rawDataSize=0]
No rows affected (8.649 seconds)
*/

Method 4: importing data into Hive

Loading data via location

drop table student;

dfs -mkdir /student;

dfs -put /home/soft/student.txt   /student;

create table student(
    deptno int, 
    dname string
)row format delimited fields terminated by '\t'
location '/student';
-- location '/student': on hdfs, the /student directory contains the student.txt file
-- browse it at http://192.168.175.102:50070/explorer.html#/

/*
0: jdbc:hive2://hadoop102:10000> create external table student(
0: jdbc:hive2://hadoop102:10000>     id int, 
0: jdbc:hive2://hadoop102:10000>     name string
0: jdbc:hive2://hadoop102:10000> )row format delimited fields terminated by '\t'
0: jdbc:hive2://hadoop102:10000> location '/student';
No rows affected (0.031 seconds)
*/

Method 5: importing data into Hive

-- import data previously exported with export (see the export section below)
import table db_hive.student10 from '/export/student';
/*
0: jdbc:hive2://hadoop102:10000> import table db_hive.student10 from '/export/student';
INFO  : Copying data from hdfs://hadoop102:9000/export/student/data to hdfs://hadoop102:9000/export/student/.hive-staging_hive_2022-04-15_03-47-35_673_2921046789811998505-8/-ext-10000
INFO  : Copying file: hdfs://hadoop102:9000/export/student/data/student.txt
INFO  : Loading data to table db_hive.student10 from hdfs://hadoop102:9000/export/student/.hive-staging_hive_2022-04-15_03-47-35_673_2921046789811998505-8/-ext-10000
No rows affected (0.14 seconds)
*/
/* alternative form */
import table db_hive.student10 partition(month='201709') from '/export/student';

Data export

Method 1

Exporting data with insert

insert overwrite local directory '/home/soft/datas/export/student' 
select * from student;
/**
0: jdbc:hive2://hadoop102:10000> insert overwrite local directory '/home/soft/datas/export/student' 
0: jdbc:hive2://hadoop102:10000> select * from student;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1649984531147_0006
INFO  : The url to track the job: http://hadoop103:8088/proxy/application_1649984531147_0006/
INFO  : Starting Job = job_1649984531147_0006, Tracking URL = http://hadoop103:8088/proxy/application_1649984531147_0006/
INFO  : Kill Command = /home/soft/hadoop-2.7.2/bin/hadoop job  -kill job_1649984531147_0006
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2022-04-15 02:55:59,020 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-04-15 02:56:04,123 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 1.21 sec
INFO  : MapReduce Total cumulative CPU time: 1 seconds 210 msec
INFO  : Ended Job = job_1649984531147_0006
INFO  : Copying data to local directory /home/soft/datas/export/student from hdfs://hadoop102:9000/tmp/hive/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_02-55-55_365_1596929057137095706-8/-mr-10000
INFO  : Copying data to local directory /home/soft/datas/export/student from hdfs://hadoop102:9000/tmp/hive/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_02-55-55_365_1596929057137095706-8/-mr-10000
No rows affected (9.809 seconds)
0: jdbc:hive2://hadoop102:10000> 
*/

insert overwrite local directory '/home/soft/datas/export/student1' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  -- format the output
select * from student;
/**
0: jdbc:hive2://hadoop102:10000> insert overwrite local directory '/home/soft/datas/export/student1' 
0: jdbc:hive2://hadoop102:10000> ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
0: jdbc:hive2://hadoop102:10000> select * from student;
INFO  : Number of reduce tasks is set to 0 since there's no reduce operator
INFO  : number of splits:1
INFO  : Submitting tokens for job: job_1649984531147_0007
INFO  : The url to track the job: http://hadoop103:8088/proxy/application_1649984531147_0007/
INFO  : Starting Job = job_1649984531147_0007, Tracking URL = http://hadoop103:8088/proxy/application_1649984531147_0007/
INFO  : Kill Command = /home/soft/hadoop-2.7.2/bin/hadoop job  -kill job_1649984531147_0007
INFO  : Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
INFO  : 2022-04-15 02:59:19,400 Stage-1 map = 0%,  reduce = 0%
INFO  : 2022-04-15 02:59:22,474 Stage-1 map = 100%,  reduce = 0%, Cumulative CPU 0.91 sec
INFO  : MapReduce Total cumulative CPU time: 910 msec
INFO  : Ended Job = job_1649984531147_0007
INFO  : Copying data to local directory /home/soft/datas/export/student1 from hdfs://hadoop102:9000/tmp/hive/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_02-59-14_993_5708696570995345258-8/-mr-10000
INFO  : Copying data to local directory /home/soft/datas/export/student1 from hdfs://hadoop102:9000/tmp/hive/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_02-59-14_993_5708696570995345258-8/-mr-10000
No rows affected (9.552 seconds)
0: jdbc:hive2://hadoop102:10000> 
*/

insert overwrite directory '/studentexport' 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'  -- format the output
select * from student;

Method 2

dfs -get /student/student.txt /home/soft/datas/export/;   -- download with dfs -get (the paths here are illustrative)

Method 3

-- bin/hive -e '<sql to run>' > '<output path>'
bin/hive -e 'select * from db_hive.student;' > /home/soft/datas/export/stu.txt
/*
[root@hadoop102 hive]# bin/hive -e 'select * from db_hive.student;' > /home/soft/datas/export/stu.txt

Logging initialized using configuration in file:/home/soft/hive/conf/hive-log4j.properties
OK
Time taken: 0.884 seconds, Fetched: 16 row(s)
[root@hadoop102 hive]# 
*/

Method 4: exporting data with export

export table db_hive.student to '/export/student';

An exported table consists of two parts: the data (data directory) and the metadata (the _metadata file)
/*
0: jdbc:hive2://hadoop102:10000> export table db_hive.student to '/export/student';
INFO  : Copying data from file:/tmp/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_03-44-16_463_7703447412327772980-8/-local-10000/_metadata to hdfs://hadoop102:9000/export/student
INFO  : Copying file: file:/tmp/root/2114f150-2589-499b-bdee-b97d0c22ccd2/hive_2022-04-15_03-44-16_463_7703447412327772980-8/-local-10000/_metadata
INFO  : Copying data from hdfs://hadoop102:9000/student to hdfs://hadoop102:9000/export/student/data
INFO  : Copying file: hdfs://hadoop102:9000/student/student.txt
No rows affected (0.15 seconds)
0: jdbc:hive2://hadoop102:10000> 
*/

Truncate the data in a table

truncate table student;  -- note: truncate works only on managed tables, not on external tables

Hive is a data warehouse tool built on Hadoop. It maps structured data files to database tables, provides simple SQL query capability, and translates SQL statements into MapReduce jobs for execution. HBase is the Hadoop database: a distributed, scalable store for big data. The difference between the two can be hard to see from the names alone, so the following sections compare them in detail.

Characteristics of the two

Differences between Hive and HBase.
Hive lets people who know SQL run MapReduce jobs. Because it is JDBC-compatible, it can also integrate with existing SQL tools. Running a Hive query can take a long time because, by default, it scans all of the data in the table. Despite this drawback, the amount of data scanned can be limited through Hive's partitioning mechanism. Partitioning allows filtered queries over datasets stored in separate folders, so a query only scans the data in the specified folders (partitions). This mechanism can be used, for example, to process only files within a certain time range, provided the file names encode the time.

  HBase works by storing key/value pairs. It supports four main operations: add or update rows, scan a range of cells, get a specified row, and delete a specified row, column, or column version. Version information is used to retrieve historical data (each row's history can be deleted, and HBase compactions then reclaim the space). Although HBase has tables, a schema is required only at the table and column-family level; individual columns need no schema. HBase tables also provide increment/counter functionality.

Limitations

Hive does not currently support update operations. Also, because Hive runs batch jobs on Hadoop, it usually takes a long time, from several minutes to several hours, to get query results. Hive requires a predefined schema that maps files and directories to columns, and Hive is not ACID-compliant.

HBase queries are written in a specific language that must be learned separately. SQL-like functionality is available through Apache Phoenix, but at the cost of having to provide a schema. Moreover, HBase is not fully ACID-compliant either, although it does support certain properties. Last but not least, running HBase requires ZooKeeper, a distributed coordination service that provides configuration management, metadata maintenance, and naming services.

Use cases

Hive is suited to analytical queries over data collected across a period of time, for example computing trends or analyzing website logs. Hive should not be used for real-time queries, since it can take a long time to return results.

HBase is well suited to real-time queries over big data. Facebook uses HBase for messaging and real-time analytics; it can also be used to count Facebook connections.

Summary

Hive and HBase are two different Hadoop-based technologies: Hive is a SQL-like engine that runs MapReduce jobs, and HBase is a NoSQL key/value database on top of Hadoop. Of course, the two can be used together: Hive for statistical analysis and HBase for real-time queries.

Hive Queries

Basic queries

create table if not exists dept(
deptno int,
dname string,
loc int
)
row format delimited fields terminated by '\t';

create table if not exists emp(
empno int,
ename string,
job string,
mgr int,
hiredate string, 
sal double, 
comm double,
deptno int)
row format delimited fields terminated by '\t';

# Load the department data
load data local inpath '/home/soft/dept.txt' into table dept;

# Load the employee data
load data local inpath '/home/soft/emp.txt' into table emp;

/*
0: jdbc:hive2://hadoop102:10000> load data local inpath '/home/soft/dept.txt' into table dept;
INFO  : Loading data to table db_hive.dept from file:/home/soft/dept.txt
INFO  : Table db_hive.dept stats: [numFiles=1, totalSize=71]
No rows affected (0.151 seconds)
0: jdbc:hive2://hadoop102:10000> load data local inpath '/home/soft/emp.txt' into table emp;
INFO  : Loading data to table db_hive.emp from file:/home/soft/emp.txt
INFO  : Table db_hive.emp stats: [numFiles=1, totalSize=656]
No rows affected (0.123 seconds)
0: jdbc:hive2://hadoop102:10000> 
*/

-- full-table query
select * from emp;

-- query a specific column (emp has no id column; query empno instead)
select empno from emp;

-- table alias
select e.ename from emp e;

-- column alias
select e.ename nn from emp e;

-- arithmetic operators
select sal+1 from emp;
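
A few more basic query forms that fit the emp/dept tables above (a sketch; the filter values are made up):

-- limit the number of rows returned
select * from emp limit 5;
-- filter with where
select * from emp where sal > 1000;
-- group by with an aggregate: average salary per department
select deptno, avg(sal) avg_sal from emp group by deptno;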


