Installing a distributed Hive cluster (hadoop2.4.0+hive0.12) and testing .csv data import - zzhaoh

Software versions

Hadoop version: 2.4.0

Hive version: 0.12.0

MySQL version: 5.1.73

1) Create a hive user in MySQL and grant it sufficient privileges

[root@master mysql]# mysql -u root -p

mysql> create user 'hive' identified by 'hive';

mysql> grant all privileges on *.* to 'hive' with grant option;

mysql> flush privileges;

2) Create the hive database in MySQL

[root@master mysql]# mysql -u hive -p

mysql> create database hive;

mysql> use hive;

mysql> show tables;

3) Unpack the Hive tarball

tar -xzvf hive-0.12.0-bin.tar.gz

[hadoop@master~]$ mv hive-0.12.0 hive

[hadoop@master~]$ cd hive

[hadoop@master~hive]$ ls

4) Download the MySQL JDBC driver, mysql-connector-java-5.1.24-bin.jar, and place it in the lib directory under the Hive home

[hadoop@master~]$ mv mysql-connector-java-5.1.24-bin.jar ./hive/lib

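5) Edit the environment variables to add Hive to PATH (run source /etc/profile afterwards, or log in again, for the change to take effect)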

vim /etc/profile

export HIVE_HOME=/home/hadoop/hive

export PATH=$PATH:$HIVE_HOME/bin

6) Edit hive-env.sh and set the Hadoop path

[hadoop@master conf]$ cp hive-env.sh.template hive-env.sh

[hadoop@master conf]$ vim hive-env.sh

HADOOP_HOME=/home/hadoop/hadoop-2.4.0

7) Copy hive-default.xml.template to hive-site.xml and edit the MySQL settings in it

[hadoop@master conf]$ cp hive-default.xml.template hive-site.xml

[hadoop@master conf]$ vim hive-site.xml

(The file is long; in vim's command mode, use /pattern to search for each property.)

<property>
<name>javax.jdo.option.ConnectionURL</name>
<value>jdbc:mysql://localhost:3306/hive?createDatabaseIfNotExist=true</value>
<description>JDBC connect string for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionDriverName</name>
<value>com.mysql.jdbc.Driver</value>
<description>Driver class name for a JDBC metastore</description>
</property>

<property>
<name>javax.jdo.option.ConnectionUserName</name>
<value>hive</value>
<description>username to use against metastore database</description>
</property>

<property>
<name>javax.jdo.option.ConnectionPassword</name>
<value>hive</value>
<description>password to use against metastore database</description>
</property>
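Because a missing or empty value in hive-site.xml is easy to overlook in such a long file, a small check can help before starting Hive. The following is only a sketch: get_hive_property is a hypothetical helper (not part of Hive), and it assumes each <name> and <value> element sits on its own line, as in the template.

```shell
# Hypothetical helper: print the <value> that follows a given <name> in hive-site.xml.
# Assumes <name> and <value> are on separate lines, as in hive-default.xml.template.
# $1 = path to hive-site.xml, $2 = property name
get_hive_property() {
    awk -v name="$2" '
        # Remember that we are inside the property we are looking for.
        index($0, "<name>" name "</name>") { found = 1; next }
        # First <value> line after the matching <name>: strip the tags and print.
        found && /<value>/ {
            sub(/.*<value>/, "")
            sub(/<\/value>.*/, "")
            print
            exit
        }
        # Leaving the property without seeing a <value>: stop looking in it.
        /<\/property>/ { found = 0 }
    ' "$1"
}
```

For example, get_hive_property conf/hive-site.xml javax.jdo.option.ConnectionUserName should print the MySQL user created in step 1; empty output suggests the value line is missing.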

8) Start Hadoop and open the hive shell to test

Start the Hadoop cluster first, then run the following command from Hive's bin directory to enter the Hive console

[hadoop@master bin]$ ./hive

Create table 1: a test table

create table test(id int,name string);

List the tables

show tables;

Add a column

alter table test add columns(remark string comment 'some remark');

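Load data (this table declares no row format, so Hive expects the default \001 field delimiter; a comma-separated file will not be split into columns)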

load data local inpath '/home/hadoop/file/test.csv' overwrite into table test;

Query the table; much of the syntax is the same as SQL

select * from test;

Inspect the files in HDFS (the path depends on the configured warehouse directory)

hadoop fs -ls /home/hadoop/hive/warehouse/test

Create table 2: specify the data format

create table user_info (id int, name string, age string)

row format delimited

fields terminated by '\t'

lines terminated by '\n';

The data file for this table must use a tab between fields and a newline between rows.

That is, the file content looks like this (user_info.txt):

1001　　jack　　30
1002　　tom　　25
1003　　kate　　20

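(P.S. When I copied this text straight into a file on Linux and saved it, the tabs had turned into spaces, so the data did not show up when I queried the table after importing it into Hive. If you copy the sample, check the delimiters first.)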
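One way to avoid the copy-and-paste problem is to generate the file with printf, which writes real tab characters, and then count the rows that do not split into three tab-separated fields. This is a minimal sketch; write_user_info and count_bad_rows are hypothetical helper names chosen for illustration.

```shell
# Hypothetical helper: write the three sample rows with real tab characters.
# $1 = output file
write_user_info() {
    # printf turns \t into a literal tab and reuses the format for every group of three arguments.
    printf '%s\t%s\t%s\n' \
        1001 jack 30 \
        1002 tom 25 \
        1003 kate 20 > "$1"
}

# Hypothetical helper: print how many lines do NOT have exactly three tab-separated fields.
# $1 = file to check
count_bad_rows() {
    awk -F '\t' 'NF != 3 { bad++ } END { print bad + 0 }' "$1"
}
```

Running write_user_info /home/hadoop/file/user_info.txt followed by count_bad_rows /home/hadoop/file/user_info.txt should report 0 bad rows; any other number means some line lost its tabs.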

Import the file

load data local inpath '/home/hadoop/file/user_info.txt' overwrite into table user_info;

Create table 3: CSV format

create table product (id string, name string, remark string)

ROW FORMAT SERDE 'org.apache.hadoop.hive.contrib.serde2.RegexSerDe'

WITH SERDEPROPERTIES ('input.regex' = '\"(.*)\",\"(.*)\",\"(.*)\"','output.format.string' = '%1$s\\001%2$s\\001%3$s')

STORED AS TEXTFILE;

File format (prd.csv; every field must be wrapped in double quotes and the fields separated by commas):

"1001","iWatch","new"
"1002","iPhone6 Plus","128G"
"1003","Macbook Air","test"
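Since rows that fail the regular expression are not parsed into columns, it can be worth checking the file before loading it. The sketch below uses grep -E as an approximation of the Java regex in input.regex, on the assumption that the SerDe requires the whole line to match; count_unmatched_rows is a hypothetical helper name.

```shell
# Hypothetical helper: print how many lines would NOT match the input.regex used for the product table.
# grep -E only approximates Java's regex engine, but for this simple pattern the two should agree.
# $1 = CSV file to check
count_unmatched_rows() {
    # -v selects non-matching lines, -c counts them; "|| true" hides grep's
    # non-zero exit status when the count is 0.
    grep -c -v -E '^"(.*)","(.*)","(.*)"$' "$1" || true
}
```

For the three sample rows above, count_unmatched_rows /home/hadoop/file/prd.csv should print 0; a line such as 1003,Macbook Air,test without the quotes would be counted as unmatched.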

Import the file

load data local inpath '/home/hadoop/file/prd.csv' overwrite into table product ;

(End)

posted on 2015-04-06 12:04 zzhaoh
