Sqoop: Import and Export Between MySQL, Hive, and HDFS
Preface: environment setup. This walkthrough uses the CDH builds of Hadoop + Hive + Sqoop together with MySQL.
Download:
hadoop-2.5.0-cdh5.3.6.tar.gz
hive-0.13.1-cdh5.3.6.tar.gz
sqoop-1.4.5-cdh5.3.6.tar.gz

Configure Hadoop
*-env.sh (3 files): set the JDK path (JAVA_HOME)
core-site.xml: fs.defaultFS, hadoop.tmp.dir
hdfs-site.xml: dfs.replication
mapred-site.xml:
  mapreduce.framework.name -- yarn
  mapreduce.jobhistory.address          # 10020
  mapreduce.jobhistory.webapp.address   # 19888
yarn-site.xml:
  yarn.resourcemanager.hostname
  yarn.nodemanager.aux-services -- mapreduce_shuffle
  yarn.log-aggregation-enable -- true
  yarn.log-aggregation.retain-seconds -- 108600
slaves: worker host addresses
PS: format the NameNode, start HDFS and YARN, then:
$ bin/hdfs dfs -mkdir /tmp
$ bin/hdfs dfs -mkdir -p /user/hive/warehouse
$ bin/hdfs dfs -chmod g+w /tmp
$ bin/hdfs dfs -chmod g+w /user/hive/warehouse

Configure Hive
hive-env.sh:
  HADOOP_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
  export HIVE_CONF_DIR=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/conf
hive-log4j.properties:
  hive.log.threshold=ALL
  hive.root.logger=INFO,DRFA
  hive.log.dir=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6/logs
  hive.log.file=hive.log
hive-site.xml:   # deploy MySQL beforehand
  javax.jdo.option.ConnectionURL -- jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/chd_metastore?createDatabaseIfNotExist=true
  javax.jdo.option.ConnectionDriverName -- com.mysql.jdbc.Driver
  javax.jdo.option.ConnectionUserName
  javax.jdo.option.ConnectionPassword
  hive.cli.print.header -- true
  hive.cli.print.current.db -- true
  hive.fetch.task.conversion -- more
PS: in the Hive directory run $ mkdir logs, copy the MySQL JDBC driver jar into lib, then start with $ bin/hive

Configure Sqoop
sqoop-env.sh:
  export HADOOP_COMMON_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
  export HADOOP_MAPRED_HOME=/opt/cdh-5.6.3/hadoop-2.5.0-cdh5.3.6
  export HIVE_HOME=/opt/cdh-5.6.3/hive-0.13.1-cdh5.3.6
Copy the MySQL JDBC driver jar into Sqoop's lib directory as well.
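A quick sanity check after the setup, sketched with the install paths above and the MySQL host/credentials (10.0.0.108, root/root) used in the examples below; adjust to your own environment:

$ sbin/start-dfs.sh && sbin/start-yarn.sh       # from the Hadoop directory
$ bin/hdfs dfs -ls /user/hive/warehouse         # warehouse directory exists and is group-writable
$ bin/sqoop version                             # from the Sqoop directory
$ bin/sqoop list-databases \
  --connect jdbc:mysql://10.0.0.108:3306/ \
  --username root --password root              # confirms the MySQL JDBC driver in lib works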
1. Prepare the data
# Create a database and table in MySQL and insert a few rows
mysql> create database if not exists student default character set utf8 collate utf8_general_ci;
mysql> use student;
mysql> create table if not exists stu_info(
         id int(10) primary key not null auto_increment,
         name varchar(20) not null
       ) default character set utf8 collate utf8_general_ci;
mysql> insert into stu_info(name) values("李建");
mysql> insert into stu_info(name) values("张明");
mysql> insert into stu_info(name) values("赵兴");
mysql> insert into stu_info(name) values("陈琦");
mysql> insert into stu_info(name) values("刘铭");
mysql> select id,name from stu_info;
+----+--------+
| id | name   |
+----+--------+
|  1 | 李建   |
|  2 | 张明   |
|  3 | 赵兴   |
|  4 | 陈琦   |
|  5 | 刘铭   |
+----+--------+
5 rows in set (0.00 sec)
2. Import the MySQL table into HDFS with Sqoop
bin/sqoop import \
--connect jdbc:mysql://10.0.0.108:3306/student \
--username root \
--password root \
--table stu_info \
--target-dir /student \
--num-mappers 1 \
--fields-terminated-by '\t'
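To verify the import, list and inspect the output directory on HDFS. This is only a sketch, but with --num-mappers 1 Sqoop normally writes a single part-m-00000 file:

$ bin/hdfs dfs -ls /student
$ bin/hdfs dfs -cat /student/part-m-00000     # each line should be id<TAB>name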
3. Import the MySQL table into Hive with Sqoop
Method 1
1. Create the database and table in Hive:
   create database if not exists student;
   create table if not exists stu_info(id int, name string) row format delimited fields terminated by '\t';
2. Run the import:
   bin/sqoop import \
   --connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/student \
   --username root --password root \
   --table stu_info \
   --delete-target-dir \
   --target-dir /user/hive/warehouse/student.db/stu_info \
   --hive-import \
   --hive-database student \
   --hive-table stu_info \
   --hive-overwrite \
   --num-mappers 1 \
   --fields-terminated-by '\t'

Method 2
1. Use sqoop create-hive-table. You still have to create the target database yourself first; otherwise the table ends up under the default database path.
2. bin/sqoop create-hive-table \
   --connect jdbc:mysql://10.0.0.108:3306/student \
   --username root --password root \
   --table stu_info \
   --hive-table student.stu_info
3. bin/sqoop import \
   --connect jdbc:mysql://10.0.0.108:3306/student \
   --username root --password root \
   --table stu_info \
   --hive-import \
   --hive-database student \
   --hive-table stu_info \
   --hive-overwrite \
   --num-mappers 1 \
   --fields-terminated-by '\t' \
   --delete-target-dir \
   --target-dir /user/hive/warehouse/student.db/stu_info
4. Querying the table in Hive now returns all NULLs, yet the files on HDFS look fine, so Hive simply cannot parse the rows: a field-delimiter problem.
   Use --fields-terminated-by '\001' instead (full corrected command shown below).
   # \001 is Ctrl+A, Hive's default field delimiter; Sqoop's default output delimiter is ","
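For completeness, here is the method-2 import rewritten with Hive's default delimiter; it is identical to step 3 except for --fields-terminated-by:

bin/sqoop import \
--connect jdbc:mysql://10.0.0.108:3306/student \
--username root --password root \
--table stu_info \
--hive-import \
--hive-database student \
--hive-table stu_info \
--hive-overwrite \
--num-mappers 1 \
--fields-terminated-by '\001' \
--delete-target-dir \
--target-dir /user/hive/warehouse/student.db/stu_info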
4. Export data from HDFS or Hive to a MySQL table
1. Prepare the target database and table in MySQL.
2. Reuse the student database:
   create table if not exists stu_info_export like stu_info;
3. The input delimiter must match the data in the HDFS/Hive table:
   bin/sqoop export \
   --connect jdbc:mysql://10.0.0.108/student \
   --username root --password root \
   --table stu_info_export \
   --export-dir /user/hive/warehouse/student.db/stu_info \
   --num-mappers 1 \
   --input-fields-terminated-by '\001'
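Once the export finishes, a quick check on the MySQL side confirms the rows arrived (a sketch; the table and the five sample rows come from section 1):

mysql> use student;
mysql> select count(*) from stu_info_export;   -- expect 5
mysql> select id, name from stu_info_export;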
5. sqoop --options-file
For enterprise-style incremental migration, use --options-file together with a shell script (a wrapper is sketched below). The two invocations are equivalent:
-- $ sqoop import --connect jdbc:mysql://localhost/db --username foo --table TEST
-- $ sqoop --options-file /users/homer/work/import.txt --table TEST
Note: the options file starts with the tool name (import/export) and then lists one option or value per line, e.g.:
import
--connect
jdbc:mysql://localhost/db
--username
foo
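A minimal sketch of pairing --options-file with a shell script for a scheduled run; the wrapper script name and its variables are hypothetical, while the options-file path /users/homer/work/import.txt is taken from the example above:

#!/bin/bash
# run_import.sh -- hypothetical wrapper, e.g. invoked from cron for periodic migration
SQOOP_HOME=/opt/cdh-5.6.3/sqoop-1.4.5-cdh5.3.6     # assumed install path from the preface
OPTIONS_FILE=/users/homer/work/import.txt          # tool name first, then one option/value per line
$SQOOP_HOME/bin/sqoop --options-file "$OPTIONS_FILE" --table TEST >> /tmp/sqoop_import.log 2>&1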
6. Using sqoop job
$ bin/sqoop job --delete <job_id>
$ bin/sqoop job --list
$ bin/sqoop job --show <job_id>
$ bin/sqoop job --exec <job_id>
$ bin/sqoop job --create <job_id> -- <job-info>

$ bin/sqoop job --create stu_info -- \
import \
--connect jdbc:mysql://hadoop09-linux-01.ibeifeng.com:3306/sqoop \
--username root \
--password root \
--table tohdfs \
--target-dir /sqoop \
--num-mappers 1 \
--fields-terminated-by '\t' \
--check-column id \
--incremental append \
--last-value 11

PS: incremental import (conflicts with --delete-target-dir):
--check-column id
--incremental append/lastmodified   (lastmodified tracks changes via a timestamp column)
--last-value 11
Also:
--columns field1,field2,field3
--query <sql>        # must include $CONDITIONS and cannot be combined with --table
--where <condition>  # no $CONDITIONS needed
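When the saved job is re-executed, Sqoop records the new high-water mark of --check-column, so each --exec only appends rows with id greater than the stored last value. A minimal sketch of the re-run cycle (by default the metastore prompts for the database password on every exec; setting sqoop.metastore.client.record.password to true in sqoop-site.xml avoids the prompt):

$ bin/sqoop job --exec stu_info
Enter password:                       # MySQL password for --username root
$ bin/sqoop job --show stu_info       # the stored last value is listed in the job definition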