Sqoop Usage Guide

import-to-hdfs: importing data from a relational database into HDFS & Hive

Sqoop handles ETL between relational databases and the Hadoop family (HDFS, Hive, HBase).

Database to Hadoop family: sqoop import. Hadoop family to database: sqoop export.

Download:
http://www.apache.org/dyn/closer.lua/sqoop/1.4.7

Official documentation:
https://sqoop.apache.org/docs/1.4.7/SqoopUserGuide.html

Reference: https://www.cnblogs.com/yinzhengjie/p/9183069.html

Installing Sqoop 1.4.7

cd /usr/app

tar -zxvf sqoop-1.4.7.bin__hadoop-2.6.0.tar.gz
mv sqoop-1.4.7.bin__hadoop-2.6.0/ sqoop-1.4.7

cd sqoop-1.4.7/conf/
mv sqoop-env-template.sh sqoop-env.sh
vi sqoop-env.sh

export HADOOP_COMMON_HOME=/usr/local/hadoop/hadoop-2.10.0

export HADOOP_MAPRED_HOME=/usr/local/hadoop/hadoop-2.10.0

export HBASE_HOME=/usr/app/hbase-1.6.0

export HIVE_HOME=/usr/app/apache-hive-2.3.7-bin

export ZOOCFGDIR=/usr/app/apache-zookeeper-3.6.1-bin/conf

cp mysql-connector-java-5.1.49-bin.jar /usr/app/sqoop-1.4.7/lib/

vi /etc/profile

export SQOOP_HOME=/usr/app/sqoop-1.4.7
export PATH=$PATH:$SQOOP_HOME/bin

source /etc/profile

Verify the installation:
sqoop-version

MySQL needs to grant remote access for the Hadoop nodes (e.g. root@hadoop001):
GRANT ALL PRIVILEGES ON *.* TO 'root'@'%' IDENTIFIED BY 'mysql';

In a cluster environment, if the connection fails, try turning off the firewall first.
sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql --table employees -m 2
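
Before the first import it is worth confirming that Sqoop can actually reach MySQL. A minimal check, reusing the same connection string as above:

# list the databases visible to this user; if this fails, fix connectivity or privileges first
sqoop list-databases --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql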

Check the result (default path):

hadoop fs -ls /user/root/employees
hadoop fs -cat /user/root/employees/part-m-00000

Commands

  • codegen Generate code to interact with database records
  • create-hive-table Import a table definition into Hive
  • eval Evaluate a SQL statement and display the results
  • export Export an HDFS directory to a database table
  • help List available commands
  • import Import a table from a database to HDFS
  • import-all-tables Import tables from a database to HDFS
  • import-mainframe Import datasets from a mainframe server to HDFS
  • job Work with saved jobs
  • list-databases List available databases on a server
  • list-tables List available tables in a database
  • merge Merge results of incremental imports
  • metastore Run a standalone Sqoop metastore
  • version Display version information

Common arguments

  • --connect <jdbc-uri> Specify JDBC connect string
  • --connection-manager <class-name> Specify connection manager class to use
  • --driver <class-name> Manually specify JDBC driver class to use
  • --hadoop-mapred-home <dir> Override $HADOOP_MAPRED_HOME
  • --help Print usage instructions
  • --password-file Set path for a file containing the authentication password (see the sketch after this list)
  • -P Read password from console
  • --password Set authentication password
  • --username Set authentication username
  • --verbose Print more information while working
  • --connection-param-file Optional properties file that provides connection parameters
  • --relaxed-isolation Set connection transaction isolation to read uncommitted for the mappers.
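
The password options above can be combined in different ways. A minimal sketch of --password-file, which keeps the password off the command line (the path /user/root/.mysql.password is only an example):

# write the password without a trailing newline, since the whole file content is used as the password
echo -n 'mysql' > .mysql.password
hadoop fs -put .mysql.password /user/root/.mysql.password
hadoop fs -chmod 400 /user/root/.mysql.password
sqoop list-tables --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password-file /user/root/.mysql.password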

Database connection

  • mysql

sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql

This requires the MySQL JDBC driver:

cp mysql-connector-java-5.1.40-bin.jar apps/sqoop-1.4.6/lib/

Listing tables

sqoop list-tables --connect jdbc:mysql://192.168.10.3:3306/mydb --username root --password mysql

Importing a single table to HDFS

Default import path: /user/<username>/<table name>, e.g. /user/hadoop/tablename1. Use --target-dir to override it:

sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false --username root --password mysql --table employees --target-dir /user/hadoop11/employees -m 1

Specifying the field delimiter, --split-by partitioning, and the number of map tasks

sqoop import \
--connect jdbc:mysql://hadoop1:3306/mysql \
--username root \
--password root \
--table help_keyword \
--target-dir /user/hadoop11/my_help_keyword1 \
--fields-terminated-by '\t' \
-m 2

--fields-terminated-by '\t' sets the field delimiter in the output files.
-m sets how many map tasks run in parallel.

sqoop import \
--query 'SELECT a.*, b.* FROM a JOIN b on (a.id == b.id) WHERE $CONDITIONS' \
--split-by a.id -m 10 --target-dir /user/foo/joinresults

--split-by controls how the rows are partitioned across map tasks.
Suppose a table test is imported with --split-by id and -m 10. Sqoop first reads the table's metadata, then sends a boundary query such as select max(id), min(id) from test to MySQL. It divides the range between min and max into 10 equal slices, and each of the 10 parallel map tasks reads only the rows whose id falls into its own slice.
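
A rough illustration of the queries this produces, assuming MIN(id)=1 and MAX(id)=1000 (the exact SQL Sqoop generates differs slightly, but the boundary logic is the same):

-- boundary query issued once before the map tasks start
SELECT MIN(id), MAX(id) FROM test;
-- each of the 10 mappers then reads only its own slice of the range, roughly:
SELECT * FROM test WHERE id >= 1 AND id < 101;     -- mapper 0
SELECT * FROM test WHERE id >= 101 AND id < 201;   -- mapper 1
-- ...
SELECT * FROM test WHERE id >= 901 AND id <= 1000; -- mapper 9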

Incremental imports

Append mode (incremental by id)

sqoop import \
--connect jdbc:mysql://hadoop1:3306/mysql \
--username root \
--password root \
--table help_keyword \
--target-dir /user/hadoop/myimport_add \
--incremental append \
--check-column help_keyword_id \
--last-value 500 \
-m 1
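
Re-running this by hand means bumping --last-value yourself; a saved Sqoop job records the last imported value in its metastore instead. A minimal sketch (the job name incr_help_keyword is arbitrary):

# create the job once; note the space after the bare -- before the import arguments
sqoop job --create incr_help_keyword -- import \
--connect jdbc:mysql://hadoop1:3306/mysql \
--username root \
--password root \
--table help_keyword \
--target-dir /user/hadoop/myimport_add \
--incremental append \
--check-column help_keyword_id \
--last-value 500 \
-m 1
# each later run continues from the last value recorded by the metastore
# (depending on metastore settings, the password may be prompted for on each run)
sqoop job --exec incr_help_keyword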

Lastmodified mode (incremental by timestamp)

sqoop import \
--connect jdbc:mysql://hadoop1:3316/testdb \
--username root \
--password transwarp \
--query "select orderid, name from ordertable where \$CONDITIONS" \
--target-dir /user/root/orderall \
--split-by id \
-m 4 \
--incremental lastmodified \
--merge-key orderid \
--check-column time \
--last-value "2014-11-09 21:00:00"

Importing into Hive

sqoop import --connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false --username root --password mysql --table employees --target-dir /user/sqoop/employees --hive-import -m 2

The table is created in Hive's default database. Check the result (default path):
hadoop fs -cat /user/sqoop/employees/part-m-00000

Hive import with explicit field and line delimiters, --hive-import, overwrite on import, automatic table creation, an explicit database and table name, and deletion of the intermediate target directory

First create a Hive database:
hive> create database mydb;

sqoop import \
--connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false \
--username root \
--password mysql \
--table employees \
--fields-terminated-by "\t" \
--lines-terminated-by "\n" \
--hive-import \
--hive-overwrite \
--create-hive-table \
--delete-target-dir \
--hive-database mydb \
--hive-table employees

Check the result in Hive:
use mydb;
select * from employees limit 20;

  • --create-hive-table // fail the job if the target Hive table already exists (without this flag, Sqoop still creates the table when it is missing)
  • --external-table-dir // create the Hive table as an external table at the given HDFS path
  • --hive-database <database-name> // target Hive database
  • --hive-import // import into a Hive table instead of only HDFS
  • --hive-partition-key <partition-key> // name of the Hive partition column (see the sketch after this list)
  • --hive-partition-value <partition-value> // partition value to import into
  • --hive-table <table-name> // target Hive table
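
A minimal sketch of a partitioned Hive import using the two partition arguments above; the table name employees_part and the dt value are only examples:

sqoop import \
--connect jdbc:mysql://192.168.10.3:3306/mydb?useSSL=false \
--username root \
--password mysql \
--table employees \
--hive-import \
--hive-database mydb \
--hive-table employees_part \
--hive-partition-key dt \
--hive-partition-value '2020-11-25' \
--delete-target-dir \
-m 1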

Export from HDFS to MySQL

sqoop export --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --export-dir /sqoop --table student1
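
sqoop export does not create the target table; it must already exist in MySQL with columns matching the files under /sqoop. The schema below is only a placeholder to illustrate this, adjust it to the actual data:

-- run in MySQL before the export (hypothetical columns)
CREATE TABLE student1 (
  id INT PRIMARY KEY,
  name VARCHAR(64),
  age INT
);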

Export from Hive to MySQL

sqoop export --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --table employee1 --export-dir /user/hive/warehouse/sqoop.db/employee --input-fields-terminated-by '\001' -m 3

'\001' is Hive's default field delimiter, so --input-fields-terminated-by must match how the warehouse files were written.

Generating Java code (codegen)

sqoop codegen --connect jdbc:mysql://192.168.1.59:3307/bigdata --username root --password mysql --table employee1
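
Sqoop prints where it writes the generated .java source and compiled .jar; --outdir, --bindir and --class-name let you place and name them explicitly. A minimal sketch (paths and class name are only examples):

sqoop codegen \
--connect jdbc:mysql://192.168.1.59:3307/bigdata \
--username root \
--password mysql \
--table employee1 \
--outdir /tmp/sqoop-codegen/src \
--bindir /tmp/sqoop-codegen/bin \
--class-name Employee1Record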
