sqoop中作业的使用

sqoop中,可以将导入导出任务,写到job中,实现创建、查看、执行和删除job的功能。

数据准备

mysql先准备数据,创建sqooptest数据库,并添加表Man和数据,如下图。

创建作业

可以通过'sqoop job --help'命令查看具体的使用方法。

[hadoop@node01 ~/.sqoop]$ sqoop job --help
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 16:54:13 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 使用语法 usage: sqoop job [GENERIC-ARGS] [JOB-ARGS] [-- [<tool-name>] [TOOL-ARGS]] Job management arguments: --create <job-id> Create a new saved job --delete <job-id> Delete a saved job --exec <job-id> Run a saved job --help Print usage instructions --list List saved jobs --meta-connect <jdbc-uri> Specify JDBC connect string for the metastore --show <job-id> Show the parameters for a saved job --verbose Print more information while working Generic Hadoop command-line arguments: (must preceed any tool-specific arguments) Generic options supported are -conf <configuration file> specify an application configuration file -D <property=value> use value for given property -fs <local|namenode:port> specify a namenode -jt <local|resourcemanager:port> specify a ResourceManager -files <comma separated list of files> specify comma separated files to be copied to the map reduce cluster -libjars <comma separated list of jars> specify comma separated jar files to include in the classpath. -archives <comma separated list of archives> specify comma separated archives to be unarchived on the compute machines.

使用上面的提示,先查看一下目前的job,发现还没有。

# 查看创建的作业
[hadoop@node01 ~/.sqoop]$ sqoop job --list
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:03:16 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 暂时没有
Available jobs:
You have new mail in /var/spool/mail/root

那就创建一个,使用如下命令。

# 创建job
[hadoop@node01 ~/.sqoop]$ sqoop job \
> --create testjob \
> -- import \
> --connect jdbc:mysql://node01:3306/sqooptest \
> --username root \
> --password 123456 \
> --table Man \
> --target-dir /sqoop/testjob \
> --delete-target-dir \
> --m 1
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:11:32 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
20/02/06 17:11:32 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
You have new mail in /var/spool/mail/root
# 查看job
[hadoop@node01 ~/.sqoop]$ sqoop job --list
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:11:48 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 创建成功
Available jobs:
  testjob

查看作业

使用如下命令来查看刚才的job,可以查看job的详细信息。

# 查看testjob详细信息
[hadoop@node01 ~/.sqoop]$ sqoop job --show testjob
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:14:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 输入密码
Enter password:
Job: testjob
Tool: import
Options:
----------------------------
reset.onemapper = false
codegen.output.delimiters.enclose = 0
sqlconnection.metadata.transaction.isolation.level = 2
codegen.input.delimiters.escape = 0
codegen.auto.compile.dir = true
accumulo.batch.size = 10240000
codegen.input.delimiters.field = 0
accumulo.create.table = false
mainframe.input.dataset.type = p
enable.compression = false
accumulo.max.latency = 5000
db.username = root
sqoop.throwOnError = false
db.clear.staging.table = false
codegen.input.delimiters.enclose = 0
hdfs.append.dir = false
import.direct.split.size = 0
hcatalog.drop.and.create.table = false
codegen.output.delimiters.record = 10
codegen.output.delimiters.field = 44
hdfs.target.dir = /sqoop/testjob
hbase.bulk.load.enabled = false
# maptask数为1
mapreduce.num.mappers = 1
export.new.update = UpdateOnly
db.require.password = true
hive.import = false
customtool.options.jsonmap = {}
hdfs.delete-target.dir = true
codegen.output.delimiters.enclose.required = false
direct.import = false
codegen.output.dir = .
hdfs.file.format = TextFile
hive.drop.delims = false
codegen.input.delimiters.record = 0
db.batch = false
split.limit = null
hcatalog.create.table = false
hive.fail.table.exists = false
hive.overwrite.table = false
incremental.mode = None
temporary.dirRoot = _sqoop
verbose = false
import.max.inline.lob.size = 16777216
import.fetch.size = null
codegen.input.delimiters.enclose.required = false
relaxed.isolation = false
sqoop.oracle.escaping.disabled = true
# mysql表名
db.table = Man
hbase.create.table = false
codegen.compile.dir = /tmp/sqoop-hadoop/compile/b36217ccf86b7984cc9537d327e9598f
codegen.output.delimiters.escape = 0
# 连接mysql的字符
db.connect.string = jdbc:mysql://node01:3306/sqooptest

执行作业

执行刚才创建的作业,使用如下命令执行,期间需要输入密码。

[hadoop@node01 ~/.sqoop]$ sqoop job --exec testjob
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:22:10 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 输入密码
Enter password:
...省略
20/02/06 17:22:36 INFO mapreduce.ImportJobBase: Transferred 177.6777 KB in 14.6772 seconds (12.1057 KB/sec)
# 提示保存了6条数据
20/02/06 17:22:36 INFO mapreduce.ImportJobBase: Retrieved 6 records.
You have new mail in /var/spool/mail/root

执行完后,查看hdfs,发现成功执行了job。

删除作业

可以使用如下命令删除刚才作业。

# 删除job
[hadoop@node01 ~/.sqoop]$ sqoop job --delete testjob
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:33:54 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
You have new mail in /var/spool/mail/root
# 查看发现已经删除
[hadoop@node01 ~/.sqoop]$ sqoop job --list
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 17:34:04 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
# 已删除
Available jobs:

sqoop增量作业

sqoop可以创建增量作业,依然使用刚才的表来导入, 使用模式为append。

# 创建一个增量job
[hadoop@node01 ~/.sqoop]$ sqoop job --create incrementjob -- import --connect jdbc:mysql://node01:3306/sqooptest --username root --password 123456 --table Man --target-dir /sqoop/incrementjob --incremental append --check-column time --last-value '2020-01-25 11:15:00' --m 1
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail.
Please set $HCAT_HOME to the root of your HCatalog installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail.
Please set $ACCUMULO_HOME to the root of your Accumulo installation.
Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail.
Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation.
20/02/06 18:03:31 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2
20/02/06 18:03:31 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
# 查看新创建的job
[hadoop@node01
~/.sqoop]$ sqoop job --list Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../hcatalog does not exist! HCatalog jobs will fail. Please set $HCAT_HOME to the root of your HCatalog installation. Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root of your Accumulo installation. Warning: /kkb/install/sqoop-1.4.6-cdh5.14.2/../zookeeper does not exist! Accumulo imports will fail. Please set $ZOOKEEPER_HOME to the root of your Zookeeper installation. 20/02/06 18:03:43 INFO sqoop.Sqoop: Running Sqoop version: 1.4.6-cdh5.14.2 Available jobs: incrementjob

先执行,查看hdfs中数据情况。注意执行时又提示输入密码,可以通过修改sqoop根目录/conf下的sqoop-site.xml文件来完成修改。

# 保存密码到sqoop元数据
<
property> <name>sqoop.metastore.client.record.password</name> <value>true</value> <description>If true, allow saved passwords in the metastore. </description> </property>

执行job后,hdfs中正常保存增量数据。

此时,如果把mysql原始数据修改下,增加一条id=7的数据,再次执行刚才的job,能否导入到hdfs呢?

再次执行job,查看hdfs中情况。

说明可以导入增量数据,但是job创建时,检查列time的last-value是指定的,为什么还能导入数据,这是因为sqoop的元数据中更新了last-value的值,如下所示。

# sqoop安装目录平级目录下,有一个.sqoop隐藏目录
[hadoop@node01 ~/.sqoop]$ ll
total 28
-rw-rw-r-- 1 hadoop hadoop 14802 Feb  6 17:22 Man.java
-rw-rw-r-- 1 hadoop hadoop   419 Feb  6 19:32 metastore.db.properties
-rw-rw-r-- 1 hadoop hadoop  6662 Feb  6 19:32 metastore.db.script
You have new mail in /var/spool/mail/root
# 查看元数据内容
[hadoop@node01 ~/.sqoop]$ more metastore.db.script
CREATE SCHEMA PUBLIC AUTHORIZATION DBA
CREATE MEMORY TABLE SQOOP_ROOT(VERSION INTEGER,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHAR(256),CONSTRAINT
SQOOP_ROOT_UNQ UNIQUE(VERSION,PROPNAME))
CREATE MEMORY TABLE SQOOP_SESSIONS(JOB_NAME VARCHAR(64) NOT NULL,PROPNAME VARCHAR(128) NOT NULL,PROPVAL VARCHA
R(1024),PROPCLASS VARCHAR(32) NOT NULL,CONSTRAINT SQOOP_SESSIONS_UNQ UNIQUE(JOB_NAME,PROPNAME,PROPCLASS))
CREATE USER SA PASSWORD ""
GRANT DBA TO SA
SET WRITE_DELAY 10
SET SCHEMA PUBLIC
INSERT INTO SQOOP_ROOT VALUES(NULL,'sqoop.hsqldb.job.storage.version','0')
INSERT INTO SQOOP_ROOT VALUES(0,'sqoop.hsqldb.job.info.table','SQOOP_SESSIONS')
INSERT INTO SQOOP_SESSIONS VALUES('incrementjob','sqoop.tool','import','schema')
INSERT INTO SQOOP_SESSIONS VALUES('incrementjob','sqoop.property.set.id','0','schema')
INSERT INTO SQOOP_SESSIONS VALUES('incrementjob','verbose','false','SqoopOptions')
INSERT INTO SQOOP_SESSIONS VALUES('incrementjob','hcatalog.drop.and.create.table','false','SqoopOptions')
# 更新了last.value的值为刚插入的数据时间戳
INSERT INTO SQOOP_SESSIONS VALUES('incrementjob','incremental.last.value','2020-02-06 19:29:39.0','SqoopOption
s')

以上,是对sqoop作业使用的记录。 

参考博文:

(1) https://www.cnblogs.com/youngchaolin/p/12253859.html

posted @ 2020-02-06 19:45  斐波那切  阅读(477)  评论(0编辑  收藏  举报