Sqoop Arguments
Import arguments
Table 1. Common arguments
Argument | Description
---|---
--connect <jdbc-uri> | Specify JDBC connect string
--connection-manager <class-name> | Specify connection manager class to use
--driver <class-name> | Manually specify JDBC driver class to use
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME
--help | Print usage instructions
--password-file | Set path for a file containing the authentication password
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
--relaxed-isolation | Set connection transaction isolation to read uncommitted for the mappers
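The --password-file option keeps the password off the command line and out of the process list. A minimal sketch, assuming the file lives on the local filesystem (the path and password are illustrative):

    # Write the password with no trailing newline, then lock down permissions.
    $ echo -n "MyDbPassword" > /home/someuser/.db-password
    $ chmod 400 /home/someuser/.db-password

    # Reference the file alongside the common connection arguments.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --username SomeUser --password-file /home/someuser/.db-password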
Table 2. Validation arguments
Argument | Description
---|---
--validate | Enable validation of data copied; supports single-table copy only
--validator <class-name> | Specify validator class to use
--validation-threshold <class-name> | Specify validation threshold class to use
--validation-failurehandler <class-name> | Specify validation failure handler class to use
Table 3. Import control arguments
Argument | Description
---|---
--append | Append data to an existing dataset in HDFS
--as-avrodatafile | Imports data to Avro Data Files
--as-sequencefile | Imports data to SequenceFiles
--as-textfile | Imports data as plain text (default)
--as-parquetfile | Imports data to Parquet Files
--boundary-query <statement> | Boundary query to use for creating splits
--columns <col,col,col…> | Columns to import from table
--delete-target-dir | Delete the import target directory if it exists
--direct | Use the direct connector if one exists for the database
--fetch-size <n> | Number of entries to read from the database at once
--inline-lob-limit <n> | Set the maximum size for an inline LOB
-m,--num-mappers <n> | Use n map tasks to import in parallel
-e,--query <statement> | Import the results of statement
--split-by <column-name> | Column of the table used to split work units. Cannot be used with the --autoreset-to-one-mapper option
--autoreset-to-one-mapper | Import should use one mapper if a table has no primary key and no split-by column is provided. Cannot be used with the --split-by <col> option
--table <table-name> | Table to read
--target-dir <dir> | HDFS destination dir
--warehouse-dir <dir> | HDFS parent for table destination
--where <where clause> | WHERE clause to use during import
-z,--compress | Enable compression
--compression-codec <c> | Use Hadoop codec (default gzip)
--null-string <null-string> | The string to be written for a null value for string columns
--null-non-string <null-string> | The string to be written for a null value for non-string columns
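When --query is used in place of --table, the statement must contain the literal token $CONDITIONS (Sqoop substitutes each mapper's split predicate for it), --target-dir is required, and a --split-by column is needed when more than one mapper is used. A sketch against the corp database used in the examples below (the DEPARTMENTS table and join columns are illustrative):

    # Free-form query import; single quotes keep the shell from expanding $CONDITIONS.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp \
        --query 'SELECT e.employee_id, e.first_name, d.dept_name FROM EMPLOYEES e JOIN DEPARTMENTS d ON e.dept_id = d.dept_id WHERE $CONDITIONS' \
        --split-by e.employee_id \
        --target-dir /user/foo/employee_depts \
        -m 4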
The --null-string and --null-non-string arguments are optional. If not specified, the string "null" will be used.
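For example, to have NULLs written in Hive's default representation (\N) rather than the literal string "null", both options are commonly set, as in this sketch:

    # Write \N for SQL NULLs in string and non-string columns alike.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --null-string '\\N' --null-non-string '\\N'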
Table 4. Parameters for overriding mapping
Argument | Description
---|---
--map-column-java <mapping> | Override mapping from SQL to Java type for configured columns
--map-column-hive <mapping> | Override mapping from SQL to Hive type for configured columns
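Mappings are given as comma-separated <column>=<type> pairs. A minimal sketch (the column names are illustrative):

    # Force employee_id to Java Integer and salary to String in the generated record class.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --map-column-java employee_id=Integer,salary=String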
Table 5. Incremental import arguments
Argument | Description
---|---
--check-column (col) | Specifies the column to be examined when determining which rows to import. (The column should not be of type CHAR/NCHAR/VARCHAR/VARNCHAR/LONGVARCHAR/LONGNVARCHAR.)
--incremental (mode) | Specifies how Sqoop determines which rows are new. Legal values for mode include append and lastmodified
--last-value (value) | Specifies the maximum value of the check column from the previous import
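A sketch of an append-mode incremental run that imports only rows whose check column exceeds the last value seen (the column name and value are illustrative); at the end of such a run Sqoop prints the value to pass as --last-value next time:

    # Import only rows with employee_id > 100000 and append them to the target directory.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --target-dir /user/foo/employees \
        --incremental append --check-column employee_id --last-value 100000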
Table 6. Output line formatting arguments
Argument | Description
---|---
--enclosed-by <char> | Sets a required field enclosing character
--escaped-by <char> | Sets the escape character
--fields-terminated-by <char> | Sets the field separator character
--lines-terminated-by <char> | Sets the end-of-line character
--mysql-delimiters | Uses MySQL's default delimiter set: fields `,`; lines `\n`; escaped-by `\`; optionally-enclosed-by `'`
--optionally-enclosed-by <char> | Sets a field enclosing character
Table 7. Input parsing arguments
Argument | Description
---|---
--input-enclosed-by <char> | Sets a required field encloser
--input-escaped-by <char> | Sets the input escape character
--input-fields-terminated-by <char> | Sets the input field separator
--input-lines-terminated-by <char> | Sets the input end-of-line character
--input-optionally-enclosed-by <char> | Sets a field enclosing character
Table 8. Hive arguments
Argument | Description
---|---
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false
--hive-table <table-name> | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of a Hive field to partition on
--hive-partition-value <v> | String value that serves as the partition key for data imported into Hive in this job
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns
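A sketch that combines several of these options: a Hive import into a named table, stripping Hive's delimiter characters from string data and loading into a static partition (the Hive table, partition key, and value are illustrative):

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --hive-import --hive-table employees \
        --hive-drop-import-delims \
        --hive-partition-key load_date --hive-partition-value '2010-01-01'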
Table 9. HBase arguments
Argument | Description
---|---
--column-family <family> | Sets the target column family for the import
--hbase-create-table | If specified, create missing HBase tables
--hbase-row-key <col> | Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key attributes
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS
--hbase-bulkload | Enables bulk loading
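A sketch of an import that targets HBase instead of HDFS (the HBase table name, column family, and row-key column are illustrative):

    # Each source row becomes a put into the 'employees' HBase table, keyed by employee_id.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --hbase-table employees --column-family cf \
        --hbase-row-key employee_id --hbase-create-table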
Table 10. Accumulo arguments
Argument | Description
---|---
--accumulo-table <table-name> | Specifies an Accumulo table to use as the target instead of HDFS
--accumulo-column-family <family> | Sets the target column family for the import
--accumulo-create-table | If specified, create missing Accumulo tables
--accumulo-row-key <col> | Specifies which input column to use as the row key
--accumulo-visibility <vis> | (Optional) Specifies a visibility token to apply to all rows inserted into Accumulo. Default is the empty string
--accumulo-batch-size <size> | (Optional) Sets the size in bytes of Accumulo's write buffer. Default is 4MB
--accumulo-max-latency <ms> | (Optional) Sets the max latency in milliseconds for the Accumulo batch writer. Default is 0
--accumulo-zookeepers <host:port> | Comma-separated list of ZooKeeper servers used by the Accumulo instance
--accumulo-instance <instance-name> | Name of the target Accumulo instance
--accumulo-user <username> | Name of the Accumulo user to import as
--accumulo-password <password> | Password for the Accumulo user
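An analogous sketch for Accumulo (the instance name, ZooKeeper quorum, credentials, and column family are illustrative):

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --accumulo-table employees --accumulo-column-family cf \
        --accumulo-row-key employee_id --accumulo-create-table \
        --accumulo-instance myinstance \
        --accumulo-zookeepers zk1.foo.com:2181,zk2.foo.com:2181 \
        --accumulo-user sqoopuser --accumulo-password secret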
Table 11. Code generation arguments
Argument | Description
---|---
--bindir <dir> | Output directory for compiled objects
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class
--jar-file <file> | Disable code generation; use specified jar
--outdir <dir> | Output directory for generated code
--package-name <name> | Put auto-generated classes in this package
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns
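A sketch that controls where the generated record class and its compiled artifacts are written (the package name and directories are illustrative):

    # Generated .java source goes under --outdir; compiled classes and the jar under --bindir.
    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --package-name com.foocorp.records \
        --outdir /tmp/sqoop-src --bindir /tmp/sqoop-bin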
Import-all-tables arguments
Table 13. Common arguments
Argument | Description
---|---
--connect <jdbc-uri> | Specify JDBC connect string
--connection-manager <class-name> | Specify connection manager class to use
--driver <class-name> | Manually specify JDBC driver class to use
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME
--help | Print usage instructions
--password-file | Set path for a file containing the authentication password
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
--relaxed-isolation | Set connection transaction isolation to read uncommitted for the mappers
Table 14. Import control arguments
Argument | Description
---|---
--as-avrodatafile | Imports data to Avro Data Files
--as-sequencefile | Imports data to SequenceFiles
--as-textfile | Imports data as plain text (default)
--as-parquetfile | Imports data to Parquet Files
--direct | Use direct import fast path
--inline-lob-limit <n> | Set the maximum size for an inline LOB
-m,--num-mappers <n> | Use n map tasks to import in parallel
--warehouse-dir <dir> | HDFS parent for table destination
-z,--compress | Enable compression
--compression-codec <c> | Use Hadoop codec (default gzip)
--exclude-tables <tables> | Comma-separated list of tables to exclude from the import process
--autoreset-to-one-mapper | Import should use one mapper if a table with no primary key is encountered
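A sketch of importing every table in the corp database except a few, placing each table under a common warehouse directory (the excluded table names are illustrative):

    $ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp \
        --warehouse-dir /user/foo/corp \
        --exclude-tables AUDIT_LOG,TEMP_STAGING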
Table 15. Output line formatting arguments
Argument | Description
---|---
--enclosed-by <char> | Sets a required field enclosing character
--escaped-by <char> | Sets the escape character
--fields-terminated-by <char> | Sets the field separator character
--lines-terminated-by <char> | Sets the end-of-line character
--mysql-delimiters | Uses MySQL's default delimiter set: fields `,`; lines `\n`; escaped-by `\`; optionally-enclosed-by `'`
--optionally-enclosed-by <char> | Sets a field enclosing character
Table 16. Input parsing arguments
Argument | Description
---|---
--input-enclosed-by <char> | Sets a required field encloser
--input-escaped-by <char> | Sets the input escape character
--input-fields-terminated-by <char> | Sets the input field separator
--input-lines-terminated-by <char> | Sets the input end-of-line character
--input-optionally-enclosed-by <char> | Sets a field enclosing character
Table 17. Hive arguments
Argument | Description
---|---
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false
--hive-table <table-name> | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of a Hive field to partition on
--hive-partition-value <v> | String value that serves as the partition key for data imported into Hive in this job
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns
Table 18. Code generation arguments
Argument | Description
---|---
--bindir <dir> | Output directory for compiled objects
--jar-file <file> | Disable code generation; use specified jar
--outdir <dir> | Output directory for generated code
--package-name <name> | Put auto-generated classes in this package
Import-mainframe arguments
Table 19. Common arguments
Argument | Description
---|---
--connect <hostname> | Specify mainframe host to connect to
--connection-manager <class-name> | Specify connection manager class to use
--hadoop-mapred-home <dir> | Override $HADOOP_MAPRED_HOME
--help | Print usage instructions
--password-file | Set path for a file containing the authentication password
-P | Read password from console
--password <password> | Set authentication password
--username <username> | Set authentication username
--verbose | Print more information while working
--connection-param-file <filename> | Optional properties file that provides connection parameters
Table 20. Import control arguments
Argument | Description
---|---
--as-avrodatafile | Imports data to Avro Data Files
--as-sequencefile | Imports data to SequenceFiles
--as-textfile | Imports data as plain text (default)
--as-parquetfile | Imports data to Parquet Files
--delete-target-dir | Delete the import target directory if it exists
-m,--num-mappers <n> | Use n map tasks to import in parallel
--target-dir <dir> | HDFS destination dir
--warehouse-dir <dir> | HDFS parent for table destination
-z,--compress | Enable compression
--compression-codec <c> | Use Hadoop codec (default gzip)
Table 21. Output line formatting arguments
Argument | Description
---|---
--enclosed-by <char> | Sets a required field enclosing character
--escaped-by <char> | Sets the escape character
--fields-terminated-by <char> | Sets the field separator character
--lines-terminated-by <char> | Sets the end-of-line character
--mysql-delimiters | Uses MySQL's default delimiter set: fields `,`; lines `\n`; escaped-by `\`; optionally-enclosed-by `'`
--optionally-enclosed-by <char> | Sets a field enclosing character
Table 22. Input parsing arguments
Argument | Description
---|---
--input-enclosed-by <char> | Sets a required field encloser
--input-escaped-by <char> | Sets the input escape character
--input-fields-terminated-by <char> | Sets the input field separator
--input-lines-terminated-by <char> | Sets the input end-of-line character
--input-optionally-enclosed-by <char> | Sets a field enclosing character
Table 23. Hive arguments
Argument | Description
---|---
--hive-home <dir> | Override $HIVE_HOME
--hive-import | Import tables into Hive (uses Hive's default delimiters if none are set)
--hive-overwrite | Overwrite existing data in the Hive table
--create-hive-table | If set, the job will fail if the target Hive table exists. By default this property is false
--hive-table <table-name> | Sets the table name to use when importing to Hive
--hive-drop-import-delims | Drops \n, \r, and \01 from string fields when importing to Hive
--hive-delims-replacement | Replaces \n, \r, and \01 in string fields with a user-defined string when importing to Hive
--hive-partition-key | Name of a Hive field to partition on
--hive-partition-value <v> | String value that serves as the partition key for data imported into Hive in this job
--map-column-hive <map> | Override default mapping from SQL type to Hive type for configured columns
Table 24. HBase arguments
Argument | Description
---|---
--column-family <family> | Sets the target column family for the import
--hbase-create-table | If specified, create missing HBase tables
--hbase-row-key <col> | Specifies which input column to use as the row key. If the input table has a composite key, <col> must be a comma-separated list of the composite key attributes
--hbase-table <table-name> | Specifies an HBase table to use as the target instead of HDFS
--hbase-bulkload | Enables bulk loading
Table 25. Accumulo arguments
Argument | Description
---|---
--accumulo-table <table-name> | Specifies an Accumulo table to use as the target instead of HDFS
--accumulo-column-family <family> | Sets the target column family for the import
--accumulo-create-table | If specified, create missing Accumulo tables
--accumulo-row-key <col> | Specifies which input column to use as the row key
--accumulo-visibility <vis> | (Optional) Specifies a visibility token to apply to all rows inserted into Accumulo. Default is the empty string
--accumulo-batch-size <size> | (Optional) Sets the size in bytes of Accumulo's write buffer. Default is 4MB
--accumulo-max-latency <ms> | (Optional) Sets the max latency in milliseconds for the Accumulo batch writer. Default is 0
--accumulo-zookeepers <host:port> | Comma-separated list of ZooKeeper servers used by the Accumulo instance
--accumulo-instance <instance-name> | Name of the target Accumulo instance
--accumulo-user <username> | Name of the Accumulo user to import as
--accumulo-password <password> | Password for the Accumulo user
Table 26. Code generation arguments
Argument | Description
---|---
--bindir <dir> | Output directory for compiled objects
--class-name <name> | Sets the generated class name. This overrides --package-name. When combined with --jar-file, sets the input class
--jar-file <file> | Disable code generation; use specified jar
--outdir <dir> | Output directory for generated code
--package-name <name> | Put auto-generated classes in this package
--map-column-java <m> | Override default mapping from SQL type to Java type for configured columns
A basic import of all sequential files in a partitioned dataset named EMPLOYEES on the mainframe host z390:

    $ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
        --username SomeUser -P
    Enter password: (hidden)

Controlling the import parallelism (using 8 parallel tasks):

    $ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
        --username SomeUser --password-file mypassword -m 8

Importing the data to Hive:

    $ sqoop import-mainframe --connect z390 --dataset EMPLOYEES \
        --hive-import
A basic import of a table named EMPLOYEES in the corp database:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES

A basic import requiring a login:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --username SomeUser -P
    Enter password: (hidden)

Selecting specific columns from the EMPLOYEES table:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --columns "employee_id,first_name,last_name,job_title"

Controlling the import parallelism (using 8 parallel tasks):

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        -m 8

Storing data in SequenceFiles, and setting the generated class name to com.foocorp.Employee:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --class-name com.foocorp.Employee --as-sequencefile

Specifying the delimiters to use in a text-mode import:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --fields-terminated-by '\t' --lines-terminated-by '\n' \
        --optionally-enclosed-by '\"'

Importing the data to Hive:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --hive-import

Importing only new employees:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --where "start_date > '2010-01-01'"

Changing the splitting column from the default:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp --table EMPLOYEES \
        --split-by dept_id

Performing an incremental import of new data, after having already imported the first 100,000 rows of a table:

    $ sqoop import --connect jdbc:mysql://db.foo.com/somedb --table sometable \
        --where "id > 100000" --target-dir /incremental_dataset --append

An import of a table named EMPLOYEES in the corp database that uses validation, comparing the source table's row count with the number of rows copied into HDFS:

    $ sqoop import --connect jdbc:mysql://db.foo.com/corp \
        --table EMPLOYEES --validate

Importing all tables from the corp database:

    $ sqoop import-all-tables --connect jdbc:mysql://db.foo.com/corp