IMPORT FROM 表数据导入

Syntax
IMPORT FROM [<file_type>] <file_path> [INTO <table_name>] [WITH <import_from_option_list>]
Syntax Elements
<file_type> ::= CSV FILE | CONTROL FILE
The type of the file to be imported. You can specify either comma-separated values or control file formats. For more information on CSV and control file formats, see Examples.
待导入文件的类型,有两种类型的文件:
  • CSV FILE :该文件存储的为表数据
  • CONTROL FILE:该文件是控制文件,即将导入的脚本写在这个文件里,然后还是通过这个 IMPORT 语句执行这个脚本控制文件即可,这样就不需要将导数的语句直接贴在SQL编辑器里运行了
<file_path> ::= <string_literal>
The complete path and file name of the file to import. 
文件路(注意是服务器上的,不是本机哦)
<table_name> ::= [<schema_name>.]<identifier><schema_name> ::= <unicode_name>
The target table name, with optional schema name, where the imported data will be stored. 
将数据导到哪个表中
WITH <import_from_option_list> ::= <import_from_option>[{, <import_from_option>}...] <import_from_option> ::= THREADS <number_of_threads> | BATCH <number_of_records_of_each_commit> | TABLE LOCK | NO TYPE CHECK | SKIP FIRST <number_of_rows_to_skip> ROW | COLUMN LIST IN FIRST ROW [<with_schema_flexibility>] | COLUMN LIST ( <column_name_list> ) [<with_schema_flexibility>] | RECORD DELIMITED BY <string_for_record_delimiter> | FIELD DELIMITED BY <string_for_field_delimiter> | OPTIONALLY ENCLOSED BY <character_for_optional_enclosure> | DATE FORMAT <string_for_date_format> | TIME FORMAT <string_for_time_format> | TIMESTAMP FORMAT <string_for_timestamp_format> | ERROR LOG <file_path_of_error_log> | FAIL ON INVALID DATA
A list of import options.
导入选项
THREADS <unsigned_integer>
The number of threads that can be used for concurrent import. The default value is 1 and maximum allowed is 256.
允许最大的并行线程数量,默认是1,最大是256
BATCH <number_of_records_of_each_commit> ::= <unsigned_integer>
The number of records to be inserted in each commit.
每批提交的条数,即多少条后提交
 
Note: THREADS and BATCH can be used to achieve high loading performance by enabling parallel loading and also by committing many records at once. In general, for column tables, a good setting to use is 10 parallel loading threads, with a commit frequency of 10,000 records or greater.
注: THREADS 、BATCH 这两个参数是用来提升并发导入数据的性能的参数,在通常情况中下对于列存储表,推荐使用10个并发线程,每批1万条就提交
TABLE LOCK
Can be used for faster data loading for column store tables. 可以加快列存储表数据的导入

It is recommended to specify this option carefully as it incurs table locks in exclusive mode as well as explicit hard merges and savepoints after data loading is finished. The performance gain from this option can vary according to the table constraints (like primary key) and optimization of other layers (like persistency or DML command).

NO TYPE CHECK
Specifies that the record will be inserted without checking the type of each field.
导入时不检查字段类型
SKIP FIRST <number_of_rows_to_skip> ROW <number_of_rows_to_skip> ::= <unsigned_integer>
Skips the specified number of rows in the import file.
指定跳过多少行后开始导入,比如有表头时
COLUMN LIST IN FIRST ROW
Indicates that the column list is stored in the first row of the CSV import file.
第一行做为COLUMN LIST选项的值,这样就可以使用COLUMN LIST IN FIRST ROW选项替代了COLUMN LIST选项(COLUMN LIST选项见下面)
COLUMN LIST ( <column_name_list> ) <column_name_list> ::= <column_name> [{, <column_name>}...]
The column list for the data being imported. The name list has one or more column names. The ordering of the column names has to match the order of the column data in the CSV file and the columns in the target table.
指定CSV文件里的数据列将要存储到表中的哪些列中,因为有时CSV里提借的列数比表实际列数要少,或者CSV里的第一列要存储到表里的第三列,等...为了解决这些列数不配以及存储的位置不配时,需要通过这个选项来实现,请看后面的示例
RECORD DELIMITED BY <string_for_record_delimiter> ::= <string_literal>
The record delimiter used in the CSV file being imported.
行与行的分隔符
FIELD DELIMITED BY <string_for_field_delimiter> ::= <string_literal>
The field delimiter of the CSV file.
字段分隔符
OPTIONALLY ENCLOSED BY <character_for_optional_enclosure> ::= <character_literal>
The optional enclosure character used to delimit field data.
字符(串)使用什么引起来
DATE FORMAT <string_for_date_format> ::= <string_literal>
The format that date strings are encoded with in the import data:
指定日期格式
  • Y : year
  • MM : month
  • MON : name of month
  • DD : day
For example:
  • 'YYYYMMDD' = 20120520
  • 'YYYY-MM-DD' = 2012-05-20
  • 'YYYY-MON-DD' : 2012-MAY-20
TIME FORMAT <string_for_time_format> ::= <string_literal>
The format that time strings are encoded with in the import data:
指定时间格式
  • HH24 : hour
  • MI : minute
  • SS : second
For example:
  • 'HH24MISS' : 143025
  • 'HH24:MI:SS' : 14:30:25
TIMESTAMP FORMAT <string_for_timestamp_format> ::= <string_literal>
The format that timestamp strings are encoded with in the import data. 
指定日期时间格式
For example:
  • 'YYYY-MM-DD HH24:MI:SS' : 2012-05-20 14:30:25
ERROR LOG <file_path_of_error_log> ::= <string_literal>

When specified, a log file of errors generated is stored in this file. Please ensure the file path you use is writeable by the database.

指定错误日志文件(含路径)

FAIL ON INVALID DATA
When specified, the IMPORT FROM command fails unless all the entries import successfully.
遇到无效数据会立即停止IMPORT FROM命令
<with_schema_flexibility> ::= WITH SCHEMA FLEXIBILITY
The option WITH SCHEMA FLEXIBILITY will create missing columns in flexible tables during CSV imports, as specified in the header (first row) of the CSV file or column list. By default, missing columns in flexible tables are not created automatically during data imports.
根据CSV里的表头行(第一行,使用COLUMN LIST IN FIRST ROW选项指定)或者是COLUMN LIST 选项中提供的列名来创建缺失的列,具体示例请参考后面示例。注:该选项是用在CREATE COLUMN TABLE...语句后面
 

 

For security reason, only CSV files located at paths defined in thecsv_import_path_filterconfiguration parameter are allowed to be loaded using the IMPORT FROM SQL statement. This feature can be disabled using theenable_csv_import_path_filterconfiguration parameter. Two related configuration parameters are specified in theimport_exportsection of the indexserver (nameserver in case of multi-DB) configuration, so you can turn off this feature or update path filter like follows:
由于安全原因,需要配置csv_import_path_filter服务器参数配置导入的CSV文件的路径,但该参数可以通过enable_csv_import_path_filter服务配置参数来禁用它,即将enable_csv_import_path_filter设置为false后,就不需要配置csv_import_path_filter路径参数了,即该参数失效
除了通过界面配置外,还可以通过下面命令来修改服务器配置参数:
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'system') set ('import_export', 'enable_csv_import_path_filter') = 'false' with reconfigure
ALTER SYSTEM ALTER CONFIGURATION ('indexserver.ini', 'system') set ('import_export', 'csv_import_path_filter') = '/A;/B' with reconfigure
Note that once you add a path '/A' to path filter every sub-path of '/A' will be automatically added as well.
注:一旦配置了'/A'路径,则其下面的子文件夹也会生效,不需要另外配置,就都可以做为CSV文件的导入路径了
This feature is used with restrictions and/or is extended by the following SAP HANA option:
  • Dynamic Tiering

 

Examples

Example 1 - Importing CSV Data

You create a table mytable to store the imported data.
CREATE TABLE mytable ( A INT, B VARCHAR(10), C DATE, D TIME, E DECIMAL );
 
You create a CSV text file /usr/sap/HDB/home/Desktop/data/data.csv and add the following contents.
1,"DATA1","2012-05-20","14:30:25",123456
2,"DATA2","2012-05-21","15:30:25",234567
3,"DATA3","2012-05-22","16:30:25",345678
4,"DATA4","2012-05-23","17:30:25",456789
You execute the following command to import the data. IMPORT FROM CSV FILE '/usr/sap/HDB/home/Desktop/data/data.csv' INTO "MYTABLE" 
   WITH RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ',';
 

Example 2 - Importing using a control file

In the example below, you import the CSV data from Example 1 using a control file.
在这个示例中,我们使用控制文件来完成上面示例同样的功能 You can create a control file /usr/sap/HDB/home/Desktop/data/data.ctl and add the contents shown below to the file.
创建/usr/sap/HDB/home/Desktop/data/data.ctl控制文件,并在文件中输入以下内容:
IMPORT DATA INTO TABLE "MYTABLE" FROM '/usr/sap/HDB/home/Desktop/data/data.csv'
   RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   OPTIONALLY ENCLOSED BY '"'
   ERROR LOG '/usr/sap/HDB/home/Desktop/data/data.err'
 
You execute the following command to import the data using the control file.
IMPORT FROM CONTROL FILE '/usr/sap/HDB/home/Desktop/data/data.ctl';
执行后,不管是否产生了错误,都会生成data.err文件

Example 3 - Import using date formats

In the example below, the date format is of the CSV import data is different to the default date format 'YYYY-MM-DD'. In this import data the date format used is 'MM-DD-YYYY'. You create a CSV text file /usr/sap/HDB/home/Desktop/data/data_different_date.csv and add the following contents.
1,"DATA1","05-20-2012","14:30:25",123456
2,"DATA2","05-21-2012","15:30:25",234567
3,"DATA3","05-22-2012","16:30:25",345678
4,"DATA4","05-23-2012","17:30:25",456789
You execute the following command to import the data.
IMPORT FROM CSV FILE '/usr/sap/HDB/home/Desktop/data/data_different_date.csv' INTO "MYTABLE" WITH RECORD
   DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   DATE FORMAT 'MM-DD-YYYY';

Example 4 - Import using COLUMN LIST

You create a table called COLLIST to store the imported data.
CREATE TABLE COLLIST ( A INT, B VARCHAR(10), C DATE, D DECIMAL );
You create a CSV text file '/usr/sap/HDB/home/Desktop/data/data_col_list.csv' and add the following contents.
现在
data_col_list.csv文件里的内容如下,B列值没有提供,并且CSV里的第一列要存到D列里,第二列要存到C里,第三列要存储到A列里
123456,"2012-05-20",1 234567,"2012-05-21",2 345678,"2012-05-22",3 456789,"2012-05-23",4
 
You execute the following commands to import the data using a column list.
IMPORT FROM CSV FILE '/usr/sap/HDB/home/Desktop/data/data_col_list.csv' INTO "COLLIST"
   WITH RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   COLUMN LIST ("D""C""A");
In order to import data without dealing with the proper table layout, it is possible to use WITH SCHEMA FLEXIBILITY as extended option of COLUMN LIST to import into a flexible table. You create a flexible table to store the imported data.
CREATE COLUMN TABLE FLEX ( X INT ) WITH SCHEMA FLEXIBILITY;--WITH SCHEMA FLEXIBILITY选项只能用于列式存储的表
创建一个可伸缩的表,只有X一列,使用上面的data_col_list.csv文件,文件里有3列,且都没存储到X列里
You execute the following commands to import previously created data_col_list.csv without explicitly adding columns.
IMPORT FROM CSV FILE '/usr/sap/HDB/home/Desktop/data/data_col_list.csv' INTO "FLEX"
   WITH RECORD DELIMITED BY '\n'
   FIELD DELIMITED BY ','
   COLUMN LIST ("A""B""C""D"--注:如果使用了 WITH SCHEMA FLEXIBILITY选项,则一定要指定COLUMN LIST,或者使用COLUMN LIST IN FIRST ROW选项来指定CSV里的第一行为COLUMN LIST
   WITH SCHEMA FLEXIBILITY;    
 
在自动创建缺失的列时,好像无法指定类型,默认类型全为 nvarchar(5000)?这不太好吧!是否有办法指定呢?





posted @ 2015-12-11 23:02  江正军  阅读(4649)  评论(0编辑  收藏  举报