卸载oracle数据至flat file之二

http://askanantha.blogspot.com/2009/01/unloading-oracle-data-to-flat-files.html
Fastreader from WisdomForce,  http://www.wisdomforce.com/

在我以前的一个博客http://www.cnblogs.com/harrychinese/archive/2011/06/30/Unload_Oracle_data_into_text_file.html中, 提及了sqluldr2 这个 oracle 卸数工具. 这几天在使用sqluldr2过程中, 再次遭遇 sqluldr2错误地将varchar2(100)导出为长度4000的情况,(我用的sqluldr2.exe是2011-3-24日创建的,linux 32/64是2011-3-25日创建的). 为了解决这个问题, google了很久, 又有新的发现:

0. 新发现的卸数工具
   *  http://askanantha.blogspot.com/2009/01/unloading-oracle-data-to-flat-files.html
   *  Fastreader from WisdomForce,  http://www.wisdomforce.com/
1. 如何打印出sqluldr2.exe的全部选项
  sqluldr2.exe help=yes
2.sqluldr2 和 ociuldr 和 ociuldr2, tbuldr 的关系
  2.1 ociuldr 是Fangxin Lou采用的是OCI 7接口写的, ociuldr 源码下载地址 http://www.anysql.net/software/ociuldr.c
  2.2 ociuldr2 是hrb_qiuyb根据ociuldr源代码, 用OCI 8的函数接口重写了文本导出工具, 所有已有的功能和命令行参数保持不变, 并增加了对CLOB/BLOB字段的支持(导出成独立的文件), 并且在Oracle 10g下可以用sys用户登录进行操作了. 详细信息见 http://www.anysql.net/tools/ociuldr2_source_code.html , ociuldr2 源码下载地址 http://www.anysql.net/software/ociuldr2.c
  2.3 sqluldr2 是Fangxin Lou采用OCI 8实现了sqluldr2, 该程序没有开放源码, 不过有Windows/Linux 64/Linux 32/AIX的版本供免费下载使用.  程序下载地址: http://www.anysql.net/software/sqluldr.zip
  2.4 tbuldr(ToolBox*UnLoader) 是 NinGoo 根据 ociuldr2 源码重新做了实现, 并且实现了部分sqludr2的功能, 详细信息见http://www.ningoo.net/html/2009/learn_oci_programming_from_ociuldr.html. 源码和编译后的程序下载地址为:  http://www.ningoo.net/software/tbuldr.zip
  2.5. 网上还有一个OCIExtract, 经查看源码, 这个OCIExtract.c和tbuldr.c完全一样.
 
3. 到底该选择sqluldr2 还是 ociuldr 还是 ociuldr2?
   3.1. 按照Fangxin Lou讲: 如果对性能没有极端要求, 没有必要换到 sqluldr2. 有些数据仓库的应用, 或是抽取数据给搜索引挚, 或是用文本方式来归档保存巨量数据的情况下, 则可以考虑升级到 sqluldr2 来导出文本.
   3.2. ITPUB上有网友做个测试报告称, sqluldr2 比 ociuldr快70%
   3.3 关于hrb_qiuyb版 ociuldr2, tbuldr(ToolBox*UnLoader)作者讲,hrb_qiuyb版的bug比较多,动不动就segment fault
   3.4 tbuldr(ToolBox*UnLoader)作者, 这样评价他的作品: 第一次用c和oci正儿八经的写东西,bug肯定一大堆。山寨有风险,慎用. 我自己测试了该程序, 总是报ORA-12154.
   3.5. 我在使用sqluldr2过程中, 碰到了sqluldr2 错误地将varchar2(100)导出为长度4000的情况, 但我使用 ociuldr 却没有发现同样的问题, 应该是 sqluldr2 引入的新bug. 也许可以通过给 sqluldr2 加上width 参数来避免这个问题, 但对于大表的话, 给每个输出字段都加上width是很麻烦的, 有空的时候, 可以编写一个小工具, 辅助生成width option.
     
  <<首先sqluldr2 还是 ociuldr 还是 ociuldr2都不支持 nvarchar2/nchar类型(因为程序中大量使用了基于byte的数组, 同时输出的文件时ansi格式的, 所以肯定不支持nchar类型). 对于多数情况, 我的建议是采用ociuldr; 如果性能要求特别高, 应采用用sqluldr2, 但必须加上width参数, 以免字段长度在导出时有误. >>
 
4. 如何使用ociuldr
    网上可以下载ociuldr.c, 但是找不到编译好的执行程序, 如果要在RHEL 64bit上运行, 可以使用gcc编译.

    编译的方法是:
    gcc -m64 -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -I${ORACLE_HOME}/rdbms/public -I${ORACLE_HOME}/rdbms/demo -L${ORACLE_HOME}/lib -lclntsh -o ociuldr.bin ociuldr.c
     
    运行程序的方法是:
    ./ociuldr.bin 参数...

5. 如何导入 ociuldr 或 sqluldr2 生成的文件
    如果我们是文本文件是定长格式, 在导入时候, 尤其需要注意各种类型字段的导出长度, 我专门设计了一个测试表TYPE_WIDTH_TEST, 来确定ociuldr或sqluldr2导出各个类型字段的长度. 以下是测试结果: (注ociuldr_h是ociuldr做了简单修正的版本, 绝大多数的字段处理都一样, 只是修正了number和TIMESTAMP TZ的输出长度)

====begin of ociuldr和sqluldr2和ociuldr_h的字段输出长度=======
DATE --ociuldr 长度为19,--sqluldr2 长度为19,格式为YYYY-MM-DD HH24:MI:SS
TIMESTAMP(6) --ociuldr 长度为26,--sqluldr2 长度为26,格式为YYYY-MM-DD HH24:MI:SSXFF
TIMESTAMP(6) WITH LOCAL TIME ZONE --ociuldr 长度为75,--sqluldr2长度为33,-ociuldr_h 修正长度为33
TIMESTAMP(6) WITH TIME ZONE  --ociuldr 长度为33,--sqluldr2 长度为33,--ociuldr_h 修正长度为33,
NUMBER --ociuldr 长度为40,--sqluldr2 长度为46, --ociuldr_h 修正长度为46,
NUMBER(38) --ociuldr 长度为precision,--sqluldr2 长度为precision+2,--ociuldr_h 修正长度为precision+2,
NUMBER(25,5)--ociuldr 长度为precision,--sqluldr2 长度为precision+2,--ociuldr_h 修正长度为precision+2,
NUMBER(20)--ociuldr 长度为precision,--sqluldr2 长度为precision+2,--ociuldr_h 修正长度为precision+2,
VARCHAR2(3900)--ociuldr 长度为length,--sqluldr2长度为length,
VARCHAR2(40)--ociuldr 长度为length,--sqluldr2长度为length,
NVARCHAR2(200)--ociuldr 长度没有规律,--sqluldr2 长度没有规律,
NVARCHAR2(16)--ociuldr 长度没有规律,--sqluldr2 长度没有规律,
CHAR(10)--ociuldr 长度为length,--sqluldr2 长度为length,
CHAR(500)--ociuldr 长度为length,--sqluldr2 长度为length,
INTERVAL YEAR(2) TO MONTH--ociuldr 长度为75,--sqluldr2 长度为20,
INTERVAL DAY(2) TO SECOND(6)--ociuldr 长度为75,--sqluldr2 长度为30,
ROWID, --ociuldr 长度为18,--sqluldr2 长度为18,
FLOAT --ociuldr 长度为128,--sqluldr2 长度为129,
FLOAT(22)--ociuldr 长度为precision+2,--sqluldr2 长度为precision+3,
====end of ociuldr和sqluldr2和ociuldr_h的字段输出长度=======

从上面的结果可以看出, sqluldr2在处理float类型不太对劲. ociuldr在处理number类型的输出长度时候, 不太正确, 没有考虑负号和小数点的问题. 我做了简单的修正, 取名ociuldr_h, 结果和sqluldr2保持一致.需要注意的是, 修正后的ociuldr程序和sqluldr2在处理NTERVAL YEAR(2) TO MONTH和 INTERVAL DAY(2) TO SECOND(6)时, 输出的长度仍不一致, 好在这两种类型不常用. 
   

ociuldr程序是通过oci的odefin()函数来获取字段的类型, 进而控制各种字段在输出时的长度,  odefin()常见的返回的类型值参考网页http://download.oracle.com/docs/cd/B13789_01/appdev.101/b10779/oci03typ.htm, 下面列出了部分返回值
/* input data types */
#define SQLT_CHR  1           /* (ORANET TYPE) character string */
#define SQLT_NUM  2             /* (ORANET TYPE) oracle numeric */
#define SQLT_INT  3                    /* (ORANET TYPE) integer */
#define SQLT_FLT  4      /* (ORANET TYPE) Floating point number */
#define SQLT_STR  5                   /* zero terminated string */
#define SQLT_VNU  6           /* NUM with preceding length byte */
#define SQLT_PDN  7     /* (ORANET TYPE) Packed Decimal Numeric */
#define SQLT_LNG  8                                     /* long */
#define SQLT_VCS  9                /* Variable character string */
#define SQLT_NON  10         /* Null/empty PCC Descriptor entry */
#define SQLT_RID  11                                   /* rowid */
#define SQLT_DAT  12                   /* date in oracle format */
#define SQLT_VBI  15                    /* binary in VCS format */
#define SQLT_BIN  23                     /* binary data(DTYBIN) */
#define SQLT_LBI  24                             /* long binary */
#define SQLT_UIN  68                        /* unsigned integer */
#define SQLT_SLS  91           /* Display sign leading separate */
#define SQLT_LVC  94                     /* Longer longs (char) */
#define SQLT_LVB  95                      /* Longer long binary */
#define SQLT_AFC  96                         /* Ansi fixed char */
#define SQLT_AVC  97                           /* Ansi Var char */
#define SQLT_CUR  102                           /* cursor  type */
#define SQLT_LAB  105                             /* label type */
#define SQLT_OSL  106                           /* oslabel type */
   
   
其他阅读条目:
在豆丁上找到Fangxin Lou写一篇关于oracle 数据如何导出文本, 非常全面.
http://www.docin.com/p-42836914.html


附件,TYPE_WIDTH_TEST_DDL-Data.sql, 测试table的DDL和DML语句

prompt PL/SQL Developer import file
prompt Created on 2011年8月7日 by Harry
set feedback off
set define off
prompt Creating TYPE_WIDTH_TEST...
create table TYPE_WIDTH_TEST
(
  DATE_TYPE_1         DATE
  ,TIMESTAMP_TYPE_1    TIMESTAMP(6)
  ,TIMESTAMP_TYPE_2    TIMESTAMP(6) WITH LOCAL TIME ZONE
  ,TIMESTAMP_TYPE_3    TIMESTAMP(6) WITH TIME ZONE
  ,NUMBERTYPE_1        NUMBER
  ,NUMBERTYPE_2        NUMBER(38)
  ,NUMBERTYP_3         NUMBER(25,5)
  ,NUMBERTYP_4         NUMBER(20)
  ,VARCHAR2TYPE_1      VARCHAR2(3900)
  ,VARCHAR2TYPE_2      VARCHAR2(40)
  ,NVARCHAR2_TYPE_1    NVARCHAR2(200)
  ,NVARCHAR2_TYPE_2    NVARCHAR2(16)
  ,CHARTYPE_1          CHAR(10)
  ,CHARTYPE_2          CHAR(500)
  ,INTERVALYEAR_TYPE_1 INTERVAL YEAR(2) TO MONTH
  ,INTERVALDAY_TYPE_1  INTERVAL DAY(2) TO SECOND(6)
  ,FLOAT_TYPE1         FLOAT
  ,FLOAT_TYPE2         FLOAT(22)
  --BINARY_DOUBLE_TYPE_1 BINARY_DOUBLE, --ORA-03115: unsupported network datatype or representation
  --BINARY_FLOAT_TYPE_1  BINARY_FLOAT,  --ORA-03115: unsupported network datatype or representation 
)
tablespace SDB_DATA
  pctfree 10
  initrans 1
  maxtrans 255
  storage
  (
    initial 64K
    minextents 1
    maxextents unlimited
  );

prompt Disabling triggers for TYPE_WIDTH_TEST...
alter table TYPE_WIDTH_TEST disable all triggers;
prompt Deleting TYPE_WIDTH_TEST...
delete from TYPE_WIDTH_TEST;
commit;
prompt Loading TYPE_WIDTH_TEST...
insert into TYPE_WIDTH_TEST (DATE_TYPE_1, TIMESTAMP_TYPE_1, TIMESTAMP_TYPE_2, TIMESTAMP_TYPE_3, NUMBERTYPE_1, NUMBERTYPE_2, NUMBERTYP_3, NUMBERTYP_4, VARCHAR2TYPE_1, VARCHAR2TYPE_2, NVARCHAR2_TYPE_1, NVARCHAR2_TYPE_2, CHARTYPE_1, CHARTYPE_2, INTERVALYEAR_TYPE_1, INTERVALDAY_TYPE_1, FLOAT_TYPE1, FLOAT_TYPE2)
values (to_date('06-09-2011 09:30:20', 'dd-mm-yyyy hh24:mi:ss'), to_timestamp('13-10-2011 11:11:11.000000', 'dd-mm-yyyy hh24:mi:ss.ff'), to_timestamp('13-10-2012 11:11:11.000000', 'dd-mm-yyyy hh24:mi:ss.ff'), to_timestamp_tz('13-10-2013 11:11:11.000000 +08:00', 'dd-mm-yyyy hh24:mi:ss.ff tzh:tzm'), 33.3333333333333333333333333333333333333, 67, 11.11111, 22, '1  ,,,,2  ,,,,3  ,,,,', 'this is test, hope it works. this is te', 'NVARCHAR2_TYPE_1', 'NVARCHAR2_TYPE_2', 'abcdefghij', 'ABCDEF                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              ', '+04-02', '+04 13:50:00.000000', 20.5555555555556, 30.66667);
commit;
prompt 1 records loaded
prompt Enabling triggers for TYPE_WIDTH_TEST...
alter table TYPE_WIDTH_TEST enable all triggers;
set feedback on
set define on
prompt Done.

 

附件 我修正的ociuldr的source
/*我对 ociuldr.c 的getColumns() 做了修正, 集中在switch语句内部*/
sword getColumns(FILE *fpctl, Lda_Def *lda,Cda_Def *cda, struct COLUMN *collist, int fixlen)
{
    int totallen=1;
    sword col;
    struct COLUMN *tempcol;
    struct COLUMN *nextcol;
    ub1 *buf;

    nextcol = collist;
 
    /* Describe the select-list items. */
    if(fpctl != NULL) fprintf(fpctl,"(\n");
    for (col = 0; col < MAX_SELECT_LIST_SIZE; col++)
    {
        tempcol = (struct COLUMN *) malloc(sizeof(struct COLUMN));
        tempcol-> indp        = (sb2 *)malloc(DEFAULT_ARRAY_SIZE * sizeof(sb2));
        tempcol-> col_retlen  = (ub2 *)malloc(DEFAULT_ARRAY_SIZE * sizeof(ub2));
        tempcol-> col_retcode = (ub2 *)malloc(DEFAULT_ARRAY_SIZE * sizeof(ub2));

        tempcol->next = NULL;
        tempcol->colbuf = NULL;
              memset(tempcol->fmtstr,0,64);

        tempcol->buflen = MAX_ITEM_BUFFER_SIZE;
        if (odescr(cda, col + 1, &(tempcol->dbsize),
                   &(tempcol->dbtype), &(tempcol->buf[0]),
                   &(tempcol->buflen), &(tempcol->dsize),
                   &(tempcol->precision), &(tempcol->scale),
                   &(tempcol->nullok)))
        {
            if(fpctl != NULL) fprintf(fpctl,"\n");
            free(tempcol);
            /* Break on end of select list. */
            if (cda->rc == VAR_NOT_IN_LIST)
                break;
            else
            {
                SQLError(lda,cda);
                return -1;
            }
        }

        if(col)
        {
          if(fpctl != NULL) fprintf(fpctl,",\n");
        }

        nextcol->next = tempcol;
        nextcol=tempcol;

        nextcol->buf[nextcol->buflen]='\0';
 
        switch(nextcol->dbtype)
        {
            case DATE_TYPE: /*12*/
                nextcol->dsize=19;
                nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                if(fpctl != NULL)
                {
                   if (fixlen)
                      fprintf(fpctl,"  %s POSITION(%d:%d) DATE \"YYYY-MM-DD HH24:MI:SS\"",
                          nextcol->buf , totallen, totallen + nextcol->dsize-1);
                   else
                      fprintf(fpctl,"  %s DATE \"YYYY-MM-DD HH24:MI:SS\"", nextcol->buf );
                }
                break;
            case 180: /* TIMESTAMP */
                nextcol->dsize=26;
                nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                if(fpctl != NULL)
                {
                   if (fixlen)
                      fprintf(fpctl,"  %s POSITION(%d:%d) TIMESTAMP \"YYYY-MM-DD HH24:MI:SSXFF\"",
                           nextcol->buf , totallen, totallen + nextcol->dsize-1);
                   else
                      fprintf(fpctl,"  %s TIMESTAMP \"YYYY-MM-DD HH24:MI:SSXFF\"", nextcol->buf );
                }
                break;
            case 181: /* TIMESTAMP WITH TIMEZONE */
                nextcol->dsize=33;
                nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                if(fpctl != NULL)
                {
                   if (fixlen)
                       fprintf(fpctl,"  %s POSITION(%d:%d) TIMESTAMP WITH TIME ZONE \"YYYY-MM-DD HH24:MI:SSXFF TZH:TZM\"",
                           nextcol->buf , totallen, totallen + nextcol->dsize-1);
                   else
                       fprintf(fpctl,"  %s TIMESTAMP WITH TIME ZONE \"YYYY-MM-DD HH24:MI:SSXFF TZH:TZM\"", nextcol->buf );
                }
                break;
               
            case 231: /* TIMESTAMP WITH LOCAL TIME ZONE */
                nextcol->dsize=33;
                nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                if(fpctl != NULL)
                {
                   if (fixlen)
                       fprintf(fpctl,"  %s POSITION(%d:%d) TIMESTAMP WITH LOCAL TIME ZONE \"YYYY-MM-DD HH24:MI:SSXFF TZH:TZM\"",
                           nextcol->buf , totallen, totallen + nextcol->dsize-1);
                   else
                       fprintf(fpctl,"  %s TIMESTAMP WITH LOCAL TIME ZONE \"YYYY-MM-DD HH24:MI:SSXFF TZH:TZM\"", nextcol->buf );
                }
                break;               
            case 24:  /* LONG RAW */
            case 113: /* BLOB */
               nextcol->dsize=DEFAULT_LONG_SIZE;
               nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
               if(fpctl != NULL)
               {
                  if (fixlen)
                      fprintf(fpctl,"  %s POSITION(%d:%d) CHAR(%d) ",
                          nextcol->buf, totallen, totallen + nextcol->dsize-1, DEFAULT_LONG_SIZE);
                  else
                      fprintf(fpctl,"  %s CHAR(%d) ", nextcol->buf, DEFAULT_LONG_SIZE);
               }
               break;
            case ROWID_TYPE:
                nextcol->dsize=18;
                nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                if(fpctl != NULL)
                {
                    if (fixlen)
                       fprintf(fpctl, "  %s POSITION(%d:%d) CHAR(%d)",
                            nextcol->buf, totallen, totallen + nextcol->dsize-1, nextcol->dsize);
                    else
                       fprintf(fpctl, "  %s CHAR(%d)", nextcol->buf,nextcol->dsize);
                }
                break;
            case NUMBER_TYPE:
               if (nextcol->precision)
                   nextcol->dsize=nextcol->precision+2;
               else
                   nextcol->dsize=46;
               nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
               if(fpctl != NULL)
               {
                  if (fixlen)
                      fprintf(fpctl,"  %s POSITION(%d:%d) CHAR(%d)",
                           nextcol->buf,totallen, totallen + nextcol->dsize-1, nextcol->dsize);
                  else
                      fprintf(fpctl,"  %s CHAR(%d)", nextcol->buf,nextcol->dsize);
               }
               break;
            case 112: /* CLOB */
            case 114: /* BFILE */
               nextcol->dsize=DEFAULT_LONG_SIZE;
               nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
               if(fpctl != NULL)
               {
                  if (fixlen)
                      fprintf(fpctl,"  %s POSITION(%d:%d) CHAR(%d)",
                          nextcol->buf, totallen, totallen + nextcol->dsize-1, DEFAULT_LONG_SIZE);
                  else
                      fprintf(fpctl,"  %s CHAR(%d)", nextcol->buf,DEFAULT_LONG_SIZE);
               }
               break;
            default:
               nextcol->dbtype=(fixlen?SQLT_AVC:STRING_TYPE);
                 if (nextcol->dsize>4000) nextcol->dsize = 4000;
                 if (nextcol->dsize==0)  nextcol->dsize = 4000;
               if(fpctl != NULL)
               {
                  if (fixlen)
                     fprintf(fpctl,"  %s POSITION(%d:%d) CHAR(%d)", nextcol->buf,
                            totallen, totallen + nextcol->dsize-1,
                            (nextcol->dsize==0?DEFAULT_LONG_SIZE:nextcol->dsize));
                  else
                     fprintf(fpctl,"  %s CHAR(%d)", nextcol->buf,(nextcol->dsize==0?DEFAULT_LONG_SIZE:nextcol->dsize));
               }
               break;
        }
        /* Set for long type column */
        if (nextcol->dsize > DEFAULT_LONG_SIZE || nextcol->dsize == 0) 
            nextcol->dsize = DEFAULT_LONG_SIZE;
        /* add one more byte to store the ternimal char of string */
        sprintf(nextcol->fmtstr,"%%-%ds", nextcol->dsize);
        totallen = totallen + nextcol->dsize;
        nextcol->dsize ++;

        nextcol->colbuf = malloc(DEFAULT_ARRAY_SIZE * (nextcol->dsize));
        memset(nextcol->colbuf,0,DEFAULT_ARRAY_SIZE * (nextcol->dsize));

        if (odefin(cda, col + 1,
                   nextcol->colbuf, nextcol->dsize, nextcol->dbtype,
                   -1, nextcol->indp, (text *) 0, -1, -1,
                   nextcol->col_retlen,nextcol->col_retcode))
        {
            SQLError(lda,cda);
            return -1;
        }
    }
    if(fpctl != NULL)
       fprintf(fpctl,")\n");
    fputs("\n",(fp_log == NULL?stdout:fp_log));
    return col;
}

posted @ 2011-09-07 11:42  harrychinese  阅读(1863)  评论(1编辑  收藏  举报