Oracle数据库字符集ZHS16GBK转AL32UTF8
1、查看一下源数据库字符集
[oracle@scdb2 ~]$ export ORACLE_SID=bce
[oracle@scdb2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Aug 23 10:54:01 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Release 11.2.0.4.0 - 64bit Production
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.ZHS16GBK
2、目标数据库字符集
SQL> select userenv('language') from dual;
USERENV('LANGUAGE')
----------------------------------------------------
AMERICAN_AMERICA.AL32UTF8
查看Oracle的官方文档说明:
可以看到两种语言的不同描述:
Asian ZHS16GBK: GBK 16-bit Simplified Chinese MB, ASCII, UDC
Universal AL32UTF8: Unicode 6.2 UTF-8 Universal character set MB, ASCII, EURO
显然, ZHS16GBK字符集和AL32UTF8是两种不同字符集
在 Binary Subset-Superset Pairs内容中可以看到:
AL32UTF8是UTF8字符集的超集,而ZHS16GBK与AL32UTF8字符集之间无关联,所以,两个字符集之间的转换需要借助于Oracle数据库工具expdp/impdp(数据泵)
3、现在,开始数据库迁移工作。
步骤1:
查看中文字符长度在不同数据库的长度: select lengthb('你') from dual;
如果字符集是ZHS16GBK,那么一个汉字占用2个字节,如果字符集是AL32UTF8,那么一个汉字占用3个字节
步骤2:
导出源数据库数据:
set NLS_LANG= AMERICAN_AMERICA.AL32UTF8
expdp system/123456@bce DIRECTORY=dpdata1 DUMPFILE=bce.dmp logfile=export_bce.log SCHEMAS=cams_bce
步骤3:
CONTENT
该选项用于指定要导出的内容.默认值为ALL
CONTENT={ALL | DATA_ONLY | METADATA_ONLY}
当设置 CONTENT为ALL时,将导出对象定义及其所有数据.为DATA_ONLY时,只导出对象数据,为METADATA_ONLY时,只导出对象定义
导入对象定义: impdp system/123456@cams DIRECTORY=dpdata1 DUMPFILE=bce.dmp LOGFILE=import_bceTable.log SCHEMAS=cams_bce CONTENT=METADATA_ONLY
步骤4:
使用csscan辅助字符集转换。如果要确保数据的完整性,应该使用csscan扫描数据库,找出所有不兼容的字符,然后通过编写相应的脚本及代码,在转换之后进行更新,确保数据的正确性。
在使用csscan之前,需要以sys用户身份创建相应的数据字典对象:
[oracle@scdb2 ~]$ export ORACLE_SID=bce
[oracle@scdb2 ~]$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.4.0 Production on Tue Aug 23 13:32:46 2016
Copyright (c) 1982, 2013, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Release 11.2.0.4.0 - 64bit Production
SQL> select instance_name from v$instance;
INSTANCE_NAME
----------------
bce
SQL> @?/rdbms/admin/csminst.sql
User created.
Grant succeeded.
……
这个脚本创建相应用户(csmig)及数据字典对象,扫描信息会记录在相应的数据字典里。
[oracle@scdb2 ~]$ csscan help=y
csscan: error while loading shared libraries: libclntsh.so.11.1: cannot open shared object file: No such file or directory
检查 metalink ,在文档 ID 742070.1 中提到,导致这个问题的原因是没有正确的设置 LD_LIBRARY_PATH 环境变量,将 $ORACLE_HOME/lib 添加到这个环境变量中可以避免错误的产生
[oracle@scdb2 lib]$ pwd
/oradata/oracle/product/11.2.0/db_1/lib
[oracle@scdb2 lib]$ env | grep PATH
PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/oracle/bin:/oradata/oracle/product/11.2.0/db_1/bin:/home/oracle/bin
[oracle@scdb2 lib]$ cd
[oracle@scdb2 ~]$ vi .bash_profile
添加语句 LD_LIBRARY_PATH=$ORACLE_HOME/lib;export LD_LIBRARY_PATH至最后
[oracle@scdb2 ~]$ source .bash_profile
[oracle@scdb2 ~]$ env | grep PATH
LD_LIBRARY_PATH=/oradata/oracle/product/11.2.0/db_1/lib
PATH=/usr/lib64/qt-3.3/bin:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin:/home/oracle/bin:/oradata/oracle/product/11.2.0/db_1/bin:/home/oracle/bin:/home/oracle/bin:/oradata/oracle/product/11.2.0/db_1/bin:/home/oracle/bin
再次查看csscan是否可用:
[oracle@scdb2 ~]$ csscan help=y
Character Set Scanner v2.2 : Release 11.2.0.4.0 - Production on Tue Aug 23 13:48:15 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
You can let Scanner prompt you for parameters by entering the CSSCAN
command followed by your username/password:
Example: CSSCAN \"SYSTEM/MANAGER AS SYSDBA\"
Or, you can control how Scanner runs by entering the CSSCAN command
followed by various parameters. To specify parameters, you use keywords:
Example:
CSSCAN \"SYSTEM/MANAGER AS SYSDBA\" FULL=y TOCHAR=utf8 ARRAY=1024000 PROCESS=3
Keyword Default Prompt Description
---------- ------- ------ -------------------------------------------------
USERID yes username/password
FULL N yes scan entire database
USER yes owner of tables to be scanned
TABLE yes list of tables to scan
COLUMN yes list of columns to scan
EXCLUDE list of tables to exclude from scan
TOCHAR yes new database character set name
FROMCHAR current database character set name
TONCHAR new national character set name
FROMNCHAR current national character set name
ARRAY 1024000 yes size of array fetch buffer
PROCESS 1 yes number of concurrent scan process
MAXBLOCKS split table if block size exceed MAXBLOCKS
CAPTURE N capture convertible data
SUPPRESS maximum number of exceptions logged for each table
FEEDBACK report progress every N rows
BOUNDARIES list of column size boundaries for summary report
LASTRPT N generate report of the last database scan
LOG scan base file name of report files
PARFILE parameter file name
PRESERVE N preserve existing scan results
LCSD N no enable language and character set detection
LCSDDATA LOSSY no define the scope of the detection
HELP N show help screen (this screen)
QUERY N select clause to scan subset of tables or columns
---------- ------- ------ -------------------------------------------------
Scanner terminated successfully.
在命令行调用该工具对数据库进行扫描:
[oracle@scdb2 ~]$ export ORACLE_SID=bce
[oracle@scdb2 ~]$ csscan user=cams_bce fromchar=ZHS16GBK tochar=AL32UTF8 log=GBK2UTF8check.log capture=y array=1024000 process=4
Character Set Scanner v2.2 : Release 11.2.0.4.0 - Production on Tue Aug 23 13:57:27 2016
Copyright (c) 1982, 2011, Oracle and/or its affiliates. All rights reserved.
Username: system
Password:
Connected to:
Oracle Database 11g Release 11.2.0.4.0 - 64bit Production
Enumerating tables to scan...
. process 1 scanning CAMS_BCE.BCEBNKINF[AAAV1eAAEAAABcAAAA]
. process 2 scanning CAMS_BCE.BCEBNKINF[AAAV1eAAEAAAAJgAAA]
. process 3 scanning CAMS_BCE.BCEBNKINF[AAAV1eAAEAAACSAAAA]
. process 4 scanning CAMS_BCE.BCEBIZAUTH[AAAV1cAAEAAAAJQAAA]
…………
. process 1 scanning CAMS_BCE.ORG_ROLE_LOG[AAAV3NAAEAAAAXYAAA]
. process 4 scanning CAMS_BCE.REAL_BCTEBIL[AAAV3eAAEAAAAZgAAA]
. process 2 scanning CAMS_BCE.TSFTOTDTL[AAAV35AAEAAAAc4AAA]
. process 1 scanning CAMS_BCE.BCEBNKSGNINF[AAAV1hAAEAAAAJ4AAA]
Creating Database Scan Summary Report...
Creating Individual Exception Report...
Scanner terminated successfully.
[oracle@scdb2 ~]$ ll | grep GBK
-rw-r--r-- 1 oracle oinstall 18952360 Aug 23 14:02 GBK2UTF8check.log.err
-rw-r--r-- 1 oracle oinstall 12059 Aug 23 14:02 GBK2UTF8check.log.out
-rw-r--r-- 1 oracle oinstall 28402 Aug 23 14:01 GBK2UTF8check.log.txt
工具 csscan 会生成三个日志,一个 err 记录错误信息,比如这个例子中,一个列的长度需要变长,否则无法容纳数据长度的扩展;一个 log 记录操作步骤;而 txt 则是最终的汇总信息。
此处主要是检查err文件:
Database Scan Individual Exception Report
[Database Scan Parameters]
Parameter Value
------------------------------ ------------------------------------------------
CSSCAN Version v2.1
Instance Name bce
Database Version 11.2.0.4.0
Scan type User tables
User name cams_bce
Scan CHAR data? YES
Database character set ZHS16GBK
FROMCHAR ZHS16GBK
TOCHAR AL32UTF8
Scan NCHAR data? NO
Array fetch buffer size 1024000
Number of processes 4
Capture convertible data? YES
------------------------------ ------------------------------------------------
[Data Dictionary individual exceptions]
[Application data individual exceptions]
User : CAMS_BCE
Table : BCETXNJNL
Column: SMR
Type : CHAR(60)
Number of Exceptions : 0
Max Post Conversion Data Size: 39
ROWID Exception Type Size Cell Data(first 30 bytes)
------------------ ------------------ ----- ------------------------------
AAAV2XAAEAAAEUjAAF convertible 当天结清的票据,通知出票人
…………
------------------ ------------------ ----- ------------------------------
通过检查err文件,发现不能转换的数据主要情况是char里面存储中文字符,转换后出现长度溢出的情况。
步骤5:
将所有变长字段长度*1.5,所有长度大于1的定长字段修改为varchar2,长度*1.5。
将sql单独执行,发现错误,主要有两种:
其一,字段为系统关键字,需添加单引号
其二,字段长度大于varchar2最大长度4000
查出所有的可能带中文的字段并修改长度:
SELECT 'ALTER TABLE ' || OWNER || '.' || TABLE_NAME || ' MODIFY ("' ||
COLUMN_NAME || '" VARCHAR2(' || CEIL(DATA_LENGTH * 1.5) ||
'));'
FROM DBA_TAB_COLUMNS
WHERE OWNER = 'CAMS_BCE'
AND DATA_TYPE IN ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2')
AND DATA_LENGTH > 1
AND COLUMN_NAME not in('SYSDATE','DATE')
AND DATA_LENGTH < 2666
AND TABLE_NAME NOT IN (SELECT VIEW_NAME FROM DBA_VIEWS WHERE OWNER = 'CAMS_BCE');
AND TABLE_NAME NOT IN (SELECT VIEW_NAME FROM DBA_VIEWS WHERE OWNER = 'CAMS_BCE');
字段较长的字段长度直接修改为4000:
SELECT 'ALTER TABLE ' || OWNER || '.' || TABLE_NAME || ' MODIFY ("' ||
COLUMN_NAME || '" VARCHAR2(4000));'
FROM DBA_TAB_COLUMNS
WHERE OWNER = 'CAMS_BCE'
AND DATA_TYPE IN ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2')
AND DATA_LENGTH > 1
AND COLUMN_NAME not in('SYSDATE','DATE')
AND DATA_LENGTH >= 2666
AND TABLE_NAME NOT IN (SELECT VIEW_NAME FROM DBA_VIEWS WHERE OWNER = 'CAMS_BCE' );
AND TABLE_NAME NOT IN (SELECT VIEW_NAME FROM DBA_VIEWS WHERE OWNER = 'CAMS_BCE' );
也可直接通过存储过程执行:
(1)对于长度小于2666的,转化后小于varchar2最大长度4000
(1)对于长度小于2666的,转化后小于varchar2最大长度4000
--SET SERVEROUTPUT ON
--/
declare
sql_ varchar2(2000) ;
begin
for tab in ( select * FROM DBA_TAB_COLUMNS WHERE OWNER = 'CAMS_BCE' AND DATA_TYPE IN ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2') AND DATA_LENGTH > 1 AND COLUMN_NAME not in('SYSDATE','DATE') AND DATA_LENGTH < 2666 ) loop
select 'ALTER TABLE ' || tab.OWNER || '.' || tab.TABLE_NAME || ' MODIFY ("' || tab.COLUMN_NAME || '" VARCHAR2(' || CEIL(tab.DATA_LENGTH * 1.5) ||'))' into sql_ from dual ;
--dbms_output.put_line(sql_) ;
execute immediate sql_ ;
end loop ;
end ;
/
(2)对于长度大于2666的,转化后设置为varchar2最大长度4000
(2)对于长度大于2666的,转化后设置为varchar2最大长度4000
--SET SERVEROUTPUT ON
--/
declare
sql_ varchar2(2000) ;
begin
for tab in ( select * FROM DBA_TAB_COLUMNS WHERE OWNER = 'CAMS_BCE' AND DATA_TYPE IN ('CHAR', 'VARCHAR2', 'NCHAR', 'NVARCHAR2') AND DATA_LENGTH > 1 AND COLUMN_NAME not in('SYSDATE','DATE') AND DATA_LENGTH >= 2666 ) loop
select 'ALTER TABLE ' || tab.OWNER || '.' || tab.TABLE_NAME || ' MODIFY ("' || tab.COLUMN_NAME || '" VARCHAR2(4000))' into sql_ from dual ;
--dbms_output.put_line(sql_) ;
execute immediate sql_ ;
end loop ;
end ;
/
步骤6:
导入数据:
impdp system/123456@cams DIRECTORY=dpdata1 DUMPFILE=bce.dmp LOGFILE=import_bceData.log SCHEMAS=cams_bce CONTENT= DATA_ONLY
Processing object type SCHEMA_EXPORT/TABLE/TABLE_DATA
. . imported "CAMS_BCE"."BCEBNKINF" 66.87 MB 145869 rows
. . imported "CAMS_BCE"."BCEBIZAUTH" 32.39 MB 75261 rows
. . imported "CAMS_BCE"."BCETXNJNL" 1.756 MB 3246 rows
…………
. . imported "CAMS_BCE"."TSFTOTDTL" 0 KB 0 rows
. . imported "CAMS_BCE"."TSFUSRINF" 0 KB 0 rows
. . imported "CAMS_BCE"."USERS" 0 KB 0 rows
Job "SYSTEM"."SYS_IMPORT_SCHEMA_01" successfully completed at Tue Aug 23 16:03:24 2016 elapsed 0 00:00:13
现在字符集转换工作已完成,可去数据库中做简单检查。
GBK的那个库里居然有很多char类型字段里面存中文,玩得果然666
GBK的那个库里居然有很多char类型字段里面存中文,玩得果然666
转载于:http://blog.itpub.net/31394774/viewspace-2125226/
喜欢请赞赏一下啦^_^
微信赞赏
支付宝赞赏