Please pay more attention to the character set of your database
Recently, some vendors tried to install their systems in our office. The vendors only had experience on Chinese platforms: their Windows is the Chinese version, and so is their database. At the Windows level, Microsoft provides an excellent solution for supporting Chinese software: the regional settings. For the database, however, we need to pay 120% attention to the character set. This is what I learned from the problems I faced.
Last week, when I raised the request to create the database, neither the vendors nor I realized that its character set could be an issue. I did not mention to the DBAs that we needed a Chinese character set, so they created a database with the UTF8 character set, which also supports Chinese. When I found out that the character set was UTF8, I did not realize it could be a problem for purely Chinese software.
When the vendors came to our office to install their software on our servers, we hit a problem: the character sets were different, so the vendors could not restore their database onto our server. At that point, we thought this could be resolved by a conversion during the restore. Searching Google and Baidu turned up nothing useful, so we looked for a different way to import their database (schema and data) into our server. We found that the tool “PL/SQL Developer” can export the schema and the data to SQL scripts. After generating the scripts, we ran the schema script to create the schema, and things seemed fine. When we ran the data script, however, there was an error saying that the size of a column was smaller than the size of the data.
The error message seemed very strange, because we had just generated both the schema script and the data script from the vendors’ own database. How could the column not fit the data? While thinking about possible reasons, I suddenly recalled a parameter the DBAs had told me: the database uses the UTF8 character set! A Chinese character in UTF8 typically occupies 3 bytes, but the same character in ZHS16GBK needs only 2 bytes. The column sizes had been chosen for the GBK data, so the same text took up more bytes than the column allowed once it was stored in UTF8. That is why the column could not store the data.
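The byte-size difference described above can be checked with a few lines of Python. This is only a minimal sketch: the sample string, the 12-byte column limit, and the helper name `fits` are made up for illustration, and Python's `gbk` and `utf-8` codecs stand in for the database character sets ZHS16GBK and UTF8.

```python
# Compare how many bytes the same Chinese text needs in GBK and in UTF-8.
# The sample text and the 12-byte column limit are illustrative assumptions.
sample = "数据库字符集"  # six common Chinese characters

gbk_bytes = sample.encode("gbk")      # 2 bytes per character here
utf8_bytes = sample.encode("utf-8")   # 3 bytes per character here


def fits(text: str, encoding: str, column_bytes: int) -> bool:
    """Return True if text, encoded with encoding, fits in column_bytes bytes."""
    return len(text.encode(encoding)) <= column_bytes


print("characters:", len(sample))
print("GBK bytes:", len(gbk_bytes))
print("UTF-8 bytes:", len(utf8_bytes))
print("fits a 12-byte column as GBK:", fits(sample, "gbk", 12))
print("fits a 12-byte column as UTF-8:", fits(sample, "utf-8", 12))
```

A column sized to hold exactly this text in GBK is too small once the same text is stored as UTF-8, which matches the error we saw when running the data script.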
Finally, the vendors and I came to the same conclusion: we had to change the character set of our database. After changing the character set, we proceeded with the installation.
The key things to record here: the UTF8 setting looks like this: AMERICAN_AMERICA.UTF8. The Chinese GBK setting is: Simplified Chinese_China.ZHS16GBK.
This is for my own reference, in case I forget it later.
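To compare two such settings quickly, they can be split into their parts. The sketch below assumes both strings follow the LANGUAGE_TERRITORY.CHARSET layout seen in the two values recorded above. The names `NlsSetting`, `parse_setting` and `same_charset` are hypothetical helpers of my own, not part of any database tool.

```python
from typing import NamedTuple


class NlsSetting(NamedTuple):
    language: str
    territory: str
    charset: str


def parse_setting(value: str) -> NlsSetting:
    """Split a LANGUAGE_TERRITORY.CHARSET string into its three parts."""
    # The character set is whatever follows the last dot.
    locale, dot, charset = value.rpartition(".")
    if not dot:
        raise ValueError(f"no character set part in {value!r}")
    # The language may contain spaces, so split on the first underscore.
    language, underscore, territory = locale.partition("_")
    if not underscore:
        raise ValueError(f"no territory part in {value!r}")
    return NlsSetting(language, territory, charset)


def same_charset(a: str, b: str) -> bool:
    """Return True if both settings name the same character set."""
    return parse_setting(a).charset.upper() == parse_setting(b).charset.upper()


ours = "AMERICAN_AMERICA.UTF8"
theirs = "Simplified Chinese_China.ZHS16GBK"
print(parse_setting(ours))
print(parse_setting(theirs))
print("same character set:", same_charset(ours, theirs))
```

Running a check like this before attempting a restore would have shown the mismatch between our database and the vendors' database right away.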