How I can obtain the collation of a specific table in a database?

回答1

Collation at the table level is on a per column basis, so it is possible to have a collation different than the database. If the collation is not defined at the column level, it defaults to the database collation setting.

SQL Server 2000:

SELECT c.name, 
       c.collation 
  FROM SYSCOLUMNS c
 WHERE [id] = OBJECT_ID('your_table_name')

SQL Server 2005+:

SELECT c.name, 
       c.collation_name
  FROM SYS.COLUMNS c
  JOIN SYS.TABLES t ON t.object_id = c.object_id
 WHERE t.name = 'your_table_name'

获取数据库默认的collation

SELECT CONVERT (varchar(256), SERVERPROPERTY('collation'));

Temp Table collation conflict - Error : Cannot resolve the collation conflict between Latin1* and SQL_Latin1*

回答1

This happens because the collations on #tempdb.temp_po.OD1 and STR_IndentDetail.D1 are different (and specifically, note that #tempdb is a different, system database, which is generally why it will have a default opinion for collation, unlike your own databases and tables where you may have provided more specific opinions).

Since you have control over the creation of the temp table, the easiest way to solve this appears to be to create *char columns in the temp table with the same collation as your STR_IndentDetail table:

CREATE TABLE #temp_po(
    IndentID INT, 
    OIndentDetailID INT, 
    OD1 VARCHAR(50) COLLATE SQL_Latin1_General_CP1_CI_AS, 
    .. Same for the other *char columns

In the situation where you don't have control over the table creation, when you join the columns, another way is to add explicit COLLATE statements in the DML where errors occur, either via COLLATE SQL_Latin1_General_CP1_CI_AS or easier, using COLLATE DATABASE_DEFAULT

SELECT * FROM #temp_po t INNER JOIN STR_IndentDetail s 
   ON t.OD1 = s.D1 COLLATE SQL_Latin1_General_CP1_CI_AS;

OR, easier

SELECT * FROM #temp_po t INNER JOIN STR_IndentDetail s 
   ON t.OD1 = s.D1 COLLATE DATABASE_DEFAULT;

SqlFiddle here

Collations and Case Sensitivity

Database collation

In most database systems, a default collation is defined at the database level; unless overridden, that collation implicitly applies to all text operations occurring within that database. The database collation is typically set at database creation time (via the CREATE DATABASE DDL statement), and if not specified, defaults to a some server-level value determined at setup time. For example, the default server-level collation in SQL Server for the "English (United States)" machine locale is SQL_Latin1_General_CP1_CI_AS, which is a case-insensitive, accent-sensitive collation. Although database systems usually do permit altering the collation of an existing database, doing so can lead to complications; it is recommended to pick a collation before database creation.

When using EF Core migrations to manage your database schema, the following in your model's OnModelCreating method configures a SQL Server database to use a case-sensitive collation:

modelBuilder.UseCollation("SQL_Latin1_General_CP1_CS_AS");

Column collation

Collations can also be defined on text columns, overriding the database default. This can be useful if certain columns need to be case-insensitive, while the rest of the database needs to be case-sensitive.

When using EF Core migrations to manage your database schema, the following configures the column for the Name property to be case-insensitive in a database that is otherwise configured to be case-sensitive:

modelBuilder.Entity<Customer>().Property(c => c.Name)
    .UseCollation("SQL_Latin1_General_CP1_CI_AS");

SQL Server Collation介绍及其变更对数据的影响

字符的存储

在将collation之前，我们首先需要知道字符是如何被存储的。在计算机中，所有数据都是用0和1这样的位来描述。一个字节有8位，因此一个字节最多可以描述256个字符。在欧美国家，比如美国，他们的文字字符主要就是26个字母加上一些特殊符号(+-*/等)，用一个字节就可以存储，一个国家使用的所有字符就是一个code page，用一个字节存储字符的code page 叫做single-byte code page。但是在亚洲的一些国家，比如中国，常用汉字有几万个，根本不能用一个字节来表示所有的汉字字符，因此需要用两个字节描述。因为两个字节有16位，最多可以描述65536个字符，足够用来描述所有汉语字符以及常用字符，这些字符也是一个code page，不过是double-byte code page，主要针对的是中国。每个国家都有一个code page来对应所使用的字符。比如欧美国家，他们使用拉丁，虽然a-z这26个字母所对应的二进制在code page中是相同的，但是在重音（'é'和'á'）方面是不相同的。所以如果code page不同，那么相同的二进制代码所表示出来的字符也可能不相同。如果数据在不同code page的计算机上传输，就需要进行code page的转换，如果接收方的code page上没有定义传输方传送过来的特定字符的二进制位，那么就会出现数据丢失。

Collation的组成

在讲完字符的存储形式以后，我们就可以讲什么是collation了。Collation描述了数据在数据库中是按照什么规则来描述字符，以及字符时如何被排序和比较的。在SQL Server中，Collation由两部分组成，比如中国的一个collation是 Chinese_PRC_CI_AI_WS ，前半部份是指的是所支持的字符集，与code page相对应，如Chinese_PRC 对应的代码页是936，在这个code page中定义了所有能够使用的字符。后半部CI_AI_WS用于表示排序规则，比如：

_CI(CS)表示是否区分字母大小写，CI不区分，CS区分。如果区分大小写，那么排序的时候小写字母的排在大写的前面；如果不区分大小写，那么排序的时候视大小写字母相同。
_AI(AS) 表示是否区分重音，AI不区分，AS区分。如果不区分重音，那么排序的时候视“a”和“ấ”为相同字符
_KI(KS) 表示是否区分假名类型，KI不区分，KS区分。在日语中应用。
_WI(WS) 表示是否区分全半角，WI不区分，WS区分。半角是单字节，全角是双字节。

Collation的四个级别

Collation一共有四个级别，分别是server-level, database-level, column-level和expression-level。

posted @ 2022-11-14 14:09 ChuckLu 阅读(47) 评论(0) 收藏举报

刷新页面返回顶部

Chuck Lu

How I can obtain the collation of a specific table in a database?

How I can obtain the collation of a specific table in a database?

Temp Table collation conflict - Error : Cannot resolve the collation conflict between Latin1* and SQL_Latin1*

Collations and Case Sensitivity

Database collation

Column collation

SQL Server Collation介绍及其变更对数据的影响

字符的存储

Collation的组成

Collation的四个级别

公告