SQL Server collation introduction with collate SQL casting
SQL Server collation refers to a set of character and character encoding rules, and influences how information is stored according to the order in the data page, how data is matched by comparing two columns, and how information is arranged in the T-SQL query statement. Collate SQL follows rules applied on a table when Select, Insert, Update or Delete operations are performed against contained data. Data always follows collation constraint rules, which are configured when creating an object.
When retrieving data using a T-SQL query, collation plays a fundamental role in the execution. It matters which collation is associated with a column when ordering clause is applied to that column. For example, information in the column can be case sensitive, and lowercase letters will not be treated the same as uppercase letters. This distinction is important for data sorting in the query result set.
SQL Server installation configuration sets the default collation for the created instance (Latin1_General_CI_AI). New databases will be configured with the default collation of the SQL Server. Users can change the collation settings at the database level but not at the SQL Server level. With the assistance of the SQL Server Installation setup, collation can be changed, however, the effect of that on the current database and its tables can be unpredictable. This is not recommended since it can influence existing query execution and query result sets.
A list of collations supported by the installed version of SQL Server can be generated using sys.fn_helpcollations() (a system DMV). SQL Server allows collations to be configured at different levels of the database engine, but, by default, every level will inherit the collation settings from the parent level. In descending order, these are:
- SQL Server instance
- Database
- Column
- Query
SQL Server instance level collation
A SQL Server instance collation is the default collation for the system databases. It includes master, tempdb and model. If a user’s database collation is configured differently than one of the SQL Server instance collation, then a comparison between tables in that database and in tempdb, when it comes to textual data comparison, will return an error. Tempdb will always be configured with the instance level default collation. If one of the user databases is restored with other than tempdb’s collation and the temp table is used in any T-SQL statement or procedure inside a user database that compares text-based columns using WHERE clause or JOIN conditions, the query will return an error because both of the tables do not have the same collation.
To find the SQL Server instance level collation use, the T-SQL shown below:
1
|
SELECT SERVERPROPERTY('collation') AS ServerCollation;
|
To handle a collation violation, a developer can make a small change in the T-SQL by defining collation on a column. This will be explained in the “Query level collation” section.
Database collation
Default database collation is inherited from SQL Server instance collation and used for database restoration. Collation is copied in backup when the backup is created. To find the current database collation, use the T-SQL shown below:
1
2
|
SELECT name, collation_name DbCollation
FROM sys.databases
|
If the default collation of a database is changed by a user, then the collation of existing, user-defined table-columns won’t be changed. However, new tables will be characterized by the changed database collation. But, if tables with various collations exist in the database, there can be errors while creating a join between them. To apply new collation to existing tables of a database, the user can set new collation on TEXT columns of tables. This activity will place a lock on the tables because the column will be altered with the new collation. Therefore, the existing column index will be dropped and re-created automatically.
Column level collation
The default collation for SQL Server user-defined column is inherited from the database configured collation. If the database collation is changed, then the existing columns will retain the original collation but new columns will be created with current database collation.
This T-SQL can help find the collation for a particular column:
1
2
3
4
5
|
SELECT OBJECT_NAME(OBJECT_ID), name ColumnName, collation_name AS ColumnCollation
FROM sys.columns
WHERE collation_name IS NOT NULL
AND OBJECT_NAME(OBJECT_ID) = '?'
AND name = '?'
|
The image below shows collation type as a column property when navigated through Object explorer and displayed from the Property option of the context menu:
Query level collation
SQL Server doesn’t support automatic collation casting when executing a query with a comparison of two columns with different collations. As mentioned above, collation is consistent with parent levels. In these situations with various UNICODE characters, COLLATE casting should be utilized.
Query level collation is useful when a user wants to sidestep the collation of the column in a specific T-SQL query statement. It is possible to collate SQL by adding a collation name with the COLLATE keyword in the expression. The COLLATE is a column collation casting keyword which is operation generally known as collate SQL. COLLATE can also be used with the column name and input character string.
1
2
3
|
WHERE [Column X] = [Column Y] COLLATE Latin1_General_CI_AS
or
ON [Column X] = [Column X] COLLATE Latin1_General_CS_AS
|
A common use of the query level collation (collate SQL) is to compare case-sensitive strings. For example, imagine two tables with similar collation and compare their columns using join or subquery. Data with the lower-case string exists in the chosen columns of those tables. The requirement is to compare both columns in case-sensitive terms, but both columns don’t have the same collation. Collation at the query level (collate SQL) will allow overriding the database or column-level collation when running a query.
Collation conflicts
SQL Server supports multiple collations but cannot properly deal with collation mismatches. If a user tries to compare fields with different collations, SQL Server will return an error with collation names as shown below:
Msg 468, Level 16, State 9, Line 9
Cannot resolve the collation conflict between “Latin1_General_CI_AI” and “SQL_Latin1_General_CP1_CI_AS” in the like operation.
To avoid conflicts, users can add a default collation setting in the T-SQL query statement:
1
2
|
COLLATE DATABASE_DEFAULT
WHERE Column1 COLLATE DATABASE_DEFAULT = Column2
|
This may prompt an issue when programming for SQL Server. Users can resolve this issue by utilizing a SQL Server column compare clause with COLLATE as described above. In a situation where SQL Server is recently installed and an inappropriate collation setting is identified, it can cause an issue with the utilization of tempdb. As previously mentioned, tempdb is made with default server-level collation when restarting the SQL Server service.
Use collate SQL in T-SQL JOIN
COLLATE SQL is a very effective column casting function in the SQL Server but it can cause query performance issues. COLLATE SQL can help a developer with a quick fix for a collation error on a Remote Query execution. Imagine a table with a column collation of Latin1_General_CS_AS and another table with a column that has a different collation. When joining the two tables based on columns with the differing collations:
1
2
3
|
SELECT *
FROM TABLE1
INNER JOIN TABLE2 ON TABLE1.Col1 COLLATE Latin1_General_CS_AS = TABLE2.Col1 COLLATE Latin1_General_CS_AS
|
Here, we have utilized COLLATE with both the columns in a JOIN articulation to make a common collation to avoid an error. In summary, the COLLATE keyword with its collation value as argument is used next to a column name when ad hoc collation resolution is required.
SQL Server collation is a major aspect of the database engine. A mistake in collation configuration during SQL Server installation can cause huge issues during query execution. Although there is a workaround to resolve issues with collate SQL casting, it is always recommended that for a new SQL Server set up, server migration or database migration, collation should be at the top of the activity checklist. The choice of collation should be carefully thought through and consider future integration and implementation needs. It’s especially important in the event that you’ll need to join tables with other databases.