Split string in SQL Server 2005+ CLR vs. T-SQL (repost)
I'm editing in a link to Adam Machanic's blog on this. In the comments on this topic here you will see there are imperfections in my methods; reading Adam's blog shows this in more detail.
http://sqlblog.com/blogs/adam_machanic/archive/2009/04/26/faster-more-scalable-sqlclr-string-splitting.aspx
Thanks Adam!
I wrote this short CLR split function a while back, based on a few other articles I read when SQL Server 2005 was released. I decided to play with it today and see how it stacks up against the T-SQL split solutions.
Let's get the basics out of the way on SQL CLR. SQL CLR is only good once it's in memory. The CLR split function basically won over the T-SQL split functions after it was cached. This is a critical variable to consider when weighing CLR vs. T-SQL options in your coding. If you are doing heavy manipulation of data and heavy math, CLR will typically help you, but you should be very careful with CLR and memory management. You can run your server out of resources and literally stop functionality. I highly recommend reading MrDenny's blog on CLR here. Denny touches on important topics about when to use CLR and why you shouldn't. After that, look into how the SQL Server 32-bit, 32-bit with AWE, and 64-bit versions handle memory. Each handles memory differently. AWE-enabled instances will probably be the ones that cause you more headaches than the rest.
I had severe memory issues a few months ago on a production database server that forced restarts nightly until I fixed the problem. I analyzed the problem and it came down to several factors, and SQL CLR memory was one of them. Here is my chance to thank mrdenny and ptheriault again for the assistance on that strange problem.
I went out and googled "fast split function t-sql", found a few, and tested them against the CLR split method. I found a dozen or so split functions that looked good, but I still went with a numbers-table approach after testing them out next to each other. Here is one of the functions I used; a sketch of the Numbers table it depends on follows the code. If you have a better one, post it in the comments and I can edit the post.
ALTER FUNCTION [dbo].[Split] (
    @List VARCHAR(7998),   -- The delimited list
    @Del  CHAR(1) = ','    -- The delimiter
)
RETURNS @T TABLE (Item VARCHAR(7998))
AS
BEGIN
    DECLARE @WrappedList VARCHAR(8000), @MaxItems INT
    SELECT @WrappedList = @Del + @List + @Del, @MaxItems = LEN(@List)

    INSERT INTO @T (Item)
    SELECT SUBSTRING(@WrappedList, Number + 1, CHARINDEX(@Del, @WrappedList, Number + 1) - Number - 1)
    FROM dbo.Numbers n
    WHERE n.Number <= LEN(@WrappedList) - 1
      AND SUBSTRING(@WrappedList, n.Number, 1) = @Del

    RETURN
END
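The function above assumes an auxiliary dbo.Numbers table holding sequential integers in a Number column. If you don't already have one, a minimal sketch for building it might look like this; 8000 rows is enough because the working string is VARCHAR(8000), and any sequential-integer technique works just as well:

-- Sketch: build the dbo.Numbers table the Split function expects.
CREATE TABLE dbo.Numbers (Number INT NOT NULL PRIMARY KEY);

INSERT INTO dbo.Numbers (Number)
SELECT TOP (8000)
       ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_objects a
CROSS JOIN sys.all_objects b;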
Here is my CLR split (a rough deployment script follows the code):
using System;
using System.Data;
using System.Collections;
using System.Data.SqlClient;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;

public partial class UserDefinedFunctions
{
    [SqlFunction(Name = "CLR_Split",
        FillRowMethodName = "FillRow",
        TableDefinition = "id nvarchar(10)")]
    public static IEnumerable SqlArray(SqlString str, SqlChars delimiter)
    {
        if (delimiter.Length == 0)
            return new string[1] { str.Value };
        return str.Value.Split(delimiter[0]);
    }

    public static void FillRow(object row, out SqlString str)
    {
        str = new SqlString((string)row);
    }
};
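For reference, getting the assembly onto the server looks roughly like the following. The assembly name SplitUDF and the DLL path are placeholders for whatever you build, and the parameter types are my assumption for how SqlString and SqlChars map on the T-SQL side; the returned table has to match the TableDefinition in the attribute.

-- Sketch: enable CLR and register the table-valued function
-- (assembly name and DLL path are placeholders).
EXEC sp_configure 'clr enabled', 1;
RECONFIGURE;
GO

CREATE ASSEMBLY SplitUDF
FROM 'C:\CLR\SplitUDF.dll'
WITH PERMISSION_SET = SAFE;
GO

CREATE FUNCTION dbo.CLR_Split (@str NVARCHAR(MAX), @delimiter NCHAR(1))
RETURNS TABLE (id NVARCHAR(10))
AS
EXTERNAL NAME SplitUDF.UserDefinedFunctions.SqlArray;
GO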
I loaded a text file with a huge amount of delimited data to really get a gauge on the time this would take. The string is basically "data%data%data%data%data" and so on, around 600 elements. I restarted the local instance of SQL Server 2005 that I ran these on, so you can see CLR before it's cached and after.
So on a fresh restart you can see, by checking the MEMORYCLERK_SQLCLR type in sys.dm_os_memory_clerks, that we are clear.
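A query along these lines is one way to take that reading (single_pages_kb and multi_pages_kb are the 2005/2008 column names; later versions collapse them into pages_kb):

-- Sketch: see how much memory the SQLCLR clerk is holding.
SELECT type,
       SUM(single_pages_kb + multi_pages_kb) AS clr_memory_kb
FROM sys.dm_os_memory_clerks
WHERE type = 'MEMORYCLERK_SQLCLR'
GROUP BY type;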

After execution of the CLR function you can see the differences

Call these functions as
DECLARE @bigdamnvar VARCHAR(MAX)
SET @bigdamnvar = (SELECT * FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=C:\', 'SELECT * from Data.txt'))

SELECT * FROM dbo.CLR_Split(@bigdamnvar, '%')
Below you can see the first execution and then how quickly performance picks up once the plan is cached.

and
DECLARE @bigdamnvar VARCHAR(MAX)
SET @bigdamnvar = (SELECT * FROM OPENROWSET('MSDASQL', 'Driver={Microsoft Text Driver (*.txt; *.csv)};DBQ=C:\', 'SELECT * from Data.txt'))

SELECT * FROM dbo.[Split](@bigdamnvar, '%')
I executed this a few times to get it into cache as well. The odd increase was the server working on something else; I validated that, so ignore it.

You may notice the bytes received from the server are different, and SQL CLR is much heavier. That is something to keep in mind: T-SQL will always be light compared to SQL CLR. That is my experience in using the two side by side.
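If you want to take your own timings rather than rely on my screenshots, SET STATISTICS TIME is one simple way to do it. The literal below is just a stand-in for the text-file load above; running the batch twice shows the cold versus cached difference:

-- Sketch: time both splits side by side; run the batch twice to see cold vs. cached behavior.
DECLARE @test VARCHAR(MAX);
SET @test = 'data%data%data%data%data';

SET STATISTICS TIME ON;
SELECT COUNT(*) FROM dbo.CLR_Split(@test, '%');
SELECT COUNT(*) FROM dbo.[Split](@test, '%');
SET STATISTICS TIME OFF;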
The differences are small, but if the task you intend to perform is something that will typically stay cached, keep CLR in mind, mostly when it comes to complicated tasks that might otherwise be left to the other languages you have available.