SqlBulkCopy学习笔记
在日常开发过程中,不同数据源(SQL Server, SQLite, Oracle, 等等)之间转移是再普通不过的开发任务了。特别是在做大批量数据插入的时候,如果用 Insert into ... values (...)这种方式的话效率极低,因为其要不断的打开关闭数据连接,性能的损失可想而知。
SqlBulkCopy小露手脚
Microsoft SQL Server 提供一个称为 bcp的流行的命令提示符实用工具,用于将数据从一个表移动到另一个表(表既可以在同一个服务器上,也可以在不同服务器上。比如下面的例子是利用bcp工具将数据行复制到数据文件中,
bcp "SELECT Name FROM AdventureWorks.Sales.Currency" queryout Currency.Name.dat -T -c
SqlBulkCopy 类允许编写提供类似功能的托管代码解决方案。还有其他将数据加载到 SQL Server 表的方法(例如 INSERT 语句),但相比之下 SqlBulkCopy 提供明显的性能优势。使用 SqlBulkCopy 类只能向 SQL Server 表或视图写入数据。但是,数据源不限于 SQL Server,可以使用任何数据源,只要数据可加载到 DataTable 实例或可使用 IDataReader 实例读取数据。
在详细说明这个类的用法前,我们先简单概括这个类的功能:
- 单独批量复制操作,可将数据从一个数据源移动到SQL Server表中。
- 也可执行多个批量复制操作。
- 在数据库事务中可执行批量复制操作。
我们先来看一个简单代码:
SqlConnection sqlConn = new SqlConnection(ConfigurationManager.ConnectionStrings["conn"].ConnectionString); SqlBulkCopy bulkCopy = new SqlBulkCopy(sqlConn); bulkCopy.DestinationTableName = "BulkTestTable"; bulkCopy.BatchSize = dt.Rows.Count; try { sqlConn.Open(); if (dt != null && dt.Rows.Count != 0) bulkCopy.WriteToServer(dt); } catch (Exception ex) { throw ex; } finally { sqlConn.Close(); if (bulkCopy != null) bulkCopy.Close(); }
上面的这个例子很直观地说明了使用SqlBulkCopy类时几个重要的元素。
- 最重要的属性为Destination TableName
- 最重要的方法为WriteToServer。
DestinationTableName属性指定接受复制记录的表。由于它有一个由三部分组成的名称(database.owningschema.name),因此它遵循SQL Server语法。你可以用它的数据库和所有模式来限制表名称。数据库通过在连接字符串中指定(通过Initial Catalog值)。另外,如果表名称使用一个下划线或任何其它特殊的字符,你必须使用方括号避免这类名称,如[database.owningschema.name]。
实际上由这个超载的WriteToServer方法执行批量复制。它接受DataTable、DataRow和IDataReader对象为复制数据源。你还可以用DataTable对象包含一个DataRowState值,指定仅复制匹配的行。
SqlBulkCopy性能测试
首先,我们在SQL Server创建一个如下结构的表:
id | identity(1,1) |
username | varchar(50) |
password | varchar(50) |
可以直接复制下面的SQL语句来创建:
CREATE TABLE test_bulkcopy( id int IDENTITY(1,1) NOT NULL, username varchar(50) NULL, password varchar(50) NULL )
数据创建好了,我们开始写测试代码,要测试SqlBulkCopy类的性能,我们可以通过根其他几种方法的性能对比来说明。
目前向数据库表种写入大量数据的方法有以下几种:
- 最原始的方法:一条一条插入
- 利用SqlBulkCopy类来向Sql表复制数据
- 使用表值参数(Table_Value Parameters)直接传递表给存储过程或SQL命令(只有SQL Server2008以后的版本支持)
下面我们通过一个Console工程来实现一个三种方法的性能测试程序:
AbstractCopy.cs
namespace InsideDotNet.SqlBulkCopy { using System; using System.Configuration; using System.Data; using System.Data.SqlClient; public abstract class AbstractCopy { /// <summary> /// Connection to database /// </summary> public SqlConnection SqlConn = new SqlConnection(ConfigurationManager.ConnectionStrings["storydb"].ConnectionString); /// <summary> /// Copy specific rows records to database /// </summary> /// <param name="totalCount">Records count, unit by 100K</param> public abstract void CopyToDb(int totalCount); public void PrintBasicInfo(int totalCount) { Console.WriteLine(String.Format("Copying data to database with {0}, records: {1}00k", this.GetType(), totalCount)); } /// <summary> /// Get table schema /// </summary> /// <returns></returns> public DataTable GetTableSchema() { DataTable dt = new DataTable(); dt.Columns.AddRange(new[] { new DataColumn("id",typeof(int)), new DataColumn("username",typeof(string)), new DataColumn("password",typeof(string))}); return dt; } } }
NormalCopy.cs
namespace InsideDotNet.SqlBulkCopy { using System; using System.Data; using System.Data.SqlClient; using System.Diagnostics; public class NormalCopy: AbstractCopy { /// <summary> /// Copy specific rows records to database /// </summary> /// <param name="totalCount">Records count, unit by 100K</param> public override void CopyToDb(int totalCount) { this.PrintBasicInfo(totalCount); Stopwatch sw = new Stopwatch(); SqlCommand sqlComm = new SqlCommand(); sqlComm.CommandText = string.Format("insert into test_bulkcopy(id,username,password) values(@p0,@p1,@p2)");//参数化SQL sqlComm.Parameters.Add("@p0", SqlDbType.Int); sqlComm.Parameters.Add("@p1", SqlDbType.NVarChar); sqlComm.Parameters.Add("@p2", SqlDbType.VarChar); sqlComm.CommandType = CommandType.Text; sqlComm.Connection = SqlConn; SqlConn.Open(); try { //Insert 100k records each time for (int multiply = 0; multiply < totalCount; multiply++) { for (int count = multiply * 100000; count < (multiply + 1) * 100000; count++) { sqlComm.Parameters["@p0"].Value = count; sqlComm.Parameters["@p1"].Value = string.Format("User-{0}", count * multiply); sqlComm.Parameters["@p2"].Value = string.Format("Pwd-{0}", count * multiply); sw.Start(); sqlComm.ExecuteNonQuery(); sw.Stop(); } //每插入10万条数据后,显示此次插入所用时间 Console.WriteLine(String.Format("Elapsed Time is {0} Milliseconds", sw.ElapsedMilliseconds)); } } catch(Exception ex) { throw ex; } finally { SqlConn.Close(); } } } }
BulkCopy.cs
namespace InsideDotNet.SqlBulkCopy { using System; using System.Data; using System.Data.SqlClient; using System.Diagnostics; /// <summary> /// Description of SqlBulkCopyHelper /// </summary> public class BulkCopy : AbstractCopy { /// <summary> /// Copy specific rows records to database /// </summary> /// <param name="totalCount">Records count, unit by 100K</param> public override void CopyToDb(int totalCount) { this.PrintBasicInfo(totalCount); Stopwatch sw = new Stopwatch(); for (int multiply = 0; multiply < totalCount; multiply++) { DataTable dt = GetTableSchema(); //Insert 100k records each time for (int count = multiply * 100000; count < (multiply + 1) * 100000; count++) { DataRow r = dt.NewRow(); r[0] = count; r[1] = string.Format("User-{0}", count * multiply); r[2] = string.Format("Pwd-{0}", count * multiply); dt.Rows.Add(r); } sw.Start(); BulkCopyToDb(dt); sw.Stop(); Console.WriteLine(string.Format("Elapsed Time is {0} Milliseconds", sw.ElapsedMilliseconds)); } } private void BulkCopyToDb(DataTable source) { SqlBulkCopy bulkCopy = new SqlBulkCopy(SqlConn); bulkCopy.DestinationTableName = "test_bulkcopy"; bulkCopy.BatchSize = source.Rows.Count; try { SqlConn.Open(); if (source.Rows.Count != 0) bulkCopy.WriteToServer(source); } catch (Exception ex) { throw ex; } finally { SqlConn.Close(); bulkCopy.Close(); } } } }
TableValuedCopy.cs
namespace InsideDotNet.SqlBulkCopy { using System; using System.Data; using System.Data.SqlClient; using System.Diagnostics; /// <summary> /// Copy data via table-valued parameters, notice that this manner could only be supported by SQL Server R2 /// </summary> public class TableValuedCopy : AbstractCopy { /// <summary> /// Copy specific rows records to database /// </summary> /// <param name="totalCount">Records count, unit by 100K</param> public override void CopyToDb(int totalCount) { this.PrintBasicInfo(totalCount); Stopwatch sw = new Stopwatch(); for (int multiply = 0; multiply < totalCount; multiply++) { DataTable dt = GetTableSchema(); //Insert 100k records each time for (int count = multiply * 100000; count < (multiply + 1) * 100000; count++) { DataRow r = dt.NewRow(); r[0] = count; r[1] = string.Format("User-{0}", count * multiply); r[2] = string.Format("Pwd-{0}", count * multiply); dt.Rows.Add(r); } sw.Start(); TableValuedToDb(dt); sw.Stop(); Console.WriteLine(String.Format("Elapsed Time is {0} Milliseconds", sw.ElapsedMilliseconds)); } } private void TableValuedToDb(DataTable dt) { const string SqlStatement = "insert into test_bulkcopy (id,username,password)" + " select nc.id, nc.username,nc.password" + " from @TvpBulkTest as nc"; SqlCommand cmd = new SqlCommand(SqlStatement, SqlConn); SqlParameter catParam = cmd.Parameters.AddWithValue("@TvpBulkTest", dt); catParam.SqlDbType = SqlDbType.Structured; //表值参数的名字叫BulkUdt,在上面的建立测试环境的SQL中有。 catParam.TypeName = "dbo.tvpBulk"; try { SqlConn.Open(); if (dt != null && dt.Rows.Count != 0) { cmd.ExecuteNonQuery(); } } finally { SqlConn.Close(); } } } }
第三种方法中用到的表值参数可以使用下面的T-SQL命令来创建:
-- ================================ -- Create User-defined Table Type -- ================================ USE Story GO -- Create the data type CREATE TYPE dbo.tvpBulk AS TABLE ( id int NULL, username varchar(50) NULL, password varchar(50) NULL ) GO
这们通过Client.cs中调用这三种方法来进行数据插入,我们得到了如下的结果,
namespace InsideDotNet.SqlBulkCopy { using System; class Client { static void Main(string[] args) { AbstractCopy testCopy = null; // Testing copy 1000K records //testCopy = new NormalCopy(); //testCopy.CopyToDb(10); //// Testing copy 1000K records //testCopy = new BulkCopy(); //testCopy.CopyToDb(10); // Testing copy 1000K records testCopy = new TableValuedCopy(); testCopy.CopyToDb(10); Console.ReadKey(); } } }
一条一条插入的执行效率:
Copying data to database with InsideDotNet.SqlBulkCopy.NormalCopy, records: 1000k Elapsed Time is 38550 Milliseconds Elapsed Time is 78068 Milliseconds Elapsed Time is 121444 Milliseconds Elapsed Time is 161123 Milliseconds Elapsed Time is 205285 Milliseconds Elapsed Time is 247195 Milliseconds Elapsed Time is 285696 Milliseconds Elapsed Time is 324239 Milliseconds Elapsed Time is 363028 Milliseconds Elapsed Time is 400923 Milliseconds
使用SqlBulkCopy的执行效率:
Copying data to database with InsideDotNet.SqlBulkCopy.BulkCopy, records: 1000k Elapsed Time is 989 Milliseconds Elapsed Time is 1671 Milliseconds Elapsed Time is 2320 Milliseconds Elapsed Time is 2873 Milliseconds Elapsed Time is 3476 Milliseconds Elapsed Time is 4045 Milliseconds Elapsed Time is 4780 Milliseconds Elapsed Time is 5449 Milliseconds Elapsed Time is 5868 Milliseconds Elapsed Time is 6351 Milliseconds
使用表值参数的执行效率:
Copying data to database with InsideDotNet.SqlBulkCopy.TableValuedCopy, records:1000k Elapsed Time is 602 Milliseconds Elapsed Time is 1204 Milliseconds Elapsed Time is 1676 Milliseconds Elapsed Time is 2313 Milliseconds Elapsed Time is 2983 Milliseconds Elapsed Time is 3545 Milliseconds Elapsed Time is 4122 Milliseconds Elapsed Time is 4733 Milliseconds Elapsed Time is 5430 Milliseconds Elapsed Time is 6197 Milliseconds
很明显的是一条一条插入效率最低下,因为要不断的打开关闭连接。SqlBulkCopy类和表值参数的效率相当,都远远高于传统一条一条插入的方式。
不过,要说明的是目前SqlBulkCopy和表值参数只支持从各种数据源复制大量数据到Sql Server数据库表或视图,对其他数据库是不支持的,另外,表值参数是在Sql Server 2008中引入的,所以对于之前的版本不适用。