Friday, February 10, 2012

Bulk Insert to a SQL Server Table with SqlBulkCopy

System.Data.SqlClient.SqlBulkCopy class lets us efficiently bulk load a SQL Server table with a significant performance advantage over other methods. Here we can use any data source as long as the data can be loaded to a DataTable instance or read with an IDataReader instance.

How SqlBulkCopy works is it sends large batches of records to the database rather than sending one record at a time. The number of records sent by SqlBulkCopy in each batch can be set by BatchSize property, which is by default zero and send all records to be loaded in one large batch.

When we are dealing with hundreds of thousands of records it is better to use smaller batches. The optimal batch size depends on factors like network bandwidth and the amount of bytes per record. If there is an error then the entire batch is rolled back and the error would cause SqlBulkCopy to stop processing. But any batches complete before the error is committed, meaning that the database may contains some imported data.

SqlBulkCopy class can be used to data insert only.

Example: Destination Table








Create an instance of the SqlBulkCopy class and specify the DB connection and the destination table to import data.

public static void SaveBulk(DataTable dataTable)
{
  SqlConnection connection = null;
  if (dataTable != null)
  {
    try
    {
      using (connection = new SqlConnection(connectionString))
      {
        connection.Open();
        using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection))
        {
          bulkCopy.ColumnMappings.Add("AccountId", "AccountId");
          bulkCopy.ColumnMappings.Add("ProcessingDate", "ProcessingDate");
          bulkCopy.ColumnMappings.Add("Amount", "Amount");

          bulkCopy.DestinationTableName = "dbo.AccountData";

          bulkCopy.WriteToServer(dataTable);
        }
      }
    }
    catch (Exception ex)
    {
      LoggerService.HandleException(ex);
    }
  }
}

It is not a must to specify column mappings in SqlBulkCopy. If column mappings are not specified, SqlBulkCopy performs the import by mapping the first column of the source data to the first column of the destination table, and so on. It does not perform the mapping by column name. So if the column order is different or we want to do the mapping clearly, we can optionally specify column mappings indicating what columns in the source data are mapped to what columns in the destination table.

Further Performance Improvements 
 
1. Tablelock
By defaults SqlBulkCopy uses row locks during the bulk copy operation. If TableLock option in the SqlBulkCopyOptions is specified, it obtains a bulk update lock for the duration of the bulk copy operation and performance will be improved.

using (SqlBulkCopy bulkCopy = new SqlBulkCopy(connection, SqlBulkCopyOptions.TableLock, null))
{
}

2. Retrieve all data before calling WriteToServer method
In some scenarios it is possible that the import process involves steps as follows:
  1. Retrieve data from a table
  2. Loop and process retrieved data
    1. Within the loop bring additional data based on the current item
    2. At the end of the iteration bulk copy to save all processed data

In this case if we can retrieve all the data required for processing and store in a list, before start the loop (Actually before calling WriteToServer); we can expect consistent performance in the last iterations in the loop as well. Otherwise what will happen is our get queries will get queued and will take longer time to response because of the writing load on the SQL server.

A paper on SqlBulkCopy Performance Analysis, which describes how to reach top performance using the SqlBulkCopy can be found here.

No comments: