Tuesday, October 18, 2016

Inserting Large Amount of Records into Database with SqlBulkCopy

Recently we came across an interesting situation analyzing performance hinge of a data importer implementation. The existing implementation of a data importer in the project i was working on, had individual sql insert queries for each data record to be inserted into a database table. Obviously it was a heavy operation to insert thousands of records into the database.

The good news is that there are some handy solutions exist to handle such situations with much more efficiency. Well, we mainly considered two approaches, to either use table valued parameters, and to use SQLBulkCopy class.  Microsoft SQL Server also includes a popular command-line utility named bcp for quickly bulk copying large files into tables or views in SQL Server databases.

The SqlBulkCopy class can be used to write data only to SQL Server tables. But the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with a IDataReader instance.

We had data records on a text file, written into an specific csv like format. We were using EntityFramework-like data access layer implementation, hence the solution can be generalized to use EntityFramework as well.

We had to implement file data reader class to read the file data into an IEnumerable

private IEnumerable GetFileContents(String filePath))
{
   //read the file and other logic
   foreach (var dataLine in dataLines){
      //some other logic
      yeild return entityObject;
   }
}

Since there were no existing implementation of IDataReader which use enumerator as a data source, we had to do our own implementation. A sample code can be found at my gist https://gist.github.com/isurusndr/450cfd0b2b19cd92ca322685d648877d

Calling of SqlBulkCopy is straight forward.

public static void BulkCopy(IEnumerable items, string connectionString)
{
   var bulkCopy = new SqlBulkCopy(connectionString);
   var reader = new EnumerableDataReader(items);

   foreach (var column in reader.ColumnMappingList)
   {
       bulkCopy.ColumnMappings.Add(column.Key, column.Value);
   }
 
    bulkCopy.BatchSize = 10000;
    bulkCopy.BulkCopyTimeout = 1800;
    bulkCopy.DestinationTableName = "EntityTable";
    bulkCopy.WriteToServer(reader);
    bulkCopy.Close();
}

Keep in mind not to set lower values to BatchSize property of SqlBulkCopy. Higher the value of batch size, better the performance would be.

The performance improvement we got was fantastic. We could brought initial time of 8 minutes to insert a sample data set, to less than 30 seconds at the end.


No comments: