The good news is that there are some handy solutions exist to handle such situations with much more efficiency. Well, we mainly considered two approaches, to either use table valued parameters, and to use SQLBulkCopy class. Microsoft SQL Server also includes a popular command-line utility named bcp for quickly bulk copying large files into tables or views in SQL Server databases.
The SqlBulkCopy class can be used to write data only to SQL Server tables. But the data source is not limited to SQL Server; any data source can be used, as long as the data can be loaded to a DataTable instance or read with a IDataReader instance.
We had data records on a text file, written into an specific csv like format. We were using EntityFramework-like data access layer implementation, hence the solution can be generalized to use EntityFramework as well.
We had to implement file data reader class to read the file data into an IEnumerable
private
IEnumerable GetFileContents(String filePath))
{
//read the file and other logic
foreach
(var dataLine
in
dataLines){
//some other logic
yeild return entityObject;
}
}
Since there were no existing implementation of IDataReader which use enumerator as a data source, we had to do our own implementation. A sample code can be found at my gist https://gist.github.com/isurusndr/450cfd0b2b19cd92ca322685d648877d
Calling of SqlBulkCopy is straight forward.
public
static
void
BulkCopy(IEnumerable items,
string
connectionString
)
{
var bulkCopy =
new
SqlBulkCopy(connectionString);
var reader =
new
EnumerableDataReader(items);
foreach
(var column
in
reader.ColumnMappingList)
{
bulkCopy.ColumnMappings.Add(column.Key, column.Value);
}
bulkCopy.BatchSize = 10000;
bulkCopy.BulkCopyTimeout = 1800;
bulkCopy.DestinationTableName =
"EntityTable"
;
bulkCopy.WriteToServer(reader);
bulkCopy.Close();
}
Keep in mind not to set lower values to BatchSize property of SqlBulkCopy. Higher the value of batch size, better the performance would be.
The performance improvement we got was fantastic. We could brought initial time of 8 minutes to insert a sample data set, to less than 30 seconds at the end.