Importing data from other sources is a very common operation with most database solutions, as not all data can be entered row-by-row. When a large amount of data needs to be imported, we should always consider the constraints on the target table and minimal logging, expecting improved performance.
SQL Server supports a set of tools for importing data: the Import and Export Wizard, bcp (Bulk Copy Program), BULK INSERT, and OPENROWSET (BULK). Preference usually goes to either bcp or BULK INSERT; however, to improve their performance, some options have to be considered and set. This post explains how to load a large amount of data using bcp with various options.
For testing the best approach, I created a database called TestDatabase and a table called dbo.Sales. I prepared a text file called Sales.txt that holds 7,301,921 records matching the dbo.Sales table structure. The Recovery Model of the database is initially set to Full.
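The exact schema of dbo.Sales is not shown here; a minimal sketch of a comparable setup, with column names and types assumed purely for illustration, could look like this:

```sql
-- Sketch only: the actual dbo.Sales schema is not shown in the post,
-- so these column names and types are assumptions for illustration.
CREATE DATABASE TestDatabase;
GO

USE TestDatabase;
GO

CREATE TABLE dbo.Sales
(
    SalesId     int           NOT NULL,
    SalesDate   date          NOT NULL,
    ProductId   int           NOT NULL,
    Quantity    int           NOT NULL,
    Amount      decimal(16,2) NOT NULL
);
GO

-- The database starts in the Full recovery model.
ALTER DATABASE TestDatabase SET RECOVERY FULL;
GO
```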
Then I loaded the table with the following bcp commands. Note that every command was run after truncating and shrinking the log file.
```
-- Loading without any specific option
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c

-- Loading with TABLOCK option to force locking the entire table
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -h "TABLOCK"

-- Loading with our own batch size
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -b 3000000

-- Loading with our own batch size and TABLOCK
bcp TestDatabase.dbo.Sales in C:\Temp\Sales.txt -S DINESH-PC\SQL2014,8527 -T -c -b 3000000 -h "TABLOCK"
```
Once the table had been loaded four times, it was loaded again with the Bulk Logged recovery model. All commands were executed just like before, making sure that the log was truncated and shrunk before each execution.
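The exact steps used between runs are not shown above; a sketch of one common way to reset the table, switch the recovery model, and truncate and shrink the log, assuming the default logical log file name TestDatabase_log, is:

```sql
-- Sketch only: the exact steps used between runs are not shown in the post.
-- The logical log file name (TestDatabase_log) and backup path are assumptions.
USE TestDatabase;
GO

TRUNCATE TABLE dbo.Sales;   -- empty the table before the next run
GO

-- Switch the recovery model for the second set of tests.
ALTER DATABASE TestDatabase SET RECOVERY BULK_LOGGED;
GO

-- One way to truncate the log under Full / Bulk Logged: back it up,
-- then shrink the physical log file back to a small size.
BACKUP LOG TestDatabase TO DISK = 'C:\Temp\TestDatabase.trn';
GO
DBCC SHRINKFILE (TestDatabase_log, 1);
GO
```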
Finally, here are the results.
| Command | Recovery Model | Time (ms) | Log file size after loading |
|---|---|---|---|
| With no specific option | Full | 189,968 | 3,164,032 KB |
| With TABLOCK | Full | 275,500 | 1,475,904 KB |
| With batch size | Full | 108,500 | 2,377,088 KB |
| With batch size and TABLOCK | Full | 99,609 | 1,964,480 KB |
| With no specific option | Bulk Logged | 140,422 | 3,164,032 KB |
| With TABLOCK | Bulk Logged | 239,938 | 26,816 KB |
| With batch size | Bulk Logged | 121,828 | 2,377,088 KB |
| With batch size and TABLOCK | Bulk Logged | 86,422 | 1,475,904 KB |
As you can see, the best speed comes from setting a batch size together with TABLOCK under the Bulk Logged recovery model. If log file growth is your main concern, the TABLOCK option alone with the Bulk Logged recovery model is the best choice.
I believe that BULK INSERT offers the same results; however, it is yet to be tested in a similar way.
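For reference, a rough BULK INSERT counterpart of the last bcp command could look like the following sketch; the field and row terminators are assumptions, since the format of Sales.txt is not shown here.

```sql
-- Sketch only: a BULK INSERT counterpart of the bcp run with batch size and TABLOCK.
-- The field/row terminators are assumptions; adjust them to match Sales.txt.
BULK INSERT TestDatabase.dbo.Sales
FROM 'C:\Temp\Sales.txt'
WITH
(
    FIELDTERMINATOR = '\t',
    ROWTERMINATOR   = '\n',
    BATCHSIZE       = 3000000,
    TABLOCK
);
```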