Sunday, June 28, 2015

Indexes should be dropped before a bulk insert or not

If we are loading a large volume of data into an indexed table, we always drop the index, load the dataset, and re-create indexes assuming that the overhead of dropping and re-creating indexes is less than the overhead of loading data with the indexes in place. But for certain situations, this assumption may not give any benefits. When loading a small dataset, dropping and re-creating may be counterproductive, and may take more time for re-creating than the time it takes for loading data with indexes in place.

Considering that, how do we determine whether we should drop the indexes before loading or insert all records with indexes in place?

Microsoft has given set of guidelines for this. It is based on minimal amount of new data to be loaded, proportion to the total data in the table. If your new data load is equal or greater than the percentage given, it is recommended to drop the indexes before loading and re-create them after loading.

Indexes Relative amount of new data
Clustered index only 30%
Clustered and one nonclustered index 25%
Clustered and two nonclustered indexes 25%
Single nonclustered index only 100%
Two nonclustered indexes 60%

For example, if you have 100,000 records in the table that has a clustered index only and have to load 30,000 new records, it is better to drop the index before loading and re-create it afterward.

You can read more info on this at: https://msdn.microsoft.com/en-us/library/ms177445.aspx

No comments: