SQL Server 2008 Sparse Columns in 60 seconds

SQL Server's most recent releases contain features that allow us to improve the response time of queries against very large tables (large in the sense of 'many rows'):
  • Partitioning (introduced in SQL Server 2005) allows us to spread tables and/or indexes over multiple filegroups, and
  • Filtered Indexes (introduced in SQL Server 2008) allow us to place indexes on parts of a table.
On top of that, SQL Server 2008 introduces a feature that efficiently deals with large tables, this time in the sense of 'many columns': the so-called Sparse Columns.

A lot of data models in the real world have tables that have many many columns containing just a NULL value. Unfortunately a regular NULL value consumes some physical space, so large quantities of those can cause performance decrease. SQL Server 2008's sparse columns are columns with optimized storage of NULL values: NULLs just take no space at all. Of course there's a down side: a small overhead (2 or 4 extra bytes) for non-NULL values. So a sparse column is meant to be used when a column has lots of NULL values (sparsely filled).

Here's a small example. I create two tables, a regular one and one containing sparse columns, I populate them, and then compare sizes:

/***************/
/* Preparation */
/***************/
USE AdventureWorks2008
GO
CREATE SCHEMA DockOfTheBay
GO
 
/**************************************/
/* Step One: Show Storage Improvement */
/**************************************/
 
-- Create two tables, one with 23 columns, and one with 23 columns but 20 sparse.
CREATE TABLE DockOfTheBay.TableWithoutSparseColumns (
   ProductKey INT IDENTITY, ProductName VARCHAR (100), CategoryID INT,
   col01 INT NULL, col02 INT NULL, col03 INT NULL, 
   col04 INT NULL, col05 INT NULL, col06 INT NULL,
   col07 INT NULL, col08 INT NULL, col09 INT NULL, 
   col10 INT NULL, col11 INT NULL, col12 INT NULL,
   col13 INT NULL, col14 INT NULL, col15 INT NULL, 
   col16 INT NULL, col17 INT NULL, col18 INT NULL,
   c1019 INT NULL, col20 INT NULL);
GO
 
CREATE TABLE DockOfTheBay.TableWithSparseColumns (
   ProductKey INT IDENTITY, ProductName VARCHAR (100), CategoryID INT,
   col01 INT SPARSE NULL, col02 INT SPARSE NULL, col03 INT SPARSE NULL, 
   col04 INT SPARSE NULL, col05 INT SPARSE NULL, col06 INT SPARSE NULL,
   col07 INT SPARSE NULL, col08 INT SPARSE NULL, col09 INT SPARSE NULL, 
   col10 INT SPARSE NULL, col11 INT SPARSE NULL, col12 INT SPARSE NULL,
   col13 INT SPARSE NULL, col14 INT SPARSE NULL, col15 INT SPARSE NULL, 
   col16 INT SPARSE NULL, col17 INT SPARSE NULL, col18 INT SPARSE NULL,
   c1019 INT SPARSE NULL, col20 INT SPARSE NULL);
GO
 
-- Populate the tables
DECLARE @Counter int
SET @Counter = 0
 
WHILE @Counter < 10000
BEGIN
    INSERT INTO DockOfTheBay.TableWithSparseColumns 
                (ProductName, CategoryID) VALUES ('aaaa', 1);
    INSERT INTO DockOfTheBay.TableWithSparseColumns 
                (ProductName, CategoryID, col01) VALUES ('bbbb', 2, 46);
    INSERT INTO DockOfTheBay.TableWithSparseColumns 
                (ProductName, CategoryID, col02) VALUES ('cccc', 3, 44);
    INSERT INTO DockOfTheBay.TableWithSparseColumns 
                (ProductName, CategoryID, col01, col02) VALUES ('dddd', 4, 12, 34);
    INSERT INTO DockOfTheBay.TableWithSparseColumns 
                (ProductName, CategoryID, col12, col13, col14, col15) VALUES ('eeee', 4, 12, 34, 46, 66);
 
    INSERT INTO DockOfTheBay.TableWithoutSparseColumns 
                (ProductName, CategoryID) VALUES ('aaaa', 1);
    INSERT INTO DockOfTheBay.TableWithoutSparseColumns 
                (ProductName, CategoryID, col01) VALUES ('bbbb', 2, 46);
    INSERT INTO DockOfTheBay.TableWithoutSparseColumns 
                (ProductName, CategoryID, col02) VALUES ('cccc', 3, 44);
    INSERT INTO DockOfTheBay.TableWithoutSparseColumns 
                (ProductName, CategoryID, col01, col02) VALUES ('dddd', 4, 12, 34);
    INSERT INTO DockOfTheBay.TableWithoutSparseColumns 
                (ProductName, CategoryID, col12, col13, col14, col15) VALUES ('eeee', 4, 12, 34, 46, 66);
 
    SET @Counter = @Counter + 1
END
GO
 
-- Check average row size, and number of pages
SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   DB_ID ('AdventureWorks2008'), 
   OBJECT_ID ('DockOfTheBay.TableWithoutSparseColumns'), 
   NULL, NULL, 
   'DETAILED');
 
SELECT [avg_record_size_in_bytes], [page_count] FROM sys.dm_db_index_physical_stats (
   DB_ID ('AdventureWorks2008'), 
   OBJECT_ID ('DockOfTheBay.TableWithSparseColumns'), 
   NULL, NULL, 
   'DETAILED');
GO


Observe that the classic table consumes twice the space of the one with sparse columns:



Sparse columns have also another advantage. Internally all sparse columns are grouped into one column. That implies that we can pass the treshold of 1024 columns per table: a table can have up to 100.000 sparse columns! You can make this internal column accessible via a Column Set. But be careful, this has an effect on the SELECT * behavior:

/************************************/
/* Step Two: Demonstrate Column Set */
/************************************/
 
-- Create Table with Column Set
CREATE TABLE DockOfTheBay.TableWithSparseColumnsAndColumnSet (
   ProductKey INT IDENTITY, ProductName VARCHAR (100), CategoryID INT,
   col01 INT SPARSE NULL, col02 INT SPARSE NULL, col03 INT SPARSE NULL, 
   col04 INT SPARSE NULL, col05 INT SPARSE NULL, col06 INT SPARSE NULL,
   col07 INT SPARSE NULL, col08 INT SPARSE NULL, col09 INT SPARSE NULL, 
   col10 INT SPARSE NULL, col11 INT SPARSE NULL, col12 INT SPARSE NULL,
   col13 INT SPARSE NULL, col14 INT SPARSE NULL, col15 INT SPARSE NULL, 
   col16 INT SPARSE NULL, col17 INT SPARSE NULL, col18 INT SPARSE NULL,
   c1019 INT SPARSE NULL, col20 INT SPARSE NULL,
   SparseColumns XML COLUMN_SET FOR ALL_SPARSE_COLUMNS);
GO
 
-- Populate the Table
DECLARE @Counter int
SET @Counter = 0
 
WHILE @Counter < 10000
BEGIN
    INSERT INTO DockOfTheBay.TableWithSparseColumnsAndColumnSet 
                (ProductName, CategoryID) VALUES ('aaaa', 1);
    INSERT INTO DockOfTheBay.TableWithSparseColumnsAndColumnSet 
                (ProductName, CategoryID, col01) VALUES ('bbbb', 2, 46);
    INSERT INTO DockOfTheBay.TableWithSparseColumnsAndColumnSet 
                (ProductName, CategoryID, col02) VALUES ('cccc', 3, 44);
    INSERT INTO DockOfTheBay.TableWithSparseColumnsAndColumnSet 
                (ProductName, CategoryID, col01, col02) VALUES ('dddd', 4, 12, 34);
    INSERT INTO DockOfTheBay.TableWithSparseColumnsAndColumnSet 
                (ProductName, CategoryID, col12, col13, col14, col15) VALUES ('eeee', 4, 12, 34, 46, 66);
 
    SET @Counter = @Counter + 1
END
 
-- Compare SELECT *
SELECT * FROM DockOfTheBay.TableWithoutSparseColumns WHERE ProductKey < 11
SELECT * FROM DockOfTheBay.TableWithSparseColumnsAndColumnSet WHERE ProductKey < 11
GO
 
/***********/
/* Cleanup */
/***********/
DROP TABLE DockOfTheBay.TableWithoutSparseColumns
DROP TABLE DockOfTheBay.TableWithSparseColumns
DROP TABLE DockOfTheBay.TableWithSparseColumnsAndColumnSet
DROP SCHEMA DockOfTheBay

No comments:

Post a Comment