SQL Server 2008 Change Data Capture in 60 seconds

SQL Server 2008 Change Data Capture (CDC) is a management feature that allows you to track changes to the structure and content of tables in a database. Changes as a result of DML and DDL statements are registered asynchronously, so CDC is a welcome alternative to synchronous tracking features like table, database, and server triggers.

Let's start with the creation of a demo database with a table:

/********************************************/
/*                                          */
/* SQL Server 2008 Change Data Capture Demo */
/*                                          */
/********************************************/
 
/***********************************************************/
/* Preparation: Create a database and a table to play with */
/***********************************************************/
 
USE [master]
GO
 
IF EXISTS (SELECT name FROM sys.databases 
           WHERE name = N'DockOfTheBay')
DROP DATABASE [DockOfTheBay]
GO
 
CREATE DATABASE DockOfTheBay 
GO
 
USE [DockOfTheBay]
GO
 
CREATE TABLE [dbo].[TrackedTable]
(
    [ID] [int] NOT NULL,
    [Name] [varchar](100) NULL,
 
    CONSTRAINT [PK_TrackedTable] PRIMARY KEY CLUSTERED 
    (
        [ID] ASC
    )
)
GO

Before you can use it on a table, CDC needs to be enabled at database level:

/**********************************/
/* Enabling CDC at database level */
/**********************************/
 
/* Check if CDC is enabled at DB level */
SELECT is_cdc_enabled 
  FROM sys.databases 
 WHERE [name] = 'DockOfTheBay'
 
/* Activate CDC at DB level */
EXEC sys.sp_cdc_enable_db
GO

As a result of this activation, a schema with the name of cdc is generated and populated with some system tables:



When CDC is enabled for the database, you can enable it for each table that you want to track. Here's how this goes:

/*******************************/
/* Enabling CDC at table level */
/*******************************/
 
/* Activate CDC at table level */
EXEC sys.sp_cdc_enable_table 
     @source_schema = 'dbo', 
       @source_name = 'TrackedTable',
         @role_name = 'CDC_Admin'
GO
 
/* Check if CDC is enabled at table level */
SELECT is_tracked_by_cdc 
  FROM sys.tables 
 WHERE [name] = 'TrackedTable'

The system tables in the cdc schema get populated with metadata about the tracking: they will hold tables and columns that are being tracked:

/* Check tracking meta data */
SELECT *
  FROM cdc.change_tables
 
SELECT *
  FROM cdc.captured_columns
GO



Using CDC requires the SQL Server Agent to be running, since he will run two jobs that control the log readers:



The engine is running now, so let's give CDC a test drive. I'm first going to add a column to the tracked table. Shortly after that -remember, it's not synchronous- the ddl_history table in the cdc schema should reflect the change:

/***************/
/* Tracing DML */
/***************/
 
/* Modify Table Structure */
ALTER TABLE dbo.TrackedTable
ADD [Description] VARCHAR(255)
GO
 
/* Check (Please give the SQL Agent Job some slack) */
SELECT * 
  FROM cdc.ddl_history
GO



In SQL Server 2005, you can get the same result with event notifications, but that's a little bit more cumbersome (note: this is an understatement).

Let's see what happens if we apply some DML statements against the table. For each tracked table, a mirror table is created that holds the changes:

/***************/
/* Tracing DML */
/***************/
 
/* Modify Table Contents */
INSERT INTO [DockOfTheBay].[dbo].[TrackedTable]
     VALUES (1, 'X', 'X'),(2, 'Y', 'Y')
 
/* Check Table */
SELECT *
  FROM cdc.dbo_TrackedTable_CT
 
/* More DML */
UPDATE dbo.TrackedTable
   SET [Name] = 'Z'
 WHERE ID = 1
 
 DELETE dbo.TrackedTable
  WHERE ID = 2
 
 UPDATE dbo.TrackedTable
    SET [Description] = 'Ignored :-('
 GO
 
/* Check Table */
SELECT *
  FROM cdc.dbo_TrackedTable_CT


As you can see, the Description column that was added to the table, is not tracked. I'll solve that in a minute.

The __$operation column identifies the DML operation (insert, before update, after update, delete, or merge):

/* Check Table 2 */
SELECT CASE __$operation 
            WHEN 1 THEN 'Delete'
            WHEN 2 THEN 'Insert'
            WHEN 3 THEN 'Before Update'
            WHEN 4 THEN 'After Update'
            WHEN 5 THEN 'Merge' 
       END AS Operation,
       ID,
       [Name]
  FROM cdc.dbo_TrackedTable_CT




Let's walk the whole mile and reverse engineer the contents of the tracking table back to DML:

/* Regenerate DML */
SELECT CASE __$operation 
            WHEN 1 THEN 
                'DELETE FROM TrackedTable' + 
                ' WHERE id = ' + CONVERT(VARCHAR(20), id) +
                ' AND [Name] = ''' + [Name] + ''''
            WHEN 2 THEN 
                'INSERT INTO TrackedTable' +
                '(id, [Name]) VALUES (' +
                CONVERT(VARCHAR(20),id) + ', ''' +
                [Name] + '''' + ')'
            WHEN 3 THEN 
                'DELETE FROM TrackedTable'+ 
                ' WHERE id = ' + CONVERT(VARCHAR(20), id) +
                ' AND [Name] = ''' + [Name] + ''''
            WHEN 4 THEN 
                'INSERT INTO TrackedTable' +
                '(id, [Name]) VALUES (' +
                CONVERT(VARCHAR(20),ID) + ', ''' +
                [Name] + '''' + ')'
            WHEN 5 THEN 
                'MERGE ...' 
       END AS [Regenerated DML]
  FROM cdc.dbo_TrackedTable_CT
 ORDER BY __$start_lsn



Nice, isn't it? Well, unfortunately some of the changes were not registered, since CDC only tracks the columns that existed when the tracking was initiated for the table. It is unaware of my Description column, so let's bring that column in the scope:

/********************************/
/* Changing tracking properties */
/********************************/
 
/* Disable CDC at table level             */
/* WARNING: this drops the tracking table */
EXEC sys.sp_cdc_disable_table
        @source_schema = 'dbo', 
          @source_name = 'TrackedTable',
     @capture_instance = 'dbo_TrackedTable'
GO
 
 /* Table is gone ... */
SELECT *
  FROM cdc.dbo_TrackedTable_CT
 
/* Re-enable with a specific column list */
EXEC sys.sp_cdc_enable_table 
            @source_schema = 'dbo', 
              @source_name = 'TrackedTable',
                @role_name = 'CDC_Admin',
     @captured_column_list = 'ID, Name, Description' 
GO
 
/* Check columns */
SELECT *
  FROM cdc.captured_columns
 
/* Table is back */
SELECT *
  FROM cdc.dbo_TrackedTable_CT

And finally, the cleanup:

/***********/
/* Cleanup */
/***********/
 
/* Disable CDC */
EXEC sys.sp_cdc_disable_db
GO
 
/* SQL Agent Jobs are gone ... */
SELECT *
  FROM msdb.dbo.sysjobs
 WHERE [name] LIKE 'cdc.%'
 
USE [master]
GO
 
/* Drop db */
DROP DATABASE DockOfTheBay
GO

1 comment: