In applications or complex enterprise information architectures, there is often a need to handle historical records and track changes in values for various reasons. Most of the time, it’s for security purposes; we need to monitor who is making changes and be able to correct errors, whether they are from users or database administrators. Another good reason for implementing CDC (Change Data Capture) is when we need to later integrate data from the application into a data warehouse or another application and need to identify the changed records for migration. With the Change Data Capture (CDC) feature, we can set up change tracking for a specific table in the database.
Setting up Change Data Capture (CDC) – Tracking Changes in a Table
We will configure CDC in the AdventureworksDW2016CTP3 database (a sample database from Microsoft) in a few steps. We want to track all changes in the DimCustomer table.
The first step is to enable the change data capture feature for the database:
(1.a) In the sys.databases system table, there is an indicator to check which databases have the CDC feature enabled/disabled. In this case, CDC is disabled everywhere.
(1.b) Enabling CDC for the database is done using the system procedure “sys.sp_cdc_enable_db“.
No, you see, I made my first MISTAKE. CDC does not work with the Express edition, which is an important piece of information I didn’t mention at the beginning. So, I need to switch to the Developer Edition :)))
This instance of SQL Server is the Express Edition (64-bit). Change data capture is only available in the Enterprise, Developer, Enterprise Evaluation, and Standard editions.
CDC will work with Developer Edition or higher editions, but not with Express Edition.
(1.c) After enabling CDC for the database, a new schema “cdc” and several system tables for logging changes are created.
Enabling Change Data Capture for a Table
(2.a) To track changes in the table, we enable change data capture using the system procedure named “sys.sp_cdc_enable_table”
- Two jobs are created to monitor changes.
- A new table named dbo_DimCustomer_CT is created in the system tables, which will contain the complete history of changes in that table. This table includes a complete list of columns and several additional system columns used to work with the history of the table. The most important ones are:
- __$start_lsn – Transaction identifier within which the change was made. If multiple changes occur within a transaction, the change records will all have the same __$start_lsn.
- __$operation – The operation that occurred.
- DELETE = 1
- INSERT = 2
- Value before UPDATE = 3
- Value after UPDATE = 4
Extracting Information from Change Data Capture
We make some changes in the table where CDC is applied:
- 3x UPDATE
- 1x DELETE
How does the dbo_DimCustomer_CT table with change tracking look? (I’ve selected only the columns that I changed. You can see that we have a complete history of what happened. From the log, you can see that all UPDATE operations (first 6 records) were part of one transaction, and I performed the DELETE operation separately. For UPDATE operations, we have both the original and the new values in the log.
If you want a timestamp context, you can modify the script slightly. There is a mapping table in the system tables between __$start_lsn and the timestamp context.
The CDC feature has a lot of capabilities, more than I can describe in one article. Hopefully, I’ll continue this discussion sometime in the future.