Calculate checksum for entire rows in SQL Server

Brent Ozar’s latest blog post contained a question that piqued my interest about a potential method to calculate the checksum for an entire row in SQL Server.

The original question:

Kansas4444 asks: Hi Brent, do you have any advice on calculating checksum for an entire row on SQL Server ? (Brent says: I’ve never had to do this, so no idea.)
https://www.brentozar.com/archive/2023/09/office-hours-questions-i-dont-have-answers-for/

Although I did leave the following comment, I still wanted to explore this more and make sure it’s actually valid and doesn’t rely on stuff like case-sensitive collations.

Vlad Drumea September 19, 2023 1:52 pm 1. The only method I can think of (but I’d like to dig more into this) is something like concatenating that record into a string variable and then applying HASHBYTES on that variable Example using the Users table in the StackOverflow database (yes, I didn’t concatenate all the columns, but it should be enough for a PoC: DECLARE @Record NVARCHAR(MAX); SELECT @Record = CAST(ID AS NVARCHAR(10)) + ISNULL(AboutMe, ”) + CAST(ISNULL(Age, 0) AS NVARCHAR(10)) + CONVERT(NVARCHAR(23), CreationDate, 21) + DisplayName + CAST(ISNULL(DownVotes, 0) AS NVARCHAR(10)) + ISNULL(EmailHash, ”) + CONVERT(NVARCHAR(23), LastAccessDate, 21) FROM Users WHERE id = 26837 SELECT HASHBYTES(‘SHA2_256’, @Record); SEO checksum for row in sql server

Use cases

One of the use cases I can think of would be a means of change control, making sure that the latest version of a specific record is a result of a valid change done via the application/website and not by someone who ran an accidental update via SSMS on that table by mixing up Prod with Dev/QA.

This implies that, whenever a “legitimate” insert/update is made, the application has a way to:

compute the checksum of the newly added/changed record
store that checksum for that record

Improving on the code suggested in my comment

Brief overview

Note: I’m running this on SQL Server 2019, latest CU, but the code is compatible with SQL Server 2022 as well as with previous versions.

I’m using the 180GB version of the StackOverflow database provided by Brent Ozar here and both the instance and the database are using the SQL_Latin1_General_CP1_CI_AS collation.

For reference, this is the structure of the Users table:

CREATE TABLE [dbo].[Users](

[Id] [int] IDENTITY(1,1) NOT NULL,

[AboutMe] [nvarchar](max) NULL,

[Age] [int] NULL,

[CreationDate] [datetime] NOT NULL,

[DisplayName] [nvarchar](40) NOT NULL,

[DownVotes] [int] NOT NULL,

[EmailHash] [nvarchar](40) NULL,

[LastAccessDate] [datetime] NOT NULL,

[Location] [nvarchar](100) NULL,

[Reputation] [int] NOT NULL,

[UpVotes] [int] NOT NULL,

[Views] [int] NOT NULL,

[WebsiteUrl] [nvarchar](200) NULL,

[AccountId] [int] NULL,

CONSTRAINT [PK_Users_Id] PRIMARY KEY CLUSTERED

(

[Id] ASC

));

Moving on

After my initial comment I’ve realized that I have not taken into account a few situations:

Without a separator, there is no difference between an empty string and two consecutive empty strings
Without having a specific placeholder for a NULL and just using ISNULL(Column, N'') there’s no difference between an empty string added through an update and a NULL that’s been turned into an empty string through ISNULL(), or between an Age set to 0 via the Website or from using 0 as a placeholder for NULL when computing the checksum.

Let’s take Brent’s user for example:

1	SELECT * FROM Users WHERE id = 26837;

If Brent would update the Age field to 0 via the StackOverflow Website, the checksum after that change would be the same as the one previously generated by the T-SQL in my comment.

First, I’ll need to add a column to store the checksum.

1	ALTER TABLE Users ADD [CheckSum] VARBINARY(32);

Since the SHA2_256 output is only 32 bytes, I can limit the width of the new column that byte size.

Next, I create a view that returns the Id and CheckSum for the Users table.

For my PoC, I’m opting for ; as the separator and -NULL- as the placeholder for NULLs.

I’m specifying WITH SCHEMABINDING when creating the view in order to avoid situations where someone changes the table structure without also updating the view.

CREATE VIEW V_GetUsersRecordChecksum

WITH SCHEMABINDING

SELECT Id, HASHBYTES(N'SHA2_256',CAST(ID AS NVARCHAR(10))

+ N';' + ISNULL(AboutMe, N'-NULL-')

+ N';' + ISNULL(CAST(Age AS NVARCHAR(10)), N'-NULL-')

+ N';' + CONVERT(NVARCHAR(23), CreationDate, 21)

+ N';' + DisplayName

+ N';' + CAST(DownVotes AS NVARCHAR(10))

+ N';' + ISNULL(EmailHash, N'-NULL-')

+ N';' + CONVERT(NVARCHAR(23), LastAccessDate, 21)

+ N';' + ISNULL([Location], N'-NULL-')

+ N';' + CAST(Reputation AS NVARCHAR(10))

+ N';' + CAST(UpVotes AS NVARCHAR(10))

+ N';' + CAST([Views] AS NVARCHAR(10))

+ N';' + ISNULL(WebsiteUrl,N'-NULL-')

+ N';' + ISNULL(CAST(AccountId AS NVARCHAR(10)), N'-NULL-')) AS [CheckSum]

FROM dbo.Users;

Note that I am using HASHBYTES, because BINARY_CHECKSUM sometimes produces the same checksum for different data.

Plus, I like playing with HASHBYTES, especially in combination with SQL logins’ password hashes.

I then test the view works as expected.

1	SELECT * FROM V_GetUsersRecordChecksum WHERE Id = 26837;

And proceed to update the CheckSum column for Brent’s record.

UPDATE u

SET u.[CheckSum] = c.[CheckSum]

FROM Users AS U

INNER JOIN V_GetUsersRecordChecksum AS c

ON c.Id = u.Id

WHERE u.Id = 26837;

Note that if I want to update the new column for the whole table, I can just omit the WHERE clause.

Afterwards, I compare the values to make sure they are consistent between the view and the stored checskum.

1 2	SELECT Id, [CheckSum] FROM Users WHERE Id = 26837; SELECT * FROM V_GetUsersRecordChecksum WHERE Id = 26837;

Testing

Now to do some small changes to test the behavior.

First, I change the case of the letter B in DisplayName.

1 2	UPDATE Users SET DisplayName = REPLACE(DisplayName, N'B', N'b') WHERE id = 26837;

I then compare the previous checksum with the new one, then undo the change and verify again.

/*Verify checksum change*/

SELECT Id, [CheckSum], DisplayName FROM Users WHERE Id = 26837;

SELECT * FROM V_GetUsersRecordChecksum WHERE Id = 26837;

/*Undo the change*/

UPDATE Users SET DisplayName = REPLACE(DisplayName, N'b', N'B')

WHERE id = 26837;

/*Check again*/

SELECT Id, [CheckSum], DisplayName FROM Users WHERE Id = 26837;

SELECT * FROM V_GetUsersRecordChecksum WHERE Id = 26837;

Note that the checksums no longer match in the first pair of the result sets since the value of the CheckSum column in the Users table is the old one.

This also confirms that it’s working as intended regardless if the instance is case-sensitive or not.

What does this mean for the application?

For an update to a record to have a consistent checksum, the application should make sure to keep the CheckSum column updated.

So an update for the LastAccessDate column would look like this:

DECLARE @Id INT;

SET @Id = 26837

/*Update record*/

UPDATE Users SET LastAccessDate = GETDATE()

WHERE Id = @Id;

/*Sync CheckSum*/

UPDATE u

SET u.[CheckSum] = c.[CheckSum]

FROM Users AS U

INNER JOIN V_GetUsersRecordChecksum AS c

ON c.Id = u.Id

WHERE u.Id = @Id;

If the application already handles inserts and updates via a stored procedure, then the Sync CheckSum part can be included in the stored procedure in order to avoid making code changes in the application itself.

Why not use a trigger or a computed column?

Because they defeat the purpose of trying to ensure that the changes to the data come from the application alone.

A trigger will be fired for every update, unless you start making flag tables in tempdb so that it only fires for sessions coming from the application (more on that in a future post maybe).

Of course, someone who knows the inner workings of the application can just update the CheckSum column themselves, but it’s still an extra step that someone needs to be aware of.

Conclusion

It is possible to get an accurate checksum for a row in SQL Server, it just requires a bit of planning and also adding another step to insert and update operations that touch the table you’re interested in

SQL Server T-SQL

Cookie	Duration	Description
cookielawinfo-checbox-analytics	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Analytics".
cookielawinfo-checbox-functional	11 months	The cookie is set by GDPR cookie consent to record the user consent for the cookies in the category "Functional".
cookielawinfo-checbox-others	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Other.
cookielawinfo-checkbox-necessary	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookies is used to store the user consent for the cookies in the category "Necessary".
cookielawinfo-checkbox-performance	11 months	This cookie is set by GDPR Cookie Consent plugin. The cookie is used to store the user consent for the cookies in the category "Performance".
viewed_cookie_policy	11 months	The cookie is set by the GDPR Cookie Consent plugin and is used to store whether or not user has consented to the use of cookies. It does not store any personal data.

Calculate checksum for entire rows in SQL Server

Use cases

Improving on the code suggested in my comment

Brief overview

Moving on

Testing

What does this mean for the application?

Why not use a trigger or a computed column?

Conclusion

Script to get database sizes in SQL Server

Script to get statistics info in SQL Server

You may also like

1 comment

Leave a Comment Cancel Reply