Friday, April 16, 2010

CheckSum Function

Returns the checksum value computed over a row of a table, or over a list of expressions. CHECKSUM is intended for use in building hash indexes.

comparing row (with large size data ex: html source) values in a table can be a difficult and resource intensive process.

Using checksum we can create Hash value for comparison instead of large data

Drawback

If one of the values in the column list changes, the checksum of the list also generally changes. However, there is a chance that the checksum will not change (i.e. collisions ).

So we can use Checksum function when our application can tolerate occasionally missing a change.


Testing


I have tested for Html_source column with 6.5 million data , only less then 100 hash values are wrong but the performance is extraordinary ,

when we need a immediate comparison solution for columns with large data CheckSum function is the best choice

Example

--******* CheckSum Function ***********

-- Generate hash value

ALTER TABLE Compress_Data1
ADD hashvalue AS CHECKSUM(source) PERSISTED

-- Find duplicate using hash value instead of Source

SELECT hashvalue , COUNT(hashvalue) AS Duplicate
FROM Compress_Data1
GROUP BY hashvalue
HAVING COUNT(hashvalue) > 1

No comments:

Post a Comment