A Checksum Algorithm

Checksums

People who care deeply about this have developed a number of much more reliable algorithms. For example, the Cyclic Redundancy Check algorithms, CRC-8, CRC-16, and CRC-32, do fairly complex things to make the checksum sensitive to such problems. For example, using CRC, swapping two bytes in the message will generate a different checksum because the value computed depends not only on the character value, but also on the position in the message in which the byte occurred.

Disk drives often use techniques derived from Hamming Codes (named after Richard Hamming, an AT&T/Bell Laboratories researcher who is probably best known for the techniques he developed for correcting single-bit parity errors in memory, although that is but one of the many applications of his wide-ranging work in mathematical characterizations of data in computer systems). There are some detailed tutorials out there for those of you who wish to pursue this more deeply. Possibly the best-known of these is the set of codes called Fire Codes, named after the inventor, whose surname is Fire (I can't find any citations to him handily, so I can't give you more information than this). These can do things like reconstruct a sequence of bytes (sometimes as many as 20 bytes, on typical disks) that are lost due to burst noise, typically a bad spot on the disk. These are very powerful data recovery codes.

Checksums have many other applications. For example, one feature I find very annoying in many programs is the notion of "change". If I change a value in a dialog, I often get a notification that I have "changed" something, and a save/update/etc. is required. But mostly these are done by detecting if the user (that is, I) have typed anything at all into a control, changed a selection on a ComboBox, etc. A simple Boolean value, maintained by responding to OnChange, OnSelendOK, and similar messages. Of course, if I haven't actually  changed the information, or worse still, if I change it back, I get the same warning. I consider such systems primitive beyond recovery, and build systems which are far more user-friendly. 

I do this by keeping the information available in some form, most commonly a class. I then compute a checksum on the values when I come in (in OnInitDialog), and recompute it each time there is a change. If the new checksum is the same as the old checksum, I assume that there are no changes. I can then indicate in various ways what is required. In other cases, I'll do this in the CDocument-derived class; whenever a change is effected by the GUI, I recompute the checksum and set the Modified flag according to how the checksum compares to the checksum computed when the document was first created/loaded/whatever, rather than assume any change whatsoever is implicitly a change in content. Thus if the user hits a key in a CEditView then hits the backspace key, I'll end up indicating "no change".

Like most checksum techniques, this decreases in reliability as the number of bytes checksummed increases. This is because the more information you try to pack into a 32-bit value, using an information-losing transformation, the more likely the case where two completely different sequences of values will produce the same 32-bit value. This is one consideration as to why network packets are not sent as megabyte packets; errors in a megabyte transmission might result in the same checksum as the error-free transmission, while for short packet sizes (e.g., 4K, or 1500 bytes) the chances are so low as to not be of any concern in practical networking.

Therefore, my techniques generally are useful when a few thousand bytes of state are involved, such as in a dialog.

I use a technique that has no particular theoretical justification. But I've found it to be reliable for my purposes. The story is that I wanted to use CRC-32 some years ago, but couldn't locate the source code for a CRC-32 algorithm on the Web at that time, so I turned to my Adobe Type 1 Font Handbook and cribbed their encryption algorithm. But rather than encrypt the data, I just used the basic algorithm to create a 32-bit checksum. You can replace my basic algorithm with a 32-bit CRC if you want to. Here's my code, and some commentary on how you might use it.

You might also like...

Comments

Contribute

Why not write for us? Or you could submit an event or a user group in your area. Alternatively just tell us what you think!

Our tools

We've got automatic conversion tools to convert C# to VB.NET, VB.NET to C#. Also you can compress javascript and compress css and generate sql connection strings.

“Most software today is very much like an Egyptian pyramid with millions of bricks piled on top of each other, with no structural integrity, but just done by brute force and thousands of slaves” - Alan Kay