Cyclic redundancy check

Overview
The Basic Idea Behind CRC Algorithms
Binary Arithmetic with No Carries
Choosing A Poly
A Straightforward CRC Implementation
A Table-Driven Implementation
A Slightly Mangled Table-Driven Implementation
"Reflected" Table-Driven Implementations
Initial and Final Values
Example
CRC Algorithms
- A Parameterized Model For CRC Algorithms
More

Overview

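A Cyclic Redundancy Check (CRC) computes a short check value from a block of data. The check value is used to verify whether the data was altered or corrupted during transmission.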

The Basic Idea Behind CRC Algorithms

The basic idea of CRC algorithms is simply to treat the message as an enormous binary number, to divide it by another fixed binary number, and to make the remainder from this division the checksum. Upon receipt of the message, the receiver can perform the same division and compare the remainder with the "checksum" (transmitted remainder).

Binary Arithmetic with No Carries

Adding two numbers in CRC arithmetic is the same as adding numbers in ordinary binary arithmetic except there is no carry. This means that each pair of corresponding bits determines the corresponding output bit without reference to any other bit positions. For example:

 10011011
+11001010
 --------
 01010001
 --------

There are only four cases for each bit position:

   0+0=0
   0+1=1
   1+0=1
   1+1=0  (no carry)
Subtraction is identical:
        10011011
       -11001010
        --------
        01010001
        --------

In fact, both addition and subtraction in CRC arithmetic are equivalent to the XOR operation, and the XOR operation is its own inverse.

Here's a fully worked division:

            1100001010
       _______________
10011 ) 11010110110000
        10011,,.,,....
        -----,,.,,....
         10011,.,,....
         10011,.,,....
         -----,.,,....
          00001.,,....
          00000.,,....
          -----.,,....
           00010,,....
           00000,,....
           -----,,....
            00101,....
            00000,....
            -----,....
             01011....
             00000....
             -----....
              10110...
              10011...
              -----...
               01010..
               00000..
               -----..
                10100.
                10011.
                -----.
                 01110
                 00000
                 -----
                  1110 = Remainder

Thus we see that CRC arithmetic is primarily about XORing particular values at various shifting offsets.

Choosing A Poly

Choosing a poly is somewhat of a black art, and the reader is referred to Tanenbaum¹ (pp. 130-132), which has a very clear discussion of this issue.

Some popular polys are:

Name	Polynomial	Hex (implicit top x^W term omitted)
CRC12	x¹² + x¹¹ + x³ + x² + x + 1	0x80F
CRC16	x¹⁶ + x¹⁵ + x² + 1	0x8005
CRC16-CCITT	x¹⁶ + x¹² + x⁵ + 1	0x1021
CRC32	x³² + x²⁶ + x²³ + x²² + x¹⁶ + x¹² + x¹¹ + x¹⁰ + x⁸ + x⁷ + x⁵ + x⁴ + x² + x + 1	0x04C11DB7

A Straightforward CRC Implementation

           3   2   1   0   Bits
         +---+---+---+---+
Pop! <-- |   |   |   |   | <----- Augmented message
         +---+---+---+---+
      1    0   1   1   1   = The Poly

To perform the division, do the following, where W is the width (degree) of the poly, 4 in this case:

Load the register with zero bits.
Augment the message by appending W zero bits to the end of it.
While (more augmented message bits)
   Begin
   Shift the register left by one bit, reading the next bit of the
      augmented message into register bit position 0.
   If (a 1 bit popped out of the register during the shift)
      Register = Register XOR Poly.
   End
The register now contains the remainder.

A Table-Driven Implementation

Because the straightforward method operates at the bit level, it is rather awkward to code (even in C) and inefficient to execute (it has to loop once for each bit). To speed it up, we need a way for the algorithm to process the message in units larger than one bit.

For the purposes of discussion, let us switch from a 4-bit poly to a 32-bit one. Our register looks much the same, except the boxes represent bytes instead of bits, and the Poly is 33 bits (one implicit 1 bit at the top and 32 "active" bits) (W=32).

            3    2    1    0   Bytes
         +----+----+----+----+
Pop! <-- |    |    |    |    | <----- Augmented message
         +----+----+----+----+
        1<------32 bits------>

Consider for a moment that we use the top 8 bits of the register to calculate the value of the top bit of the register during the next 8 iterations. Suppose that we drive the next 8 iterations using the calculated values (which we could perhaps store in a single byte register and shift out to pick off each bit). Then we note three things:

The top byte of the register now doesn't matter. No matter how many times and at what offset the poly is XORed to the top 8 bits, they will all be shifted out of the left-hand side during the next 8 iterations anyway.
The remaining bits will be shifted left by 8 positions (one byte), and the next message byte will be shifted into the rightmost byte of the register.
While all this is going on, the register will be subjected to a series of XORs in accordance with the bits of the pre-calculated control byte.

Perhaps you can see the solution now. Putting all the pieces together we have an algorithm that goes like this:

While (augmented message is not exhausted)
   Begin
   Examine the top byte of the register
   Calculate the control byte from the top byte of the register
   Sum all the Polys at various offsets that are to be XORed into
      the register in accordance with the control byte
   Shift the register left by one byte, reading a new message byte
      into the rightmost byte of the register
   XOR the summed polys to the register
   End

As it stands, this is not much better than the straightforward algorithm. However, it turns out that most of the calculation can be precomputed and assembled into a table. As a result, the above algorithm can be reduced to:

While (augmented message is not exhausted)
      Begin
      Top = top_byte(Register);
      Register = (Register << 8) | next_augmessage_byte;
      Register = Register XOR precomputed_table[Top];
      End

The above is a very efficient algorithm requiring just a shift, an OR, an XOR, and a table lookup per byte.

In C, the algorithm main loop looks like this:

r=0;
while (len--)
  {
   byte t = (r >> 24) & 0xFF;
   r = (r << 8) | *p++;
   r^=table[t];
  }

where len is the length of the augmented message in bytes, p points to the augmented message, r is the register, t is a temporary, and table is the computed table. This code can be made even more unreadable as follows (here the table itself is named t):

r=0;
   while (len--)
          r = ((r << 8) | *p++) ^ t[(r >> 24) & 0xFF];

A Slightly Mangled Table-Driven Implementation

Despite the terse beauty of the above lines, those optimizing hackers couldn't leave it alone. The trouble, you see, is that this loop operates upon the AUGMENTED message and in order to use this code, you have to append W/8 zero bytes to the end of the message before pointing p at it. Depending on the run-time environment, this may or may not be a problem; if the block of data was handed to us by some other code, it could be a BIG problem. One alternative is simply to append the following line after the above loop, once for each zero byte:

That is, W/8 zero bytes still have to be fed in at the end:

for (i=0; i<W/8; i++)
           r = (r << 8) ^ t[(r >> 24) & 0xFF];

However, at the further expense of clarity (which, you must admit, is already a pretty scarce commodity in this code) we can reorganize this small loop further so as to avoid the need to either augment the message with zero bytes or explicitly process zero bytes at the end as above.

          3    2    1    0   Bytes
       +----+----+----+----+
+-----<|    |    |    |    | <----- Augmented message
|      +----+----+----+----+
|                ^
|                |
|               XOR
|                |
|     0+----+----+----+----+
v      +----+----+----+----+
|      +----+----+----+----+
|      +----+----+----+----+
|      +----+----+----+----+
|      +----+----+----+----+
|      +----+----+----+----+
+----->+----+----+----+----+
       +----+----+----+----+
       +----+----+----+----+
       +----+----+----+----+
       +----+----+----+----+
    255+----+----+----+----+

Algorithm

1. Shift the register left by one byte, reading in a new message byte.
2. Use the top byte just rotated out of the register to index the table of 256 32-bit values.
3. XOR the table value into the register.
4. Goto 1 iff more augmented message bytes.

Now, note the following facts:

TAIL (handling the appended zeros): The W/8 augmented zero bytes that appear at the end of the message will be pushed into the register from the right as all the other bytes are, but their values (0) will have no effect whatsoever on the register because 1) XORing with zero does not change the target byte, and 2) the four bytes are never propagated out the left side of the register where their zeroness might have some sort of influence. Thus, the sole function of the W/8 augmented zero bytes is to drive the calculation for another W/8 byte cycles so that the end of the REAL data passes all the way through the register.
HEAD (with a zero initial register, the first 4 iterations only shift zeros out): If the initial value of the register is zero, the first four iterations of the loop will have the sole effect of shifting in the first four bytes of the message from the right. This is because the first 32 control bits are all zero and so nothing is XORed into the register. Even if the initial value is not zero, the first 4 byte iterations of the algorithm will have the sole effect of shifting the first 4 bytes of the message into the register and then XORing them with some constant value (that is a function of the initial value of the register).

These facts, combined with the XOR property

(A xor B) xor C = A xor (B xor C)

mean that message bytes need not actually travel through the W/8 bytes of the register. Instead, they can be XORed into the top byte just before it is used to index the lookup table. This leads to the following modified version of the algorithm.

 +-----<Message (non augmented)
 |
 v         3    2    1    0   Bytes
 |      +----+----+----+----+
XOR----<|    |    |    |    |
 |      +----+----+----+----+
 |                ^
 |                |
 |               XOR
 |                |
 |     0+----+----+----+----+
 v      +----+----+----+----+
 |      +----+----+----+----+
 |      +----+----+----+----+
 |      +----+----+----+----+
 |      +----+----+----+----+
 |      +----+----+----+----+
 +----->+----+----+----+----+
        +----+----+----+----+
        +----+----+----+----+
        +----+----+----+----+
        +----+----+----+----+
     255+----+----+----+----+

Algorithm

1. Shift the register left by one byte; a zero byte enters on the right.
2. XOR the top byte just rotated out of the register with the next message byte to yield an index into the table ([0,255]).
3. XOR the table value into the register.
4. Goto 1 iff more message bytes.

This is an IDENTICAL algorithm and will yield IDENTICAL results. The C code looks something like this:

r=0;
while (len--)          /* len counts only the real (non-augmented) bytes */
       r = (r<<8) ^ t[(r >> 24) ^ *p++];

"Reflected" Table-Driven Implementations

DEFINITION: A value/register is reflected if its bits are swapped around its centre. For example: 0101 is the 4-bit reflection of 1010.

It turns out that UARTs (those handy little chips that perform serial IO) are in the habit of transmitting each byte with the least significant bit (bit 0) first and the most significant bit (bit 7) last (i.e. reflected).

The bytes are processed in the same order, but the bits in each byte are swapped; bit 0 is now bit 7, bit 1 is now bit 6, and so on.

In this situation, a normal sane software engineer would simply reflect each byte before processing it. However, it would seem that normal sane software engineers were thin on the ground when this early ground was being broken, because instead of reflecting the bytes, whoever was responsible held down the byte and reflected the world, leading to the following "reflected" algorithm, which is identical to the previous one except that everything is reflected except the input bytes. (In other words, the message bytes are not reflected; the algorithm is changed instead.)

  Message (non augmented) >-----+
                                |
Bytes   0    1    2    3        v
     +----+----+----+----+      |
     |    |    |    |    |>----XOR
     +----+----+----+----+      |
               ^                |
               |                |
              XOR               |
               |                |
     +----+----+----+----+0     |
     +----+----+----+----+      v
     +----+----+----+----+      |
     +----+----+----+----+      |
     +----+----+----+----+      |
     +----+----+----+----+      |
     +----+----+----+----+      |
     +----+----+----+----+<-----+
     +----+----+----+----+
     +----+----+----+----+
     +----+----+----+----+
     +----+----+----+----+
     +----+----+----+----+255

Notes:

The table is identical to the one in the previous algorithm except that each entry has been reflected.
The initial value of the register is the same as in the previous algorithm except that it has been reflected.
The bytes of the message are processed in the same order as before (i.e. the message itself is not reflected).
The message bytes themselves don't need to be explicitly reflected, because everything else has been!

Initial and Final Values

In addition to the complexity already seen, CRC algorithms differ from each other in two other regards:

The initial value of the register.
The value to be XORed with the final register value.

For example, the "CRC32" algorithm initializes its register to FFFFFFFF and XORs the final register value with FFFFFFFF.

Example

#include <stdio.h>
#include <stdint.h>
#include <string.h>


// typedef unsigned long  crc;
typedef uint32_t crc;

#define FALSE  0
#define TRUE   1

#define CRC_NAME                        "CRC-32"
#define POLYNOMIAL                      0x04C11DB7
#define INITIAL_REMAINDER       0xFFFFFFFF
#define FINAL_XOR_VALUE         0xFFFFFFFF
#define REFLECT_DATA            TRUE
#define REFLECT_REMAINDER       TRUE
#define CHECK_VALUE                     0xCBF43926


#define WIDTH    (8 * sizeof(crc))
// Cast before shifting: 1 << 31 on a plain int is undefined behavior.
#define TOPBIT   ((crc) 1 << (WIDTH - 1))

#if (REFLECT_DATA == TRUE)
#undef  REFLECT_DATA
#define REFLECT_DATA(X)                 ((unsigned char) reflect((X), 8))
#else
#undef  REFLECT_DATA
#define REFLECT_DATA(X)                 (X)
#endif

#if (REFLECT_REMAINDER == TRUE)
#undef  REFLECT_REMAINDER
#define REFLECT_REMAINDER(X)    ((crc) reflect((X), WIDTH))
#else
#undef  REFLECT_REMAINDER
#define REFLECT_REMAINDER(X)    (X)
#endif

unsigned long reflect(unsigned long data, unsigned char n_bits) {
  unsigned long  reflection = 0x00000000;
  unsigned char  bit;
  for (bit = 0; bit < n_bits; ++bit) {
    if (data & 0x1) {
      // 1UL: shifting a plain int left by 31 would be undefined behavior.
      reflection |= (1UL << ((n_bits - 1) - bit));
    }
    data >>= 1;
  }
  return (reflection);
}

crc  crc_table[256];

void CrcInit() {
  crc remainder;
  int dividend;
  unsigned char bit;
  // Compute the remainder of each possible dividend.
  for (dividend = 0; dividend < 256; ++dividend) {
    // Start with the dividend followed by zeros (cast first: 255 << 24 overflows an int).
    remainder = (crc) dividend << (WIDTH - 8);
    for (bit = 8; bit > 0; --bit) {
      if (remainder & TOPBIT) {
        remainder = (remainder << 1) ^ POLYNOMIAL;
      } else {
        remainder <<= 1;
      }
    }
    crc_table[dividend] = remainder;
  }
}

crc CrcFast(unsigned char const message[], int n_bytes) {
  crc remainder = INITIAL_REMAINDER;
  unsigned char data;
  int byte;
  for (byte = 0; byte < n_bytes; ++byte) {
    data = REFLECT_DATA(message[byte]) ^ (remainder >> (WIDTH - 8));
    remainder = crc_table[data] ^ (remainder << 8);
  }
  return (REFLECT_REMAINDER(remainder) ^ FINAL_XOR_VALUE);
}

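int main(void) {
  printf("wid=%u, top=0x%lX\n", (unsigned) WIDTH, (unsigned long) TOPBIT);
  unsigned char  test[] = "123456789";
  CrcInit();
  // strlen() expects a char pointer; crc values are printed via unsigned long.
  crc result = CrcFast(test, (int) strlen((const char *) test));
  printf("The %s CrcFast() of \"123456789\" is 0x%lX (check value 0x%lX)\n",
         CRC_NAME, (unsigned long) result, (unsigned long) CHECK_VALUE);
  return 0;
}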

CRC Algorithms

A "CRC16" (CRC-16-CCITT) implementation on AutomationWiki.
Implementing The CCITT Cyclical Redundancy Check on Dr Dobbs.
Fast CRC32 Compare
Best CRC Polynomials
A C++ Class that encapsulates the official CRC32 algorithm
CRC32 C or C++ implementation on Stack Overflow
A CRC algorithm in C: crc.zip

A Parameterized Model For CRC Algorithms

The algorithm is from A Parameterized Model For CRC Algorithms.

REFIN This is a boolean parameter. If it is FALSE, input bytes are processed with bit 7 being treated as the most significant bit (MSB) and bit 0 being treated as the least significant bit. If this parameter is TRUE, each byte is reflected before being processed.
REFOUT This is a boolean parameter. If it is set to FALSE, the final value in the register is fed into the XOROUT stage directly, otherwise, if this parameter is TRUE, the final register value is reflected first.

The CRC algorithm and the code for generating a lookup table are in crcmodel.tar.gz.

Footnotes:

¹ Tanenbaum, A.S., "Computer Networks", Prentice Hall, 1981, ISBN: 0-13-164699-0. Comment: Section 3.5.3 on pages 128 to 132 provides a very clear description of CRC codes. However, it does not describe table-driven implementation techniques.