【A GUIDE TO CRC ERROR DETECTION ALGORITHM】 (Translation, Part 2)

6. A Fully Worked Example


Having defined CRC arithmetic, we can now frame a CRC calculation as simply a division, because that's all it is! This section fills in the details and gives an example.

To perform a CRC calculation, we need to choose a divisor. In maths marketing speak the divisor is called the "generator polynomial" or simply the "polynomial", and is a key parameter of any CRC algorithm. It would probably be more friendly to call the divisor something else, but the poly talk is so deeply ingrained in the field that it would now be confusing to avoid it. As a compromise, we will refer to the CRC polynomial as the "poly". Just think of this number as a sort of parrot. "Hello poly!"

You can choose any poly and come up with a CRC algorithm. However, some polys are better than others, and so it is wise to stick with the tried and tested ones. A later section addresses this issue.

The width (position of the highest 1 bit) of the poly is very important as it dominates the whole calculation. Typically, widths of 16 or 32 are chosen so as to simplify implementation on modern computers. The width of a poly is the actual bit position of the highest bit. For example, the width of 10011 is 4, not 5. For the purposes of example, we will choose a poly of 10011 (of width W of 4).

Having chosen a poly, we can proceed with the calculation. This is simply a division (in CRC arithmetic) of the message by the poly. The only trick is that W zero bits are appended to the message before the CRC is calculated. Thus we have:

Original message                : 1101011011
Poly                            :      10011
Message after appending W zeros : 11010110110000

Now we simply divide the augmented message by the poly using CRC arithmetic. This is the same division as before:

            1100001010 = Quotient (nobody cares about the quotient)
       _______________
10011 ) 11010110110000 = Augmented message (1101011011 + 0000)
 =Poly  10011,,.,,....
        -----,,.,,....
         10011,.,,....
         10011,.,,....
         -----,.,,....
              10110...
              10011...
              -----...
               010100.
                10011.
                -----.
                 01110
                 00000
                 -----
                  1110 = Remainder = THE CHECKSUM!!!!

The division yields a quotient, which we throw away, and a remainder, which is the calculated checksum. This ends the calculation.
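
The division above can be reproduced mechanically. The following is a minimal C sketch, not code from the original guide: the names top_bit, crc_mod and crc_checksum are illustrative, and all values are assumed to fit in 32 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic.
   poly includes its top bit, so 10011 is passed as 0x13. */
uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);                          /* a zero poly is meaningless */
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);     /* "subtract" the shifted poly */
        }
    }
    return dividend;
}

/* Checksum of a message: append W zero bits, then take the remainder.
   Assumes the augmented message still fits in 32 bits. */
uint32_t crc_checksum(uint32_t message, uint32_t poly)
{
    return crc_mod(message << top_bit(poly), poly);
}
```

With the message 1101011011 (0x35B) and the poly 10011 (0x13), crc_checksum returns 1110 (0xE), the remainder found by hand above.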

Usually, the checksum is then appended to the message and the result transmitted. In this case the transmission would be: 11010110111110.

At the other end, the receiver can do one of two things:

  1. Separate the message and checksum. Calculate the checksum for the message (after appending W zeros) and compare the two checksums.

  2. Checksum the whole lot (without appending zeros) and see if it comes out as zero!
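
Both receiver strategies can be sketched in a few lines of C. This is only an illustration under the assumption that the whole transmission fits in 32 bits; the function names accept_by_recompute and accept_by_zero_remainder are made up for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic (poly keeps its top bit). */
static uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);
        }
    }
    return dividend;
}

/* Option 1: split off the checksum, recompute it from the message, compare. */
bool accept_by_recompute(uint32_t received, uint32_t poly)
{
    int w = top_bit(poly);
    uint32_t checksum = received & ((UINT32_C(1) << w) - 1);
    uint32_t message = received >> w;
    return crc_mod(message << w, poly) == checksum;
}

/* Option 2: divide the whole lot (no zeros appended) and expect zero. */
bool accept_by_zero_remainder(uint32_t received, uint32_t poly)
{
    return crc_mod(received, poly) == 0;
}
```

For the transmission 11010110111110 (0x35BE) and the poly 10011 (0x13), both functions accept; flipping any single bit makes both reject.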

These two options are equivalent. However, in the next section, we will be assuming option 2 because it is marginally mathematically cleaner.

A summary of the operation of the class of CRC algorithms:

  1. Choose a width W, and a poly G (of width W).

  2. Append W zero bits to the message. Call this M'.

  3. Divide M' by G using CRC arithmetic. The remainder is the checksum.

That's all there is to it.

7. Choosing A Poly

Choosing a poly is somewhat of a black art and the reader is referred to [Tanenbaum81] (p.130-132) which has a very clear discussion of this issue. This section merely aims to put the fear of death into anyone who so much as toys with the idea of making up their own poly. If you don't care about why one poly might be better than another and just want to find out about high-speed implementations, choose one of the arithmetically sound polys listed at the end of this section and skip to the next section.

First note that the transmitted message T is a multiple of the poly. To see this, note that 1) the last W bits of T are the remainder after dividing the augmented (by zeros, remember) message by the poly, and 2) addition is the same as subtraction, so adding the remainder pushes the value up to the next multiple. Now note that if the transmitted message is corrupted in transmission, we will receive T+E where E is an error vector (and + is CRC addition, i.e. XOR). Upon receipt of this message, the receiver divides T+E by G. As T mod G is 0, (T+E) mod G = E mod G. Thus, the capacity of the poly we choose to catch particular kinds of errors will be determined by the set of multiples of G, for any corruption E that is a multiple of G will be undetected. Our task then is to find classes of G whose multiples look as little like the kind of line noise (that will be creating the corruptions) as possible. So let's examine the kinds of line noise we can expect.

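The claim that (T+E) mod G = E mod G, and that an error which is a multiple of G slips through, can be checked numerically. A small C sketch follows, reusing the worked example's T and G; the helper names are illustrative and 32-bit values are assumed.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic (poly keeps its top bit). */
uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);
        }
    }
    return dividend;
}

/* True if the receiver's zero-remainder check flags T+E as corrupt. */
bool error_is_detected(uint32_t t, uint32_t e, uint32_t poly)
{
    return crc_mod(t ^ e, poly) != 0;    /* + is XOR in CRC arithmetic */
}
```

With T = 0x35BE and G = 0x13, the error E = 0x98 (G shifted left by three bits) goes undetected, whereas E = 0x08 is caught.
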
SINGLE BIT ERRORS: A single bit error means E=1000...0000. We can ensure that this class of error is always detected by making sure that G has at least two bits set to 1. Any multiple of G will be constructed using shifting and adding, and it is impossible to construct a value with a single bit by shifting and adding a single value with more than one bit set, as the two end bits will always persist.

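As a sanity check on the single-bit argument, the following C sketch tries every single-bit error in a 32-bit window. The function name undetected_single_bit_errors is hypothetical, and 10000 (0x10) is used only as a deliberately bad poly with one bit set.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic (poly keeps its top bit). */
static uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);
        }
    }
    return dividend;
}

/* Counts the single-bit errors E = 1 << k (k < nbits) that poly misses. */
int undetected_single_bit_errors(uint32_t poly, int nbits)
{
    assert(nbits >= 0 && nbits <= 32);
    int missed = 0;
    for (int k = 0; k < nbits; k++) {
        if (crc_mod(UINT32_C(1) << k, poly) == 0) {
            missed++;
        }
    }
    return missed;
}
```

The poly 10011 misses none of the 32 single-bit errors, while the one-bit poly 10000 misses every error at bit position 4 or above.
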
TWO-BIT ERRORS: To detect all errors of the form 100...000100...000 (i.e. E contains two 1 bits) choose a G that does not have multiples that are 11, 101, 1001, 10001, 100001, etc. It is not clear to me how one goes about doing this (I don't have the pure maths background), but Tanenbaum assures us that such G do exist, and cites G with 1 bits (15,14,0) turned on, i.e. x^15 + x^14 + 1, as an example of one G that won't divide anything less than 1...1 where ... is 32767 zeros.

ERRORS WITH AN ODD NUMBER OF BITS: We can catch all corruptions where E has an odd number of 1 bits by choosing a G that has an even number of 1 bits. To see this, note that 1) CRC multiplication is simply XORing a constant value into a register at various offsets, 2) XORing is simply a bit-flip operation, and 3) if you XOR a value with an even number of 1 bits into a register, the parity of the number of 1 bits in the register remains invariant. Example: Starting with E=111, attempt to flip all three bits to zero by the repeated application of XORing in 11 at one of the two offsets (i.e. "E=E XOR 011" and "E=E XOR 110"). This is nearly isomorphic to the "glass tumblers" party puzzle where you challenge someone to flip three tumblers by the repeated application of the operation of flipping any two. Most of the popular CRC polys contain an even number of 1 bits. (Note: Tanenbaum states more specifically that all errors with an odd number of bits can be caught by making G a multiple of 11.)

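The parity argument can be tested by brute force on small values. The C sketch below uses illustrative names and assumes 32-bit values; it compares 10111 (four 1 bits, the poly used in the next section) with 10011 (three 1 bits).

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic (poly keeps its top bit). */
static uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);
        }
    }
    return dividend;
}

/* Number of 1 bits in v. */
static int count_ones(uint32_t v)
{
    int n = 0;
    while (v != 0) {
        v &= v - 1;      /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Counts errors E in [1, limit) with an odd number of 1 bits that poly misses. */
uint32_t undetected_odd_errors(uint32_t poly, uint32_t limit)
{
    uint32_t missed = 0;
    for (uint32_t e = 1; e < limit; e++) {
        if (count_ones(e) % 2 == 1 && crc_mod(e, poly) == 0) {
            missed++;
        }
    }
    return missed;
}
```

Over all 14-bit error patterns, 10111 misses no odd-weight error, while 10011 misses some (the poly itself is one such pattern).
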
BURST ERRORS: A burst error looks like E=000...000111...11110000...00. That is, E consists of all zeros except for a run of 1s somewhere inside. This can be recast as E=(10000...00)(1111111...111) where there are z zeros in the LEFT part and n ones in the RIGHT part. To catch errors of this kind, we simply set the lowest bit of G to 1. Doing this ensures that G has no factor in common with LEFT, so G can divide E only if it divides RIGHT. Then, so long as G is wider than RIGHT, RIGHT cannot be a multiple of G and the error will be detected. See Tanenbaum for a clearer explanation of this; I'm a little fuzzy on this one. Note: Tanenbaum asserts that the probability of a burst of length greater than W getting through is (0.5)^W.

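The burst rule can also be checked exhaustively for a small poly. In the C sketch below, a burst is taken, as in the text, to be a run of n ones shifted left by z zeros. The function name undetected_bursts is illustrative, and 11110 (0x1E) serves only as a counter-example with its lowest bit clear.

```c
#include <assert.h>
#include <stdint.h>

/* Position of the highest 1 bit (4 for binary 10011); -1 if v is zero. */
static int top_bit(uint32_t v)
{
    int pos = -1;
    while (v != 0) {
        v >>= 1;
        pos++;
    }
    return pos;
}

/* Remainder of dividend / poly in CRC (XOR) arithmetic (poly keeps its top bit). */
static uint32_t crc_mod(uint32_t dividend, uint32_t poly)
{
    int w = top_bit(poly);
    assert(w >= 0);
    for (int i = top_bit(dividend); i >= w; i--) {
        if (dividend & (UINT32_C(1) << i)) {
            dividend ^= poly << (i - w);
        }
    }
    return dividend;
}

/* Counts bursts (n ones, 1 <= n <= max_run, shifted left by z) that fit in
   32 bits and that poly fails to detect. */
int undetected_bursts(uint32_t poly, int max_run)
{
    assert(max_run >= 1 && max_run <= 32);
    int missed = 0;
    for (int n = 1; n <= max_run; n++) {
        uint32_t right = (n == 32) ? UINT32_MAX : (UINT32_C(1) << n) - 1;
        for (int z = 0; z + n <= 32; z++) {
            if (crc_mod(right << z, poly) == 0) {
                missed++;
            }
        }
    }
    return missed;
}
```

With the poly 10011 (lowest bit set, W=4), no burst of up to four ones is missed; with 11110 the burst 1111 shifted by one bit equals the poly and goes undetected.
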
That concludes the section on the fine art of selecting polys.

Some popular polys are:

16 bits: (16,12,5,0)                                [X25 standard]
         (16,15,2,0)                                ["CRC-16"]
32 bits: (32,26,23,22,16,12,11,10,8,7,5,4,2,1,0)    [Ethernet]
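
The bit-position lists above translate directly into the constants usually seen in code. The following C sketch builds them; the array and function names are illustrative, and a 64-bit result is used so that the top bit of the 32-bit poly fits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit positions of the popular polys listed above. */
const int X25_BITS[]      = {16, 12, 5, 0};
const int CRC16_BITS[]    = {16, 15, 2, 0};
const int ETHERNET_BITS[] = {32, 26, 23, 22, 16, 12, 11, 10, 8, 7, 5, 4, 2, 1, 0};

/* Builds the poly value (top bit included) from its list of 1-bit positions. */
uint64_t poly_from_bits(const int *bits, size_t count)
{
    uint64_t poly = 0;
    for (size_t i = 0; i < count; i++) {
        assert(bits[i] >= 0 && bits[i] < 64);
        poly |= UINT64_C(1) << bits[i];
    }
    return poly;
}
```

This gives 0x11021, 0x18005 and 0x104C11DB7 respectively. Code that keeps the top bit implicit stores only the low W bits: 0x1021, 0x8005 and 0x04C11DB7.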

8. A Straightforward CRC Implementation

That's the end of the theory; now we turn to implementations. To start with, we examine an absolutely straight-down-the-middle boring straightforward low-speed implementation that doesn't use any speed tricks at all. We'll then transform that program progressively until we end up with the compact table-driven code we all know and love and which some of us would like to understand.

To implement a CRC algorithm all we have to do is implement CRC division. There are two reasons why we cannot simply use the divide instruction of whatever machine we are on. The first is that we have to do the divide in CRC arithmetic. The second is that the dividend might be ten megabytes long, and today's processors do not have registers that big.

So to implement CRC division, we have to feed the message through a division register. At this point, we have to be absolutely precise about the message data. In all the following examples the message will be considered to be a stream of bytes (each of 8 bits) with bit 7 of each byte being considered to be the most significant bit (MSB). The bit stream formed from these bytes will be the bit stream with the MSB (bit 7) of the first byte first, going down to bit 0 of the first byte, and then the MSB of the second byte and so on.

With this in mind, we can sketch an implementation of the CRC division. For the purposes of example, consider a poly of 10111 (W=4). Then, to perform the division, we need to use a 4-bit register:

              3   2   1   0   Bits
            +---+---+---+---+
   Pop! <-- |   |   |   |   | <----- Augmented message
            +---+---+---+---+

         1    0   1   1   1   = The Poly

(Reminder: The augmented message is the message followed by W zero bits.)

To perform the division perform the following:

Load the register with zero bits.
Augment the message by appending W zero bits to the end of it.
While (more message bits)
    Begin
    Shift the register left by one bit, reading the next bit of the
        augmented message into register bit position 0.
    If (a 1 bit popped out of the register during the shift)
        Register = Register XOR Poly.
    End
The register now contains the remainder.

(Note: In practice, the IF condition can be tested by testing the top bit of R before performing the shift.)

We will call this algorithm "SIMPLE".

This might look a bit messy, but all we are really doing is "subtracting" various powers (i.e. shiftings) of the poly from the message until there is nothing left but the remainder. Study the manual examples of long division if you don't understand this.

It should be clear that the above algorithm will work for any width W.
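
One possible C rendering of SIMPLE is sketched below. It is an illustration rather than the guide's own code: the name crc_simple and its parameters are assumptions, the poly is passed without its implicit top bit (so 10111 with W=4 becomes 0x7), and the caller is expected to supply the already augmented message as bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* SIMPLE: bit-at-a-time CRC division over an augmented message.
   poly holds the low w bits of the poly; the top 1 bit is implicit. */
uint32_t crc_simple(const uint8_t *msg, size_t len, uint32_t poly, int w)
{
    assert(w >= 1 && w <= 32);
    uint32_t mask = (w == 32) ? UINT32_MAX : (UINT32_C(1) << w) - 1;
    uint32_t top = UINT32_C(1) << (w - 1);
    uint32_t reg = 0;                                /* register starts at zero */

    for (size_t i = 0; i < len; i++) {
        for (int bit = 7; bit >= 0; bit--) {         /* MSB of each byte first */
            uint32_t popped = reg & top;             /* test the top bit before shifting */
            reg = ((reg << 1) | ((msg[i] >> bit) & 1u)) & mask;
            if (popped) {
                reg ^= poly;
            }
        }
    }
    return reg;                                      /* the remainder */
}
```

Because the register starts at zero, leading zero bits do not change the result, so a bit string that is not a whole number of bytes can be left-padded with zeros. For instance, the earlier augmented message 11010110110000 becomes the bytes 0x35 0xB0, and with the poly 10011 (passed as 0x3, W=4) the function returns 0xE, as in the worked example.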

9. A Table-Driven Implementation

The SIMPLE algorithm above is a good starting point because it corresponds directly to the theory presented so far, and because it is so SIMPLE. However, because it operates at the bit level, it is rather awkward to code (even in C), and inefficient to execute (it has to loop once for each bit). To speed it up, we need to find a way to enable the algorithm to process the message in units larger than one bit.

Candidate quantities are nibbles (4 bits), bytes (8 bits), words (16 bits) and longwords (32 bits) and higher if we can achieve it. Of these, 4 bits is best avoided because it does not correspond to a byte boundary. At the very least, any speedup should allow us to operate at byte boundaries, and in fact most of the table driven algorithms operate a byte at a time.

For the purposes of discussion, let us switch from a 4-bit poly to a 32-bit one. Our register looks much the same, except the boxes represent bytes instead of bits, and the Poly is 33 bits (one implicit 1 bit at the top and 32 "active" bits) (W=32).

               3    2    1    0   Bytes
            +----+----+----+----+
   Pop! <-- |    |    |    |    | <----- Augmented message
            +----+----+----+----+

           1<------32 bits------>

The SIMPLE algorithm is still applicable. Let us examine what it does. Imagine that the SIMPLE algorithm is in full swing and consider the top 8 bits of the 32-bit register (byte 3) to have the values:

t7 t6 t5 t4 t3 t2 t1 t0

In the next iteration of SIMPLE, t7 will determine whether the Poly will be XORed into the entire register. If t7=1, this will happen, otherwise it will not.

Suppose that the top 8 bits of the poly are g7 g6.. g0, then after the next iteration, the top byte will be:

  t6 t5 t4 t3 t2 t1 t0 ??
+ t7 * (g7 g6 g5 g4 g3 g2 g1 g0)    [Reminder: + is XOR]

The NEW top bit (that will control what happens in the next iteration) now has the value t6 + t7*g7.

The important thing to notice here is that from an informational point of view, all the information required to calculate the NEW top bit was present in the top TWO bits of the original top byte.

Similarly, the NEXT top bit can be calculated in advance SOLELY from the top THREE bits t7, t6, and t5. In fact, in general, the value of the top bit in the register in k iterations can be calculated from the top k bits of the register. Let us take this for granted for a moment.

Consider for a moment that we use the top 8 bits of the register to calculate the value of the top bit of the register during the next 8 iterations. Suppose that we drive the next 8 iterations using the calculated values (which we could perhaps store in a single byte register and shift out to pick off each bit). Then we note three things:

  • The top byte of the register now doesn't matter. No matter how many times and at what offset the poly is XORed to the top 8 bits, they will all be shifted out the left hand side during the next 8 iterations anyway.

  • The remaining bits will be shifted left by one byte position, and the next message byte will be shifted into the rightmost byte of the register.

  • While all this is going on, the register will be subjected to a series of XOR's in accordance with the bits of the pre-calculated control byte.

Now consider the effect of XORing in a constant value at various offsets to a register. For example:

   0100010  Register
   ...0110  XOR this
   ..0110.  XOR this
   0110...  XOR this
   -------
   0011000
   -------

The point of this is that you can XOR constant values into a register to your heart's delight, and in the end, there will exist a value which when XORed in with the original register will have the same effect as all the other XORs.
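
The folding of several XORs into one can be shown with the numbers from the example above. This C sketch is illustrative only; combined_xor and EXAMPLE_OFFSETS are made-up names.

```c
#include <assert.h>
#include <stdint.h>

/* Offsets at which 0110 is XORed in the example above. */
const int EXAMPLE_OFFSETS[] = {0, 1, 3};

/* Folds XORs of value at the given left-shift offsets into a single mask. */
uint32_t combined_xor(uint32_t value, const int *offsets, int count)
{
    uint32_t sum = 0;
    for (int i = 0; i < count; i++) {
        assert(offsets[i] >= 0 && offsets[i] < 32);
        sum ^= value << offsets[i];
    }
    return sum;
}
```

Here combined_xor(0x6, EXAMPLE_OFFSETS, 3) yields 0111010 (0x3A), and XORing that single value into the register 0100010 (0x22) gives 0011000 (0x18), the same result as the three separate XORs.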

Perhaps you can see the solution now. Putting all the pieces together we have an algorithm that goes like this:

While (augmented message is not exhausted)
  Begin
  Examine the top byte of the register
  Calculate the control byte from the top byte of the register
  Sum all the Polys at various offsets that are to be XORed into
     the register in accordance with the control byte
  Shift the register left by one byte, reading a new message byte
     into the rightmost byte of the register
  XOR the summed polys to the register
  End

As it stands this is not much better than the SIMPLE algorithm. However, it turns out that most of the calculation can be precomputed and assembled into a table. As a result, the above algorithm can be reduced to:

    (Translator's note: using a 32-bit register to compute an 8-bit value is somewhat wasteful.)
While (augmented message is not exhausted)
  Begin
  Top = top_byte(Register);
  Register = (Register << 8) | next_augmessage_byte;
  Register = Register XOR precomputed_table[Top];
  End

There! If you understand this, you've grasped the main idea of table-driven CRC algorithms. The above is a very efficient algorithm requiring just a shift, an OR, an XOR, and a table lookup per byte. Graphically, it looks like this:

               3    2    1    0   Bytes
            +----+----+----+----+
     +-----<|    |    |    |    | <----- Augmented message
     |      +----+----+----+----+
     |                ^
     |                |
     |               XOR
     |                |
     |     0+----+----+----+----+       Algorithm
     v      +----+----+----+----+       ---------
     |      +----+----+----+----+       1. Shift the register left by
     |      +----+----+----+----+          one byte, reading in a new
     |      +----+----+----+----+          message byte.
     |      +----+----+----+----+       2. Use the top byte just rotated
     |      +----+----+----+----+          out of the register to index
     +----->+----+----+----+----+          the table of 256 32-bit values.
            +----+----+----+----+       3. XOR the table value into the
            +----+----+----+----+          register.
            +----+----+----+----+       4. Goto 1 iff more augmented
            +----+----+----+----+          message bytes.
         255+----+----+----+----+
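
The text has not yet shown how the 256-entry table is filled in. One common way, sketched here in C as an assumption rather than as the guide's own code, is to place each possible control byte in the top byte of the register and run eight SIMPLE iterations with zeros shifted in. The names CRC32_POLY and crc32_build_table are illustrative, and the Ethernet poly is used as the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CRC32_POLY UINT32_C(0x04C11DB7)   /* Ethernet poly, implicit top bit dropped */

/* table[i] becomes the net value XORed into the register when the byte i
   is shifted out of the top of it. */
void crc32_build_table(uint32_t table[256])
{
    assert(table != NULL);
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t reg = i << 24;                       /* control byte in the top byte */
        for (int k = 0; k < 8; k++) {
            if (reg & UINT32_C(0x80000000)) {
                reg = (reg << 1) ^ CRC32_POLY;        /* a 1 popped out: XOR in the poly */
            } else {
                reg <<= 1;
            }
        }
        table[i] = reg;
    }
}
```

As a quick check, entry 0 is zero and entry 1 is the poly itself, 0x04C11DB7.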

In C, the algorithm main loop looks like this:

r=0;
while (len--)
    {
    byte t = (r >> 24) & 0xFF;
    r = (r << 8) | *p++;
    r^=table[t];
    }

where len is the length of the augmented message in bytes, p points to the augmented message, r is the register, t is a temporary, and table is the computed table. This code can be made even more unreadable as follows:

r=0; while (len--) r = ((r << 8) | *p++) ^ table[(r >> 24) & 0xFF];

This is a very clean, efficient loop, although not a very obvious one to the casual observer not versed in CRC theory. We will call this the TABLE algorithm.
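
For readers who want something compilable, the loop can be wrapped into a function together with a table builder. This is a sketch under stated assumptions: the function names are illustrative, the poly is the Ethernet one, the register starts at zero, and the caller passes the message with four zero bytes already appended.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills the table for the Ethernet poly (implicit top bit dropped). */
void crc32_build_table(uint32_t table[256])
{
    assert(table != NULL);
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t reg = i << 24;
        for (int k = 0; k < 8; k++) {
            reg = (reg & UINT32_C(0x80000000))
                      ? (reg << 1) ^ UINT32_C(0x04C11DB7)
                      : (reg << 1);
        }
        table[i] = reg;
    }
}

/* TABLE algorithm: p points at the AUGMENTED message, len is its length
   in bytes (message length plus 4). */
uint32_t crc32_table(const uint8_t *p, size_t len, const uint32_t table[256])
{
    assert(table != NULL);
    uint32_t r = 0;
    while (len--) {
        uint8_t t = (uint8_t)(r >> 24);   /* top byte about to be shifted out */
        r = (r << 8) | *p++;              /* shift in the next message byte */
        r ^= table[t];                    /* apply the precomputed XORs */
    }
    return r;
}
```

For the one-byte message 0x01 followed by four zero bytes, the function returns 0x04C11DB7. Feeding it the message followed by those four checksum bytes returns zero, which is the receiver's check from section 6.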

10. A Slightly Mangled Table-Driven Implementation

Despite the terse beauty of the line

r=0; while (len--) r = ((r << 8) | *p++) ^ table[(r >> 24) & 0xFF];

those optimizing hackers couldn't leave it alone. The trouble, you see, is that this loop operates upon the AUGMENTED message and in order to use this code, you have to append W/8 zero bytes to the end of the message before pointing p at it. Depending on the run-time environment, this may or may not be a problem; if the block of data was handed to us by some other code, it could be a BIG problem. One alternative is simply to append the following line after the above loop, once for each zero byte:

  for (i=0; i<W/8; i++) r = (r << 8) ^ table[(r >> 24) & 0xFF];

This looks like a sane enough solution to me. However, at the further expense of clarity (which, you must admit, is already a pretty scarce commodity in this code) we can reorganize this small loop further so as to avoid the need to either augment the message with zero bytes, or to explicitly process zero bytes at the end as above. To explain the optimization, we return to the processing diagram given earlier.

               3    2    1    0   Bytes
            +----+----+----+----+
     +-----<|    |    |    |    | <----- Augmented message
     |      +----+----+----+----+
     |                ^
     |                |
     |               XOR
     |                |
     |     0+----+----+----+----+       Algorithm
     v      +----+----+----+----+       ---------
     |      +----+----+----+----+       1. Shift the register left by
     |      +----+----+----+----+          one byte, reading in a new
     |      +----+----+----+----+          message byte.
     |      +----+----+----+----+       2. Use the top byte just rotated
     |      +----+----+----+----+          out of the register to index
     +----->+----+----+----+----+          the table of 256 32-bit values.
            +----+----+----+----+       3. XOR the table value into the
            +----+----+----+----+          register.
            +----+----+----+----+       4. Goto 1 iff more augmented
            +----+----+----+----+          message bytes.
         255+----+----+----+----+

Now, note the following facts:

HEAD: If the initial value of the register is zero, the first four iterations of the loop will have the sole effect of shifting in the first four bytes of the message from the right. This is because the first 32 control bits are all zero and so nothing is XORed into the register. Even if the initial value is not zero, the first 4 byte iterations of the algorithm will have the sole effect of shifting the first 4 bytes of the message into the register and then XORing them with some constant value (that is a function of the initial value of the register).

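(Translator's note: for a 32-bit register with a zero initial value, the first four iterations amount to a plain assignment: they simply fill the register with the first four message bytes. Throughout these iterations the byte shifted out of the top of the register is zero, so nothing is XORed in.)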

TAIL: The W/8=4 augmented zero bytes that appear at the end of the message will be pushed into the register from the right as all the other bytes are, but their values (0) will have no effect whatsoever on the register because 1) XORing with zero does not change the target byte, and 2) the four bytes are never propagated out the left side of the register where their zeroness might have some sort of influence. Thus, the sole function of the W/8=4 augmented zero bytes is to drive the calculation for another W/8=4 byte cycles so that the end of the REAL data passes all the way through the register.

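(Translator's note: the four zero bytes appended at the end do not actually affect the value of the register, because XORing with zero leaves a value unchanged (A xor 0 = A). Nor are these four zero bytes ever shifted out of the top of the register; at most they reach the top byte on the final shift, by which point the calculation has ended. Their real function is to flush the remainder out.)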

These facts, combined with the XOR property

(A xor B) xor C = A xor (B xor C)

mean that message bytes need not actually travel through the W/8=4 bytes of the register. Instead, they can be XORed into the top byte just before it is used to index the lookup table. This leads to the following modified version of the algorithm.

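  (Translator's note: given an 8-bit datum and a 32-bit register, it is enough to move that byte into the leftmost (top) byte of the register and operate on it there. Seen this way, the incoming message ...)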
     +-----<Message (non augmented)
     |
     v         3    2    1    0   Bytes
     |      +----+----+----+----+
    XOR----<|    |    |    |    |
     |      +----+----+----+----+
     |                ^
     |                |
     |               XOR
     |                |
     |     0+----+----+----+----+       Algorithm
     v      +----+----+----+----+       ---------
     |      +----+----+----+----+       1. Shift the register left by
     |      +----+----+----+----+          one byte, reading in a new
     |      +----+----+----+----+          message byte.
     |      +----+----+----+----+       2. XOR the top byte just rotated
     |      +----+----+----+----+          out of the register with the
     +----->+----+----+----+----+          next message byte to yield an
            +----+----+----+----+          index into the table ([0,255]).
            +----+----+----+----+       3. XOR the table value into the
            +----+----+----+----+          register.
            +----+----+----+----+       4. Goto 1 iff more augmented
         255+----+----+----+----+          message bytes.

Note: The initial register value for this algorithm must be the initial value of the register for the previous algorithm fed through the table four times. Note: The table is such that if the previous algorithm used 0, the new algorithm will too.

This is an IDENTICAL algorithm and will yield IDENTICAL results. The C code looks something like this:

// Before:
r=0; while (len--) r = ((r << 8) | *p++) ^ table[(r >> 24) & 0xFF];

for (i=0; i<W/8; i++) r = (r << 8) ^ table[(r >> 24) & 0xFF];

// After:
r=0; while (len--) r = (r<<8) ^ table[(r >> 24) ^ *p++];

and THIS is the code that you are likely to find inside current table-driven CRC implementations. Some FF masks might have to be ANDed in here and there for portability's sake, but basically, the above loop is IT. We will call this the DIRECT TABLE ALGORITHM.
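
The claim that the two loops give identical results can be checked directly. The C sketch below puts both side by side; the function names are illustrative, the Ethernet poly and a zero initial register are assumed, and using uint32_t for the register makes the extra FF masks unnecessary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fills the table for the Ethernet poly (implicit top bit dropped). */
void crc32_build_table(uint32_t table[256])
{
    assert(table != NULL);
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t reg = i << 24;
        for (int k = 0; k < 8; k++) {
            reg = (reg & UINT32_C(0x80000000))
                      ? (reg << 1) ^ UINT32_C(0x04C11DB7)
                      : (reg << 1);
        }
        table[i] = reg;
    }
}

/* TABLE algorithm: expects the message with 4 zero bytes appended. */
uint32_t crc32_table_augmented(const uint8_t *p, size_t len, const uint32_t table[256])
{
    uint32_t r = 0;
    while (len--) {
        uint8_t t = (uint8_t)(r >> 24);
        r = ((r << 8) | *p++) ^ table[t];
    }
    return r;
}

/* DIRECT TABLE algorithm: works on the plain, non-augmented message. */
uint32_t crc32_direct(const uint8_t *p, size_t len, const uint32_t table[256])
{
    uint32_t r = 0;
    while (len--) {
        uint8_t index = (uint8_t)((r >> 24) ^ *p++);   /* top byte XOR message byte */
        r = (r << 8) ^ table[index];
    }
    return r;
}
```

Calling crc32_direct on a message and crc32_table_augmented on the same message with four zero bytes appended should return the same value.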

During the process of trying to understand all this stuff, I managed to derive the SIMPLE algorithm and the table-driven version derived from that. However, when I compared my code with the code found in real-implementations, I was totally bamboozled as to why the bytes were being XORed in at the wrong end of the register! It took quite a while before I figured out that theirs and my algorithms were actually the same. Part of why I am writing this document is that, while the link between division and my earlier table-driven code is vaguely apparent, any such link is fairly well erased when you start pumping bytes in at the "wrong end" of the register. It looks all wrong!

If you've got this far, you not only understand the theory, the practice, and the optimized practice, but you also understand the real code you are likely to run into. Could it get any more complicated? Yes it can.

posted @ 2024-08-15 11:28  QIYUEXIN