Duff's device
What is "Duff's Device"? It is a wonderfully devious way of unrolling a loop, devised by Tom Duff while he was at Lucasfilm. In its "classic" form, it was used to copy bytes:
register n = (count + 7) / 8;   /* count > 0 assumed */
switch (count % 8)
{
case 0: do { *to = *from++;
case 7:      *to = *from++;
case 6:      *to = *from++;
case 5:      *to = *from++;
case 4:      *to = *from++;
case 3:      *to = *from++;
case 2:      *to = *from++;
case 1:      *to = *from++;
           } while (--n > 0);
}
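Here, count bytes are copied from the array pointed to by from to the memory location pointed to by to (a memory-mapped output register, which is why to is not incremented). The code interleaves a switch statement with a loop that copies 8 bytes per pass, which takes care of the leftover bytes when count is not a multiple of 8. Believe it or not, it is legal to place case labels inside a block nested within a switch statement like this. When he announced the technique to C's developers and the world, Duff noted that C's switch syntax, in particular its "fall through" behavior, had long been controversial, and that "this code forms some sort of argument in that debate, but I'm not sure whether it's for or against."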
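To make the interleaving concrete, here is a minimal sketch in C that sets a Duff-style copy next to a conventional version that copies the leftover bytes with an explicit loop first. The names `duff_copy`, `plain_copy`, and `copies_agree` are hypothetical. Unlike the classic form, both copy functions increment `to`, so they write into an ordinary buffer rather than a device register. The early return for `count <= 0` is an added assumption, since the classic form presumes `count > 0`.

```c
#include <stddef.h>
#include <string.h>

/* Duff-style copy (hypothetical name). The switch jumps into the middle of
 * the unrolled loop body, so the first pass copies count % 8 bytes (or 8 if
 * the remainder is 0) and every later pass copies a full 8 bytes. */
void duff_copy(char *to, const char *from, int count)
{
    if (count <= 0)           /* guard added in this sketch only */
        return;
    int n = (count + 7) / 8;  /* number of passes, rounded up */
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* Conventional equivalent: leftover bytes first, then blocks of 8. */
void plain_copy(char *to, const char *from, int count)
{
    int rem = count % 8;
    int blocks = count / 8;
    while (rem-- > 0)
        *to++ = *from++;
    while (blocks-- > 0) {
        *to++ = *from++; *to++ = *from++; *to++ = *from++; *to++ = *from++;
        *to++ = *from++; *to++ = *from++; *to++ = *from++; *to++ = *from++;
    }
}

/* Returns 1 if both functions produce identical buffers for every count
 * from 0 to max_count (clamped to 256), and 0 otherwise. */
int copies_agree(int max_count)
{
    char src[256], a[256], b[256];
    if (max_count > 256)
        max_count = 256;
    for (int i = 0; i < 256; i++)
        src[i] = (char)i;
    for (int count = 0; count <= max_count; count++) {
        memset(a, 0, sizeof a);
        memset(b, 0, sizeof b);
        duff_copy(a, src, count);
        plain_copy(b, src, count);
        if (memcmp(a, b, sizeof a) != 0)
            return 0;
    }
    return 1;
}
```

Calling `copies_agree(64)` exercises every remainder from 0 to 7 several times. The two functions differ only in where the remainder is handled: `plain_copy` needs a second loop, while `duff_copy` reuses the unrolled body by entering it part-way through.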
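Anoop wrote a program to test it, reposted below. Note that its header comment describes the code as illegal; it is in fact valid C, as stated above.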
/* The Duff device
 *
 * An infamous example of how a compiler can accept code that should
 * be illegal as per the language definition. To add insult to injury,
 * the illegal code actually runs faster.
 *
 * The functions send and send2 accomplish the same goal (copying a
 * string from one location to another) but send2 manages to screw
 * with your head and achieve its goal much faster (on most
 * architectures).
 *
 * The answer to the puzzle of how send2 actually works is exposed in
 * the function send3 (see the comment above the function send3).
 *
 * This strange piece of code is named after the programmer who
 * discovered this 'optimization' technique.
 *
 * -- Anoop Sarkar <anoop at cs.sfu.ca>
 **/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>

/* pick BUFLEN to be a suitably large number to show the speed
   difference between send and send2 */
const size_t BUFLEN = 100000000;

/* Straightforward byte-by-byte copy; count > 0 assumed. */
void send (register char *to, register char *from, register int count)
{
    do
        *to++ = *from++;
    while (--count > 0);
}

/* Duff's device: the same copy, unrolled 8 times; count > 0 assumed. */
void send2 (register char *to, register char *from, register int count)
{
    register int n = (count + 7) / 8;
    switch (count % 8) {
    case 0: do { *to++ = *from++;
    case 7:      *to++ = *from++;
    case 6:      *to++ = *from++;
    case 5:      *to++ = *from++;
    case 4:      *to++ = *from++;
    case 3:      *to++ = *from++;
    case 2:      *to++ = *from++;
    case 1:      *to++ = *from++;
            } while (--n > 0);
    }
}

/* The answer to the mystery turns out to be simple loop unfolding.
 * send2 uses the semantics for switch statements in C to provide a
 * mnemonic for how many assignments should occur within the body of
 * the do-while loop.
 *
 * So why is send2 faster than send on some architectures? The
 * conditional is a slow instruction to execute on many machine
 * architectures.
 *
 * Try compiling with gcc with and without the -O3 flag. Turning the
 * optimizer on (using -O3) shows the power of code optimization: send
 * runs as fast as send2 with the optimizer on.
 **/

int main (int argc, char **argv)
{
    char *from, *to;
    struct timeval before, after;

    from = (char *) malloc(BUFLEN * sizeof(char));
    to = (char *) malloc(BUFLEN * sizeof(char));
    if (from == NULL || to == NULL) {
        fprintf(stderr, "malloc failed\n");
        return 1;
    }

    memset(from, 'a', (BUFLEN * sizeof(char)));
    printf("array init done\n");

    printf("calling send\n");
    gettimeofday(&before, NULL);
    send(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    /* tv_sec is a time_t, so cast before printing */
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    printf("calling send2\n");
    gettimeofday(&before, NULL);
    send2(to, from, BUFLEN);
    gettimeofday(&after, NULL);
    printf("secs=%ld\n", (long) (after.tv_sec - before.tv_sec));

    /* the buffers are not NUL-terminated, so compare with memcmp
       rather than strcmp */
    if (memcmp(from, to, BUFLEN) == 0) {
        printf("from=to\n");
    } else {
        printf("from!=to\n");
    }

    free(from);
    free(to);
    return(0);
}
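Because the program reports only whole seconds, a run that finishes in under a second may print 0 for both functions. One possible refinement, sketched here under the assumption that sub-second resolution is wanted, is a small helper that also uses the `tv_usec` field of `struct timeval`. The name `elapsed_seconds` is hypothetical and is not part of Anoop's program.

```c
#include <sys/time.h>

/* Elapsed time between two gettimeofday() readings, in seconds.
 * Hypothetical helper: combines the whole-second and microsecond fields. */
double elapsed_seconds(const struct timeval *before, const struct timeval *after)
{
    return (double)(after->tv_sec - before->tv_sec)
         + (double)(after->tv_usec - before->tv_usec) / 1e6;
}
```

In the test program it could replace the subtraction of `tv_sec` values, for example `printf("secs=%.6f\n", elapsed_seconds(&before, &after));`.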