Reading Notes: Writing Solid Code (4)

Reading Notes: Writing Solid Code (1):
http://www.cnblogs.com/soroman/archive/2007/08/06/845465.html
Reading Notes: Writing Solid Code (2):
http://www.cnblogs.com/soroman/archive/2007/12/22/1010870.html
Reading Notes: Writing Solid Code (3):
http://www.cnblogs.com/soroman/archive/2007/12/25/1014142.html

Let's go on writing solid code...

-----------------------------------
Chapter 6 Risky Business
-----------------------------------

Summary:
Given the numerous implementation possibilities for a given function, it
should come as no surprise that some implementations will be more error-prone
than others. The key to writing robust functions is to exchange risky
algorithms and language idioms for alternatives that have proven to be
comparably efficient yet much safer. At one extreme this can mean using
unambiguous data types; at the other it can mean tossing out an entire design
simply because it would be difficult, or impossible, to test.

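Overview:
When a function can be implemented in many ways, some of those implementations are bound to be more error-prone than others. The key to robust functions is to stop using risky algorithms and language idioms and to use alternatives that are just as efficient but much safer. At one extreme this means using unambiguous data types; at the other it means throwing out an entire design simply because it is hard or impossible to test.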

Guidelines:
6.1. Use well-defined data types.

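[Note] When the ANSI committee looked at C running on a variety of platforms, they found that C was not as portable a language as its users believed. The reason was not only that the C standard library differed from system to system, but also that the preprocessor and the language itself differed. The committee standardized most of these areas but left the basic data types alone: ANSI gives no exact definition of int, char, or long, and leaves the details to compiler vendors. As a result, one ANSI-conforming compiler may have 32-bit ints and signed chars while another has 16-bit ints and unsigned chars.
Now look at the following code: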

char ch;

ch = 0xFF;
if (ch == 0xFF)
{
}

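Whether that test is true or false you can never know, because it depends on the compiler. If char is unsigned the result is true; if char is signed the result is false. The same goes for a bit-field such as: int reg : 3; even though reg is declared as int, the field may be signed or unsigned, again depending on your compiler. You have to write signed int or unsigned int explicitly.
How big is a short? An int? A long? The ANSI standard does not say (it only guarantees minimum ranges) and leaves the decision to the compiler writer. The standard stays silent for the sake of compatibility with existing code.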
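As a rough illustration of guideline 6.1, the sketch below spells out signedness and width instead of leaving them to the compiler. The names PlainCharIsSigned, UnsignedCharCompare, struct REGS, StoreInReg, and Int32IsExactly32Bits are hypothetical, and the <stdint.h> types come from C99, which is later than the book, so treat this as one possible modern reading rather than the author's own code.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Returns 1 if plain char is signed on this compiler, 0 if it is unsigned. */
int PlainCharIsSigned(void)
{
    return CHAR_MIN < 0;
}

/* With the signedness spelled out, the comparison has one answer everywhere. */
int UnsignedCharCompare(void)
{
    unsigned char uch = 0xFF;

    return uch == 0xFF;     /* always 1: uch promotes to the value 255 */
}

/* A bit-field declared with an explicit sign (hypothetical struct). */
struct REGS
{
    unsigned int reg : 3;   /* always holds 0..7 */
};

unsigned StoreInReg(unsigned u)
{
    struct REGS regs;

    assert(u <= 7);         /* the field is only 3 bits wide */
    regs.reg = u;
    return regs.reg;
}

/* Exact-width types from <stdint.h> (C99) for when the size matters. */
int Int32IsExactly32Bits(void)
{
    return sizeof(int32_t) * CHAR_BIT == 32;
}
```

PlainCharIsSigned() is the only function here whose answer varies between compilers, which is exactly why the other functions avoid plain char and plain int bit-fields.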

6.2. Always ask, "Can this variable or expression over- or underflow?"

[Note] Consider the following code:

#include <limits.h>   /* Pull in UCHAR_MAX. */

char chToLower[UCHAR_MAX+1];

void BuildToLowerTable(void)   /* ASCII version */
{
    unsigned char ch;

    /* First set every character to itself. */
    for (ch = 0; ch <= UCHAR_MAX; ch++)
        chToLower[ch] = ch;

    /* Now poke lowercase letters into the uppercase slots. */
    for (ch = 'A'; ch <= 'Z'; ch++)
        chToLower[ch] = ch + 'a' - 'A';
}

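The problem is that when ch equals UCHAR_MAX, adding 1 wraps it around to 0, so ch <= UCHAR_MAX is always true and the first loop never ends. That is an overflow problem; underflow problems exist as well: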

void *memchr(void *pv, unsigned char ch, size_t size)
{
    unsigned char *pch = (unsigned char *)pv;

    while (--size >= 0)
    {
        if (*pch == ch)
            return (pch);
        pch++;
    }
    return (NULL);
}

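Because size is unsigned (size_t always is), decrementing it once it has reached 0 wraps it around to the largest size_t value, so --size >= 0 is never false. The good news is that if you check your code the way Chapter 4 recommends, you will catch overflow bugs like these.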
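One possible repair for the overflow in BuildToLowerTable, sketched under the assumption that int is wider than unsigned char (true on mainstream compilers and checked by the assert): count with an int so the loop variable can actually reach UCHAR_MAX + 1 and end the loop. Declaring chToLower as unsigned char is my own choice here, not necessarily the book's.

```c
#include <assert.h>
#include <limits.h>

unsigned char chToLower[UCHAR_MAX + 1];

void BuildToLowerTable(void)   /* ASCII version */
{
    int ch;   /* wider than unsigned char, so ch <= UCHAR_MAX can become false */

    /* Assumption: int can represent UCHAR_MAX + 1. */
    assert(UCHAR_MAX < INT_MAX);

    /* First set every character to itself. */
    for (ch = 0; ch <= UCHAR_MAX; ch++)
        chToLower[ch] = (unsigned char)ch;

    /* Now poke lowercase letters into the uppercase slots. */
    for (ch = 'A'; ch <= 'Z'; ch++)
        chToLower[ch] = (unsigned char)(ch + 'a' - 'A');
}
```

The question from guideline 6.2 now has a clear answer: ch stops at UCHAR_MAX + 1, which an int can hold, so nothing wraps.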

6.3. Implement "the task" just once.

[Note] Suppose that, to represent the hierarchical window structure of a word processor, you design the following struct:

typedef struct WINDOW
{
    struct WINDOW *pwndChild;     /* NULL if no children */
    struct WINDOW *pwndSibling;   /* NULL if no brothers or sisters */
    char *strWndTitle;
} window;   /* Naming: wnd, *pwnd */

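All the windows can now be represented as a binary tree built from these nodes. Note that in the root node only the pwndChild member actually means anything: it points to all the top-level windows. The root has no siblings and no title, and it cannot be moved, hidden, or deleted. So someone decides to reduce the root node to that single member, represented by the pointer pwndRootChildren, which at least saves some space. AddChild (which adds a child window under an existing window node) is then implemented as follows: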

/* pwndRootChildren is the pointer to the list of top-level
 * windows, such as the menu bar and the main document windows.
 */
static window *pwndRootChildren = NULL;

void AddChild(window *pwndParent, window *pwndNewBorn)
{
    /* New windows may have children but not siblings. */
    ASSERT(pwndNewBorn->pwndSibling == NULL);

    if (pwndParent == NULL)
    {
        /* Add window to the top-level root list. */
        pwndNewBorn->pwndSibling = pwndRootChildren;
        pwndRootChildren = pwndNewBorn;
    }
    else
    {
        /* If Parent's first child, start a new sibling chain;
         * otherwise, add child to the end of the existing
         * sibling chain.
         */
        if (pwndParent->pwndChild == NULL)
            pwndParent->pwndChild = pwndNewBorn;
        else
        {
            window *pwnd = pwndParent->pwndChild;

            while (pwnd->pwndSibling != NULL)
                pwnd = pwnd->pwndSibling;
            pwnd->pwndSibling = pwndNewBorn;
        }
    }
}

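The code above violates at least three principles of writing bug-free code:
1. Don't accept special purpose arguments such as the NULL pointer.
2. Implement your design, not something that approximates it.
3. Strive to make every function perform its task exactly one time.

The first two principles were covered earlier; the third is new. The code above contains three different insertion paths, and intuition tells us that the more paths there are, the more likely bugs become. It is as if one task needed three "implementations" to get done. Try to get each task done with a single implementation. One improvement is to treat the root as an ordinary node, which simplifies the logic because the root no longer needs special handling. The code: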

/* pwndDisplay points to the root-level window, which is
 * allocated during program initialization.
 */
window *pwndDisplay = NULL;

void AddChild(window *pwndParent, window *pwndNewBorn)
{
    /* New windows may have children but not siblings. */
    ASSERT(pwndNewBorn->pwndSibling == NULL);

    /* If Parent's first child, start a new sibling chain;
     * otherwise, add child to the end of the existing sibling chain.
     */
    if (pwndParent->pwndChild == NULL)
        pwndParent->pwndChild = pwndNewBorn;
    else
    {
        window *pwnd = pwndParent->pwndChild;

        while (pwnd->pwndSibling != NULL)
            pwnd = pwnd->pwndSibling;
        pwnd->pwndSibling = pwndNewBorn;
    }
}

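6.4. Get rid of extraneous if statements.

[Note] The improved AddChild above is better than the first version, but it still does its job "twice": it still contains two paths. Both algorithms so far are window-centric. Switching to a pointer-centric algorithm gives: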

void AddChild(window *pwndParent, window *pwndNewBorn)
{
    window **ppwndNext;

    /* New windows may have children but not siblings. */
    ASSERT(pwndNewBorn->pwndSibling == NULL);

    /* Traverse the sibling chain using a pointer-centric
     * algorithm. We set ppwndNext to point at
     * pwndParent->pwndChild since the latter pointer
     * is the first "next sibling pointer" of the list.
     */
    ppwndNext = &pwndParent->pwndChild;
    while (*ppwndNext != NULL)
        ppwndNext = &(*ppwndNext)->pwndSibling;
    *ppwndNext = pwndNewBorn;
}

This way there is no special path left to handle, which reduces the chance of error.
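To see the pointer-centric version at work, here is a small self-contained sketch. It reuses the window struct and AddChild from the notes, swaps the book's ASSERT macro for the standard assert, and adds a hypothetical helper, CountChildren, purely for checking the result.

```c
#include <assert.h>
#include <stddef.h>

typedef struct WINDOW
{
    struct WINDOW *pwndChild;     /* NULL if no children */
    struct WINDOW *pwndSibling;   /* NULL if no brothers or sisters */
    char *strWndTitle;
} window;

void AddChild(window *pwndParent, window *pwndNewBorn)
{
    window **ppwndNext;

    assert(pwndParent != NULL && pwndNewBorn != NULL);
    /* New windows may have children but not siblings. */
    assert(pwndNewBorn->pwndSibling == NULL);

    /* Walk the chain of "next" pointers until we reach the NULL one,
     * then store the new window there. The first child and a later
     * sibling go through exactly the same code.
     */
    ppwndNext = &pwndParent->pwndChild;
    while (*ppwndNext != NULL)
        ppwndNext = &(*ppwndNext)->pwndSibling;
    *ppwndNext = pwndNewBorn;
}

/* Hypothetical helper, not from the book: count a window's children. */
int CountChildren(const window *pwndParent)
{
    const window *pwnd;
    int cChildren = 0;

    for (pwnd = pwndParent->pwndChild; pwnd != NULL; pwnd = pwnd->pwndSibling)
        cChildren++;
    return cChildren;
}
```

Adding two windows to an empty root exercises both the "first child" and the "end of chain" cases without a single if statement in AddChild.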

6.5. Avoid using nested ?: operators.

[Note] Take a look at the following code:

/* uCycleCheckBox -- return the next state for a checkbox.
 *
 * Given the current setting, uCur, return what the next
 * checkbox state should be. This function handles both
 * two-state checkboxes that toggle between 0 and 1, and
 * three-state checkboxes that cycle through 2, 3, 4, 2, ...
 */
unsigned uCycleCheckBox(unsigned uCur)
{
    return ((uCur<=1) ? (uCur?0:1) : (uCur==4)?2:(uCur+1));
}

It uses nested ?: operators and is equivalent to the following code written with if statements:

unsigned uCycleCheckBox(unsigned uCur)
{
    unsigned uRet;

    if (uCur <= 1)
    {
        if (uCur != 0)   /* Handle the 0, 1, 0, ... cycle. */
            uRet = 0;
        else
            uRet = 1;
    }
    else
    {
        if (uCur == 4)   /* Handle the 2, 3, 4, 2, ... cycle. */
            uRet = 2;
        else
            uRet = uCur+1;
    }
    return (uRet);
}

If your compiler optimizes, it may turn this into something like the following:

unsigned uCycleCheckBox(unsigned uCur)
{
    unsigned uRet;

    if (uCur <= 1)
    {
        uRet = 0;   /* Handle the 0, 1, 0, ... cycle. */
        if (uCur == 0)
            uRet = 1;
    }
    else
    {
        uRet = 2;   /* Handle the 2, 3, 4, 2, ... cycle. */
        if (uCur != 4)
            uRet = uCur+1;
    }
    return (uRet);
}

None of the versions above is easy to follow, and each has many paths. A straightforward implementation looks like this:

unsigned uCycleCheckBox(unsigned uCur)
{
    ASSERT(uCur >= 0 && uCur <= 4);

    if (uCur == 1)   /* Time to restart the first cycle? */
        return (0);
    if (uCur == 4)   /* What about the second one? */
        return (2);
    return (uCur+1); /* Nope, nothing special this time. */
}

Or solve it with a table:

unsigned uCycleCheckBox(unsigned uCur)
{
    static const unsigned uNextState[] = { 1, 0, 3, 4, 2 };

    ASSERT(uCur >= 0 && uCur <= 4);
    return (uNextState[uCur]);
}
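A quick way to convince yourself that the table really replaces the nested ?: expression is to run both over every legal state. The sketch below does that; the names uCycleNested, uCycleTable, and VersionsAgree are hypothetical, and the standard assert stands in for the book's ASSERT macro.

```c
#include <assert.h>

/* The nested ?: version from the notes, kept only for comparison. */
unsigned uCycleNested(unsigned uCur)
{
    return ((uCur <= 1) ? (uCur ? 0 : 1) : (uCur == 4) ? 2 : (uCur + 1));
}

/* The table-driven version: one lookup, no branches to get wrong. */
unsigned uCycleTable(unsigned uCur)
{
    static const unsigned uNextState[] = { 1, 0, 3, 4, 2 };

    assert(uCur <= 4);   /* uCur is unsigned, so only the upper bound matters */
    return uNextState[uCur];
}

/* Returns 1 if both versions give the same next state for 0..4. */
int VersionsAgree(void)
{
    unsigned u;

    for (u = 0; u <= 4; u++)
        if (uCycleNested(u) != uCycleTable(u))
            return 0;
    return 1;
}
```

With only five states, the whole behavior of the function is visible in the initializer of uNextState, which is the appeal of the table approach.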

6.6. Handle your special cases just once.

[Note] Consider the following code:

void *memchr(void *pv, unsigned char ch, size_t size)
{
    unsigned char *pch = (unsigned char *)pv;
    unsigned char *pchEnd = pch + size;

    while (pch < pchEnd && *pch != ch)
        pch++;
    return ((pch < pchEnd) ? pch : NULL);
}

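Here the special case is handled in two places (the pch < pchEnd test appears both in the loop condition and in the return statement), which makes the code harder to maintain. The duplicated handling can be merged into one place: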

void *memchr(void *pv, unsigned char ch, size_t size)
{
    unsigned char *pch = (unsigned char *)pv;
    unsigned char *pchEnd = pch + size;

    while (pch < pchEnd)
    {
        if (*pch == ch)
            return (pch);
        pch++;
    }
    return (NULL);
}

6.7. Avoid risky language idioms.

[Note] Look again at the memchr versions above: they all share a hard-to-spot bug, right here:

pchEnd = pch + size;
while (pch < pchEnd)

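What happens if the range being searched is the very last stretch of addressable memory, for example pv points at the last 72 bytes and size is 72? Then pchEnd would have to point one byte past the end of memory, an address that does not exist, so the pch < pchEnd comparison stops being meaningful and the loop can misbehave. To keep pchEnd from pointing at nonexistent memory, one possible fix is: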

pchEnd = pch + size - 1;
while (pch <= pchEnd)

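This guarantees that pchEnd points at the last char of the range, a location that certainly exists.
But it introduces a new problem, the same kind of overflow we saw earlier (the UCHAR_MAX overflow bug in BuildToLowerTable): when pch reaches the last byte of memory, pch++ wraps around and pch <= pchEnd never becomes false. When you have both a pointer and a counter, the safe way to cover a range is to let the counter control the loop: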

void *memchr(void *pv, unsigned char ch, size_t size)
{
    unsigned char *pch = (unsigned char *)pv;

    while (size-- > 0)
    {
        if (*pch == ch)
            return (pch);
        pch++;
    }
    return (NULL);
}

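Someone may recommend writing --size >= 0 instead of size-- > 0, on the grounds that it generates more efficient code. The trouble is that it can introduce a bug: when size is unsigned, --size >= 0 is always true and the loop runs forever. Even when size is signed, --size underflows if size starts out at INT_MIN.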
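The counter-driven loop can be exercised on its own, including the size == 0 case that trips up the --size >= 0 form. This is a minimal sketch: the function is renamed FindByte (a hypothetical name) so that it does not collide with the standard library's memchr, and the const qualifiers and the assert are my additions.

```c
#include <assert.h>
#include <stddef.h>

/* Search the first size bytes at pv for ch; return a pointer to the
 * first match, or NULL if there is none.
 */
void *FindByte(const void *pv, unsigned char ch, size_t size)
{
    const unsigned char *pch = (const unsigned char *)pv;

    assert(pv != NULL || size == 0);

    /* The test happens before the decrement, so size == 0 ends the
     * loop immediately; the wrap-around that follows is harmless
     * because size is never looked at again.
     */
    while (size-- > 0)
    {
        if (*pch == ch)
            return (void *)pch;
        pch++;
    }
    return NULL;
}
```

Note that no end pointer is ever computed, so nothing depends on an address beyond the range being searched.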

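Another risky idiom, one that falls under "efficiency tricks that buy you very little", is using bit operations for multiplication, division, and modulus by powers of 2. Take this code from the fast version of memset in Chapter 2: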

pb = (byte *)longfill((long *)pb, l, size / 4);
size = size % 4;

Someone might ask why it isn't optimized like this:

pb = (byte *)longfill((long *)pb, l, size >> 2);
size = size & 3;

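The problem is that >> 2 is equivalent to / 4 only when the operand is an unsigned value (the same holds for & 3 and % 4). For a negative signed value the results can differ, and the result of right-shifting a negative value is implementation-defined.
C has many more risky idioms. The best approach is to learn from your own bugs, remember the idioms that caused them, and avoid them from then on.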
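The unsigned-only equivalence is easy to check for yourself. In this sketch (all function names are hypothetical) the unsigned forms are compared directly, while the signed helpers show what / and % return under C99 rules, where division truncates toward zero.

```c
#include <assert.h>

/* For unsigned operands the shift and mask forms are exact equivalents. */
int ShiftMatchesDivide(unsigned u)
{
    return (u >> 2) == (u / 4) && (u & 3) == (u % 4);
}

/* Signed division truncates toward zero (guaranteed since C99),
 * so QuarterOf(-7) is -1 and RemainderOf4(-7) is -3.
 * By contrast, -7 >> 2 is implementation-defined; with an arithmetic
 * shift on a two's complement machine it typically yields -2, and
 * -7 & 3 yields 1. Neither matches the division results.
 */
int QuarterOf(int n)
{
    return n / 4;
}

int RemainderOf4(int n)
{
    return n % 4;
}

/* Spot-check the unsigned equivalence over a small range. */
void CheckUnsignedRange(void)
{
    unsigned u;

    for (u = 0; u < 1000; u++)
        assert(ShiftMatchesDivide(u));
}
```

Modern compilers usually perform this strength reduction themselves for unsigned operands, so writing / 4 and % 4 typically costs nothing and states the intent directly.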

6.8. Don't needlessly mix operator types. If you must mix operators, use parentheses to isolate the operations.
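As an illustration of my own for guideline 6.8: combining two bytes into a word with bHigh << 8 + bLow looks right, but + binds more tightly than <<, so the compiler reads it as bHigh << (8 + bLow). The sketch below spells out both readings; MakeWord and MakeWordAsParsed are hypothetical names.

```c
#include <assert.h>
#include <limits.h>

/* What "bHigh << 8 + bLow" actually means to the compiler:
 * the addition happens first, then the shift.
 */
unsigned MakeWordAsParsed(unsigned bHigh, unsigned bLow)
{
    /* Keep the shift count in range so the behavior is defined. */
    assert(8 + bLow < sizeof(unsigned) * CHAR_BIT);
    return bHigh << (8 + bLow);
}

/* What was intended: parentheses isolate the shift, and | is used
 * to combine the two halves.
 */
unsigned MakeWord(unsigned bHigh, unsigned bLow)
{
    assert(bHigh <= 0xFF && bLow <= 0xFF);
    return (bHigh << 8) | bLow;
}
```

For bHigh = 1 and bLow = 2 the intended result is 258 (0x0102), while the unparenthesized expression evaluates to 1 << 10, which is 1024.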

Posted on 2008-01-19 14:50 by SoRoMan
