Programming Pearls Notes 9: Code Tuning

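Chapter 9 is about local optimization of code that has already been written; algorithm design is no longer the subject here.

1. First, profile a C program to find out which function or which part of the program takes most of the time.

　　The code that the book profiles is actually an implementation of "searching with bins" from Chapter 13. The code is as follows: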

/* Copyright (C) 1999 Lucent Technologies */
/* From 'Programming Pearls' by Jon Bentley */

/* genbins.c -- generate random numbers with bins */
/* bigrand() is removed here and rand() is used directly: the original
   bigrand() computes RAND_MAX*rand(), which overflows the range of int
   and caused a segmentation fault. */

/* If NODESIZE is 8, this program uses the special-case malloc.
   Change NODESIZE to 0 to use the system malloc.
 */

#include <stdio.h>
#include <stdlib.h>

#define NODESIZE 8
#define NODEGROUP 1000
int nodesleft = 0;
char *freenode;

void *pmalloc(int size)
{    void *p;
    if (size != NODESIZE)
        return malloc(size);
    if (nodesleft == 0) {
        freenode = malloc(NODEGROUP*NODESIZE);
        nodesleft = NODEGROUP;
    }
    nodesleft--;
    p = (void *) freenode;
    freenode += NODESIZE;
    return p;
}

struct node {
    int val;
    struct node *next;
};

struct node **bin, *sentinel;
int bins, bincnt, maxval;

void initbins(int maxelms, int pmaxval)
{    int i;
    bins = maxelms;
    maxval = pmaxval;
    bin = pmalloc(bins*sizeof(struct node *));
    sentinel = pmalloc(sizeof(struct node));
    sentinel->val = maxval;
    for (i = 0; i < bins; i++)
        bin[i] = sentinel;
    bincnt = 0;
}

struct node *rinsert(struct node *p, int t)
{    if (p->val < t) {
        p->next = rinsert(p->next, t);
    } else if (p->val > t) {
        struct node *q = pmalloc(sizeof(struct node));
        q->val = t;
        q->next = p;
        p = q;
        bincnt++;
    }
    return p;
}

void insert(int t)
{    int i;
    i = t / (1 + maxval/bins);
    bin[i] = rinsert(bin[i], t);
}

void report()
{    int i, j = 0;
    struct node *p;
    for (i = 0; i < bins; i++)
        for (p = bin[i]; p != sentinel; p = p->next)
                ;
        //     printf("%d\n", p->val) ;
            /* Uncomment for testing, comment for profiling */
}

int main(int argc, char *argv[])
{    if (argc < 3) {    /* both m and n are required */
        fprintf(stderr, "usage: %s m n\n", argv[0]);
        return 1;
    }
    int m = atoi(argv[1]);
    int n = atoi(argv[2]);
    initbins(m, n);
    while (bincnt < m) {
        insert(rand() % n);
    }
    report();
    return 0;
}

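On Linux, the program can be profiled with the gprof command. First compile with the -pg option, then run the program once, and finally run gprof ./genbinTest to view the profile. For more on using gprof, see: http://blog.sina.com.cn/s/blog_6608391701013phr.html

(NODESIZE is set to 0 for this run.)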

kqiao@ubuntu:~/MyCodes/ProgramPearls/CH9$ gcc genbins.c -o genbinTest -pg
kqiao@ubuntu:~/MyCodes/ProgramPearls/CH9$ ./genbinTest 20000000 30000000
kqiao@ubuntu:~/MyCodes/ProgramPearls/CH9$ gprof ./genbinTest
Flat profile:

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 52.37      3.97     3.97 32956533     0.00     0.00  insert
 31.00      6.32     2.35 32956533     0.00     0.00  rinsert
 10.03      7.08     0.76        1   760.00   760.00  report
  4.09      7.39     0.31                             main
  1.45      7.50     0.11        1   110.00   110.00  initbins
  1.06      7.58     0.08 20000002     0.00     0.00  pmalloc

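　　Looking at the source above, pmalloc is a hand-written wrapper around malloc. With NODESIZE set to 0, every request goes straight to malloc, just as in the ordinary case. With NODESIZE set to 8, which equals sizeof(struct node) when int and pointers are both 4 bytes, the point is to avoid calling malloc on every rinsert. The program allocates room for NODEGROUP nodes in advance and hands out one node per rinsert; only when nodesleft reaches 0 does it call malloc again for another NODEGROUP nodes.

　　One caveat: the book says that plain malloc is slower than pmalloc, but when I set NODESIZE to 0 the result was not much different from NODESIZE set to 8 (probably because the hardware is so fast). Two other factors may contribute: the gprof flat profile may not attribute time spent inside shared-library routines such as malloc, and on a 64-bit build sizeof(struct node) is typically 16, in which case the size == NODESIZE fast path is never taken. Here is the profile with NODESIZE set to 8: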

Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total           
 time   seconds   seconds    calls  ms/call  ms/call  name    
 47.83      3.75     3.75 32956533     0.00     0.00  insert
 35.22      6.52     2.77 32956533     0.00     0.00  rinsert
 10.19      7.32     0.80        1   800.00   800.00  report
  4.71      7.69     0.37                             main
  1.27      7.79     0.10        1   100.00   100.00  initbins
  0.76      7.85     0.06 20000002     0.00     0.00  pmalloc

2. Tuning small functions

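　　2.1 - Integer modulo

　　The modulo operator is expensive compared with the other arithmetic operators, so for non-negative m and positive n the statement k = m % n; can be replaced with:

　　k = m;

　　while (k >= n)

　　　　k -= n;

　　This pays off only when m is rarely much larger than n. In particular, when we know that m is less than 2n, the while can be replaced by an if.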
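The replacement above can be written out as small functions. This is only a sketch: the names mod_if, mod_while and next_index are made up for illustration, and the asserts spell out the assumptions (non-negative m, positive n, and m < 2n for the if version).

```c
#include <assert.h>

/* Hypothetical helper: computes m % n without the % operator.
   Assumes n > 0 and 0 <= m < 2*n, so one subtraction is enough. */
int mod_if(int m, int n)
{
    assert(n > 0 && m >= 0);
    assert(m - n < n);          /* m < 2n, written so it cannot overflow */
    if (m >= n)
        m -= n;
    return m;
}

/* General version for m >= 0: repeated subtraction.
   Only worthwhile when m is rarely much larger than n. */
int mod_while(int m, int n)
{
    assert(n > 0 && m >= 0);
    while (m >= n)
        m -= n;
    return m;
}

/* Typical use: advance a circular index j by a step smaller than n. */
int next_index(int j, int step, int n)
{
    assert(0 <= j && j < n && 0 <= step && step < n);
    return mod_if(j + step, n);
}
```

For example, next_index(4, 3, 5) wraps around to 2 with a single comparison and subtraction instead of a division.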

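　　2.2 - Functions, macros and inlining

　　Inline functions come from C++ (C has had them too since C99). They are a replacement for macros and, unlike macros, are type-checked by the compiler. Macros and inline functions are usually faster than ordinary function calls. (But note that this does not always hold; see Exercise 4.)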
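One difference between the two is easy to show with a side-effecting argument. The sketch below is my own illustration, not code from the book; MAX_MACRO, max_inline and next_value are hypothetical names. It counts how often the argument is evaluated in each case.

```c
#include <assert.h>

/* Macro version: each argument is pasted in textually,
   so it may be evaluated twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/* Inline function version: arguments are evaluated once
   and type-checked by the compiler. */
static inline int max_inline(int a, int b)
{
    return a > b ? a : b;
}

static int calls = 0;   /* counts evaluations of next_value() */

/* Hypothetical side-effecting argument, used only for this demonstration. */
static int next_value(void)
{
    calls++;
    return 10;
}

/* How many times next_value() runs inside the macro. */
int count_macro_evaluations(void)
{
    calls = 0;
    (void)MAX_MACRO(next_value(), 5);   /* 10 > 5, so (a) is evaluated again */
    return calls;
}

/* How many times next_value() runs inside the inline function. */
int count_inline_evaluations(void)
{
    calls = 0;
    (void)max_inline(next_value(), 5);
    assert(calls >= 1);
    return calls;
}
```

With the macro the argument runs twice; with the inline function it runs once.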

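　　2.3 - Sequential search

　　Using a sentinel removes one test from the loop condition, and unrolling the loop so that several elements are tested per iteration speeds things up further.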
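The sentinel idea can be sketched as follows. The function names are mine, and the sentinel version assumes the array has one spare slot at x[n] that may be overwritten temporarily.

```c
#include <assert.h>

/* Plain sequential search: two tests per iteration (bounds and value). */
int search_plain(const int x[], int n, int t)
{
    for (int i = 0; i < n; i++)
        if (x[i] == t)
            return i;
    return -1;
}

/* Sentinel version: x must have room for one extra element at x[n].
   Storing t there guarantees the loop stops, so the bounds test goes away. */
int search_sentinel(int x[], int n, int t)
{
    assert(n >= 0);
    int hold = x[n];    /* save whatever occupies the sentinel slot */
    x[n] = t;
    int i = 0;
    while (x[i] != t)
        i++;
    x[n] = hold;        /* restore the slot */
    return i == n ? -1 : i;
}
```

The inner loop of search_sentinel has a single comparison per element; the check against n happens once, after the loop.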

　　All of the tuning above reduces CPU time; code can also be tuned to reduce paging or to increase the cache hit rate.

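3. Exercises

　　3.1 See Part 1: profiling the program with gprof.

　　3.2 See Part 1: the pmalloc optimization.

　　3.3 Explained in 2.1: when we know that m is less than 2n, the while can be replaced by an if.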

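　　3.4 Macros are expanded textually in the code. The macro definition of max(a,b) is a > b ? a : b, in which b appears twice. When b is a recursive call, that call can therefore be executed twice at every level of the recursion, so the amount of repeated computation keeps doubling and the running time grows exponentially in the worst case.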
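To see the blow-up concretely, here is a small sketch that counts calls. It is my own reconstruction of the situation described above, not the book's exact code; count_calls is a hypothetical helper that builds a descending array, which is the bad case for the macro.

```c
#include <assert.h>

#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

static long calls;      /* number of calls made by the last run */

/* Recursive maximum of x[0..n-1] using the macro: the recursive call
   is pasted in twice, so it can run twice at every level. */
static int arrmax_macro(const int x[], int n)
{
    calls++;
    if (n == 1)
        return x[0];
    return MAX_MACRO(x[n - 1], arrmax_macro(x, n - 1));
}

/* Same recursion, but the recursive result is stored first,
   so it is computed once per level. */
static int arrmax_once(const int x[], int n)
{
    calls++;
    if (n == 1)
        return x[0];
    int rest = arrmax_once(x, n - 1);
    return x[n - 1] > rest ? x[n - 1] : rest;
}

/* Hypothetical helper: number of calls on the array n, n-1, ..., 1. */
long count_calls(int n, int use_macro)
{
    int x[32];
    assert(n >= 1 && n <= 32);
    for (int i = 0; i < n; i++)
        x[i] = n - i;
    calls = 0;
    if (use_macro)
        (void)arrmax_macro(x, n);
    else
        (void)arrmax_once(x, n);
    return calls;
}
```

On a descending array of 10 elements the macro version makes 2^10 - 1 = 1023 calls, while the version that stores the recursive result makes 10.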

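　　3.5 On an unsorted array, if binary search reports a position, the value is definitely present there; if it reports that the value was not found, the value may still be present in the array.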
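Both halves of that statement show up with a tiny example. The binary search below is an ordinary textbook version written for this note, not the book's tuned code.

```c
#include <assert.h>

/* Standard binary search; returns an index p with x[p] == t, or -1.
   A returned index always satisfies x[p] == t, sorted input or not. */
int binary_search(const int x[], int n, int t)
{
    assert(n >= 0);
    int l = 0, u = n - 1;
    while (l <= u) {
        int m = l + (u - l) / 2;
        if (x[m] < t)
            l = m + 1;
        else if (x[m] == t)
            return m;
        else
            u = m - 1;
    }
    return -1;
}
```

On the unsorted array {5, 1, 4, 2, 3}, searching for 4 happens to succeed at index 2, but searching for 5 fails even though 5 sits at index 0.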

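　　3.6 A digit test, isdigit, can be implemented as: c >= '0' && c <= '9'

　　　　An uppercase test, isupper, as: c >= 'A' && c <= 'Z'

　　　　A lowercase test, islower, as: c >= 'a' && c <= 'z'

　　　　(The two letter tests assume a character set such as ASCII in which the letters are contiguous.) Most system implementations instead use a precomputed table indexed by the character and test the relevant class bit with a bitwise AND (&).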
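A table-driven version might look roughly like this. It is a sketch under assumptions: the CT_* bit values and the my_* names are invented here, the table is filled lazily for brevity (a real library would ship a constant table), and ASCII-style contiguous ranges are assumed.

```c
#include <assert.h>

/* Hypothetical class bits, one per character class. */
enum { CT_DIGIT = 1, CT_UPPER = 2, CT_LOWER = 4 };

static unsigned char ctype_table[256];
static int table_ready = 0;

/* Fill the table once; assumes each range is contiguous (as in ASCII). */
static void init_table(void)
{
    for (int c = '0'; c <= '9'; c++) ctype_table[c] |= CT_DIGIT;
    for (int c = 'A'; c <= 'Z'; c++) ctype_table[c] |= CT_UPPER;
    for (int c = 'a'; c <= 'z'; c++) ctype_table[c] |= CT_LOWER;
    table_ready = 1;
}

/* One table lookup and one bitwise AND per test. */
static int has_class(int c, int mask)
{
    assert(c >= 0 && c <= 255);
    if (!table_ready)
        init_table();
    return (ctype_table[c] & mask) != 0;
}

int my_isdigit(int c) { return has_class(c, CT_DIGIT); }
int my_isupper(int c) { return has_class(c, CT_UPPER); }
int my_islower(int c) { return has_class(c, CT_LOWER); }
int my_isalpha(int c) { return has_class(c, CT_UPPER | CT_LOWER); }
```

A combined test such as my_isalpha costs the same as a single-class test, because the mask simply has more bits set.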

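　　3.7 How do we count the 1 bits in a very long sequence of bytes?

　　　　This problem overlaps with Section 2.1 of 《编程之美》 (The Beauty of Programming), "counting the 1s in a binary number". Here, however, the input can come in several forms: the unit may be an 8-bit char or a 32-bit integer.

　　　　For a single unsigned number i:

　　　　(1) while(i) { if(i % 2 == 1) count ++; i/=2;}

　　　　(2) while(i) { count += i & 0x01; i >>= 1;}　　Both of these take O(log n) time for a value n, that is, one iteration per bit up to the highest 1 bit.

　　　　(3) while(i) { i &= (i-1); count++; } The third method only visits the bits that are 1, so its running time is proportional to the number of 1 bits. Take 10100001 as an example: each i & (i-1) clears the lowest 1 bit, so every iteration turns one 1 into 0 and increments count, and the loop runs just three times.

　　　　(4) Trade space for time. An 8-bit number has 2^8 = 256 possible values, each containing between 0 and 8 one bits. Store these counts in an array countTable[256] and simply return the array entry.

　　　　For the whole byte sequence, count the 1 bits in each input unit separately and add the counts up, using either the table lookup or the third method on each unit.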
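Methods (3) and (4) combine naturally: use (3) once to build the 256-entry table, then do one lookup per byte. The sketch below assumes the input is a plain array of unsigned char; the names bits_in_word and count_ones are my own.

```c
#include <assert.h>
#include <stddef.h>

static unsigned char count_table[256];
static int table_ready = 0;

/* Method (3): clear the lowest 1 bit until none remain. */
int bits_in_word(unsigned int v)
{
    int count = 0;
    while (v) {
        v &= v - 1;
        count++;
    }
    return count;
}

/* Method (4): precompute the bit count of every byte value. */
static void init_count_table(void)
{
    for (int b = 0; b < 256; b++)
        count_table[b] = (unsigned char)bits_in_word((unsigned int)b);
    table_ready = 1;
}

/* Count the 1 bits in a sequence of n bytes, one lookup per byte. */
size_t count_ones(const unsigned char *buf, size_t n)
{
    assert(buf != NULL || n == 0);
    if (!table_ready)
        init_count_table();
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += count_table[buf[i]];
    return total;
}
```

For the bytes 0xFF, 0x00, 0xA1, 0x0F the total is 8 + 0 + 3 + 4 = 15.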

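　　3.8 How can a sentinel be used in a program to find the largest element of an array?

　　　　Think of insertion sort and similar uses: here the sentinel is the largest element found so far, and the catch is that the maximum has to be updated as the scan proceeds. So the code below updates the sentinel only when a new maximum appears and otherwise just increments the index. The array must have room for the extra element x[n].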

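/* Returns the largest of x[0..n-1]. Requires n >= 1 and a spare
   slot x[n], which is overwritten with the sentinel. */
int sentinel_max(int x[], int n)
{
    int i = 0, max = x[0];
    while (i < n) {
        max = x[i];          /* x[i] is >= every earlier element */
        x[n] = max;          /* sentinel: stops the inner loop at i == n */
        i++;
        while (x[i] < max)   /* skip everything smaller than max */
            i++;
    }
    return max;
}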

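　　3.10 Hashing

　　　　The simplest way is to use a ready-made container such as the STL set template (that is how search problems are usually handled in practice). Note that std::set is actually a balanced tree; the hash-based counterpart is std::unordered_set. For implementations of some other data structures, see "Chapter 13, Searching".

　　　　A common hand-written hash table uses open addressing: fix the table size, take the key modulo the table size to get the starting position, and probe the following slots when that position is already occupied.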
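A minimal open-addressing table with linear probing could look like this. It is a sketch under stated assumptions rather than a tuned implementation: keys are non-negative ints, TABLE_SIZE is a hypothetical prime chosen to be about twice the expected number of keys, and the table is never allowed to fill up.

```c
#include <assert.h>

#define TABLE_SIZE 2003     /* hypothetical prime, roughly 2x the key count */
#define EMPTY (-1)          /* marks a free slot; keys must be non-negative */

static int table[TABLE_SIZE];
static int used;            /* number of occupied slots */

void hash_init(void)
{
    for (int i = 0; i < TABLE_SIZE; i++)
        table[i] = EMPTY;
    used = 0;
}

/* Returns 1 if key was inserted, 0 if it was already present. */
int hash_insert(int key)
{
    assert(key >= 0);
    assert(used < TABLE_SIZE - 1);      /* keep at least one slot free */
    int i = key % TABLE_SIZE;           /* starting position */
    while (table[i] != EMPTY) {
        if (table[i] == key)
            return 0;
        i = (i + 1 == TABLE_SIZE) ? 0 : i + 1;   /* linear probe, wrapping */
    }
    table[i] = key;
    used++;
    return 1;
}

/* Returns 1 if key is in the table, 0 otherwise. */
int hash_contains(int key)
{
    assert(key >= 0);
    int i = key % TABLE_SIZE;
    while (table[i] != EMPTY) {
        if (table[i] == key)
            return 1;
        i = (i + 1 == TABLE_SIZE) ? 0 : i + 1;
    }
    return 0;
}
```

The wrap-around in the probe uses a comparison instead of a second %, in the spirit of section 2.1. Compared with binary search, the table trades extra space (about twice the number of keys here) for lookups that usually touch only one or two slots.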

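　　3.12 Evaluating a polynomial: working from the highest-order coefficient down to the lowest (Horner's rule) saves n multiplications, using n instead of the 2n needed by the straightforward method.

　　　y = a[n];

　　　for(int i = n-1; i >= 0; i--)

　　　　y = x*y + a[i];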
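To make the saving visible, the sketch below evaluates the same polynomial both ways and counts multiplications. The counter and the function names are additions of mine for illustration; a[0..n] holds the coefficients of a[0] + a[1]*x + ... + a[n]*x^n.

```c
#include <assert.h>

static long mults;      /* multiplications performed by the last call */

/* Straightforward evaluation: keeps a running power of x,
   two multiplications per term, 2n in total. */
double poly_naive(const double a[], int n, double x)
{
    assert(n >= 0);
    double y = a[0], xi = 1.0;
    mults = 0;
    for (int i = 1; i <= n; i++) {
        xi = x * xi;            /* xi == x^i */
        y = y + a[i] * xi;
        mults += 2;
    }
    return y;
}

/* Horner's rule: one multiplication per term, n in total. */
double poly_horner(const double a[], int n, double x)
{
    assert(n >= 0);
    double y = a[n];
    mults = 0;
    for (int i = n - 1; i >= 0; i--) {
        y = x * y + a[i];
        mults += 1;
    }
    return y;
}

/* Multiplication count of the most recent evaluation. */
long last_mults(void)
{
    return mults;
}
```

For 1 + 2x + 3x^2 at x = 2 both functions return 17; the straightforward version uses 4 multiplications and Horner's rule uses 2.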

posted @ 2012-09-07 13:30 dandingyy
