Why can the CPU usage reported by ps exceed 100%?

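The earlier article on the meaning of %CPU in ps explained what the CPU usage figure means. So why does ps sometimes report a CPU usage above 100%? Under /proc, every process has a directory named after its PID. That directory contains a stat file holding various information about the state of the process, and the meaning of each of its fields is defined in the kernel documentation file Documentation/filesystems/proc.txt: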

Table 1-3: Contents of the stat files (as of 2.6.22-rc3)
..............................................................................
Field          Content
pid           process id
tcomm         filename of the executable
state         state (R is running, S is sleeping, D is sleeping in an
                uninterruptible wait, Z is zombie, T is traced or stopped)
ppid          process id of the parent process
pgrp          pgrp of the process
sid           session id
tty_nr        tty the process uses
tty_pgrp      pgrp of the tty
flags         task flags
min_flt       number of minor faults
cmin_flt      number of minor faults with child's
maj_flt       number of major faults
cmaj_flt      number of major faults with child's
utime         user mode jiffies
stime         kernel mode jiffies
cutime        user mode jiffies with child's
cstime        kernel mode jiffies with child's
priority      priority level
nice          nice level
num_threads   number of threads
start_time    time the process started after system boot
vsize         virtual memory size
rss           resident set memory size
rsslim        current limit in bytes on the rss
start_code    address above which program text can run
end_code      address below which program text can run
start_stack   address of the start of the stack
esp           current value of ESP
eip           current value of EIP
pending       bitmap of pending signals (obsolete)
blocked       bitmap of blocked signals (obsolete)
sigign        bitmap of ignored signals (obsolete)
sigcatch      bitmap of catched signals (obsolete)
wchan         address where process went to sleep
0             (place holder)
0             (place holder)
exit_signal   signal to send to parent thread on exit
task_cpu      which CPU the task is scheduled on
rt_priority   realtime priority
policy        scheduling policy (man sched_setscheduler)
blkio_ticks   time spent waiting for block IO
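Each stat file is a single line whose space-separated fields appear in the order of this table, with tcomm wrapped in parentheses. Because the executable name may itself contain spaces or ')', a careful parser resumes after the last ')'. The following is a minimal sketch (parse_stat_times is a hypothetical helper, not part of ps or procps) that extracts utime and stime, fields 14 and 15. The kernel reports them in clock ticks, which user space converts to seconds with sysconf(_SC_CLK_TCK).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: extract utime and stime (fields 14 and 15, in
 * clock ticks) from one line of /proc/<pid>/stat.
 * Returns 0 on success, -1 if the line is malformed.
 */
int parse_stat_times(const char *line, unsigned long *utime,
                     unsigned long *stime)
{
    assert(line != NULL && utime != NULL && stime != NULL);

    /* tcomm is wrapped in parentheses and may contain spaces or ')',
     * so resume parsing after the LAST ')'. */
    const char *p = strrchr(line, ')');
    if (p == NULL)
        return -1;

    /* After ')' come state, ppid, pgrp, sid, tty_nr, tty_pgrp, flags,
     * min_flt, cmin_flt, maj_flt, cmaj_flt, and then utime and stime.
     * Conversions with '*' skip a field without storing it. */
    int n = sscanf(p + 1,
                   " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
                   utime, stime);
    return n == 2 ? 0 : -1;
}
```

For a live process, read the line with fgets() from a path such as /proc/self/stat and pass it in. Later kernels append more fields than this table lists; the parser simply ignores them.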

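The stat file thus includes the process's utime and stime. ps reads this file to obtain how long the process has run, and from that it computes %CPU. So how are the utime and stime values in the stat file produced? fs/proc/array.c defines the following two functions: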

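/* fills /proc/<pid>/stat: whole = 1, sum over the thread group */
int proc_tgid_stat(struct task_struct *task, char *buffer)
{
    return do_task_stat(task, buffer, 1);
}

/* fills /proc/<pid>/task/<tid>/stat: whole = 0, this thread only */
int proc_tid_stat(struct task_struct *task, char *buffer)
{
    return do_task_stat(task, buffer, 0);
}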

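Whenever a process's status is read, the proc filesystem calls one of these two functions to fill in the data. They differ only in the last argument they pass to do_task_stat, and the code of do_task_stat shows what that argument means:
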
static int do_task_stat(struct task_struct *task, char *buffer, int whole)
{
    ...
    /* add up live thread stats at the group level */
    if (whole) {
        struct task_struct *t = task;
        /* walk every live thread in the thread group, starting at task */
        do {
            min_flt += t->min_flt;
            maj_flt += t->maj_flt;
            utime = cputime_add(utime, task_utime(t));
            stime = cputime_add(stime, task_stime(t));
            t = next_thread(t);
        } while (t != task);

        /* add the totals kept in signal_struct for threads
         * that have already exited */
        min_flt += sig->min_flt;
        maj_flt += sig->maj_flt;
        utime = cputime_add(utime, sig->utime);
        stime = cputime_add(stime, sig->stime);
    }
    ...
}
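The whole branch above treats the thread group as a ring: it starts at task, follows next_thread() until it arrives back at task, and then adds the totals banked in signal_struct for threads that have already exited. A simplified user-space model of that loop follows; struct thread, struct group_totals and sum_group are hypothetical names, with plain integers standing in for cputime_t and cputime_add().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for task_struct: only the fields the loop needs. */
struct thread {
    unsigned long utime, stime;   /* CPU time of this live thread */
    struct thread *next;          /* circular: the last thread points to the first */
};

/* Hypothetical stand-in for the exited-thread totals (sig->utime, sig->stime). */
struct group_totals {
    unsigned long utime, stime;
};

/* Model of the whole == 1 branch: start anywhere in the ring, visit each
 * live thread exactly once, then add the time banked for dead threads. */
void sum_group(struct thread *task, const struct group_totals *sig,
               unsigned long *utime, unsigned long *stime)
{
    assert(task != NULL && sig != NULL);
    struct thread *t = task;
    *utime = 0;
    *stime = 0;
    do {
        *utime += t->utime;
        *stime += t->stime;
        t = t->next;              /* plays the role of next_thread(t) */
    } while (t != task);
    *utime += sig->utime;
    *stime += sig->stime;
}
```

Because the do/while body runs before the condition is tested, every member is visited exactly once whichever thread the walk starts from, including a single-threaded group whose next pointer refers to itself.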

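If whole is 1, the proc filesystem adds up the run times of all the threads in the process; next_thread() returns the next thread of the same process. During fork (copy_process() in kernel/fork.c), if the CLONE_THREAD flag is specified, meaning the new thread belongs to the same thread group as its parent, fork adds it to that thread group:
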
if (clone_flags & CLONE_THREAD) {
    p->group_leader = current->group_leader;
    /* link the new task into its leader's thread_group list */
    list_add_tail_rcu(&p->thread_group, &p->group_leader->thread_group);
}

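next_thread() simply walks along the thread_group list the task is linked into, visiting each thread in the group in turn.

This explains why the %CPU field can exceed 100%. The numerator is the run time summed over all threads in the process (the thread group), and at any given moment two threads of the same group may be running on two different CPUs, so the total run time can exceed the wall-clock time that has actually elapsed (the denominator). It follows that this can only happen on SMP (multiprocessor) systems.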
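The argument can be checked with a small calculation. The sketch below mirrors, in simplified form, how ps derives %CPU: utime + stime from the stat line, divided by the wall-clock time since start_time (clock ticks after boot), measured against the system uptime, which is the first number in /proc/uptime. The function name pcpu and its parameter list are assumptions for illustration; procps itself works in integer tenths of a percent and differs in detail.

```c
#include <assert.h>

/*
 * Simplified ps-style %CPU: CPU time of the whole thread group divided by
 * the wall-clock time since the process started.
 * utime, stime, start_time are in clock ticks; hz is the tick rate
 * (sysconf(_SC_CLK_TCK)); uptime_sec is seconds since boot.
 */
double pcpu(unsigned long long utime, unsigned long long stime,
            unsigned long long start_time, double uptime_sec, long hz)
{
    assert(hz > 0);
    double cpu_sec = (double)(utime + stime) / hz;          /* numerator */
    double elapsed = uptime_sec - (double)start_time / hz;  /* denominator */
    if (elapsed <= 0.0)
        return 0.0;                 /* just started: avoid dividing by zero */
    return 100.0 * cpu_sec / elapsed;
}
```

With hz = 100, a process that started 10 s after boot (start_time = 1000) and is examined at an uptime of 20 s has existed for 10 s of wall-clock time. If two of its threads kept separate CPUs busy the whole time, utime + stime is 2000 ticks, i.e. 20 s of CPU time, and pcpu(1500, 500, 1000, 20.0, 100) returns 200.0. On a single CPU the same process could accumulate at most about 1000 ticks in that interval, i.e. 100%.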

ps aux prints one line per process, but if the process has additional threads, its STAT field contains an l (multi-threaded). Firefox, for example:
[root@localhost 3013]# ps aux|grep firefox-bin
root      3091 15.6 26.6 374644 137048 ?       Sl   10:05 47:49 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
[root@localhost 3013]# ps aux -L|grep firefox-bin
root      3091 3091 11.3   12 26.6 374644 137056 ?     Sl   10:05 34:40 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root      3091 3130 0.0   12 26.6 374644 137056 ?       Sl   10:05   0:01 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root      3091 3131 0.1   12 26.6 374644 137056 ?       Sl   10:05   0:25 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root      3091 3140 0.0   12 26.6 374644 137056 ?       Sl   10:05   0:00 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
root      3091 3141 0.0   12 26.6 374644 137056 ?       Sl   10:05   0:00 /usr/lib/firefox-2.0.0.12/firefox-bin -UILocale zh-CN
...
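The L option above makes ps list the other threads as well, with their TIDs in the LWP column. The thread whose TID equals the process ID is the process's first thread, here 3091. Entering /proc/3091 shows: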
[root@localhost proc]# cd 3091
[root@localhost 3091]# ls
attr    clear_refs       cpuset   exe     io        maps    mountstats root       smaps status
auxv    cmdline          cwd      fd      limits    mem     oom_adj     sched      stat   task
cgroup coredump_filter environ fdinfo loginuid mounts oom_score   schedstat statm wchan
[root@localhost 3091]# cd task/
[root@localhost task]# ls
11850 11851 11853 11854 11855 3091 3130 3131 3140 3141 3142 3155 3158
[root@localhost task]# cd 3130
[root@localhost 3130]# ls
attr    clear_refs cwd      fd      loginuid mounts     root       smaps status
auxv    cmdline     environ fdinfo maps      oom_adj    sched      stat   wchan
cgroup cpuset      exe      limits mem       oom_score schedstat statm
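The task subdirectory of a process's directory contains an entry for each of its threads. In fact, the kernel makes no fundamental distinction between processes and threads: a task created so that it shares its parent's address space (CLONE_VM) and, as shown above, joins its thread group (CLONE_THREAD) is what we call a thread; otherwise it is a process.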
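The task directory makes it possible to repeat, from user space, the summation that do_task_stat performs when whole is 1. The sketch below (sum_thread_ticks and read_task_ticks are hypothetical helpers) reads /proc/<pid>/task/<tid>/stat for every thread and adds up utime + stime. Unlike the kernel loop, it cannot see the time of threads that have already exited (the sig->utime and sig->stime totals), so its result can be smaller than the utime + stime in /proc/<pid>/stat.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: utime + stime (clock ticks) from one stat file. */
static int read_task_ticks(const char *path, unsigned long long *ticks)
{
    char buf[1024];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *ok = fgets(buf, sizeof buf, f);
    fclose(f);
    if (ok == NULL)
        return -1;

    /* skip past tcomm, then past state .. cmaj_flt, to utime and stime */
    const char *p = strrchr(buf, ')');
    unsigned long ut, st;
    if (p == NULL ||
        sscanf(p + 1, " %*c %*d %*d %*d %*d %*d %*u %*lu %*lu %*lu %*lu %lu %lu",
               &ut, &st) != 2)
        return -1;
    *ticks = (unsigned long long)ut + st;
    return 0;
}

/*
 * Hypothetical helper: sum utime + stime over every thread listed in
 * /proc/<pid>/task, where pid is a string such as "3091" or "self".
 * Returns the number of threads summed, or -1 if the directory is missing.
 */
int sum_thread_ticks(const char *pid, unsigned long long *total)
{
    char dir[64], path[512];
    assert(pid != NULL && total != NULL);
    snprintf(dir, sizeof dir, "/proc/%s/task", pid);

    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int n = 0;
    struct dirent *e;
    *total = 0;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;                       /* skip "." and ".." */
        unsigned long long t;
        snprintf(path, sizeof path, "%s/%s/stat", dir, e->d_name);
        if (read_task_ticks(path, &t) == 0) {  /* a thread may exit meanwhile */
            *total += t;
            n++;
        }
    }
    closedir(d);
    return n;
}
```

Calling sum_thread_ticks("3091", &t) for the firefox process above would add up the 13 entries listed under task/; dividing the result by sysconf(_SC_CLK_TCK) gives seconds of CPU time.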

Email:wudx05@gmail.com

Blog:http://blog.chinaunix.net/u/22326/


岚天逸见
