Analysis of the kernel's string functions written in assembly

*************************************************************** 
Working through the string functions written in assembly:
*************************************************************** 
--------------------------------------------------------------- 
_I386_STRING_H_ macro
--------------------------------------------------------------- 
include/asm-i386/string.h 

#ifndef _I386_STRING_H_ 
#define _I386_STRING_H_ 
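This is the header's include guard. The macro is defined the first time the header with the assembly string functions is included, so any later #include of the same header is skipped.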
--------------------------------------------------------------- 
__KERNEL__ macro
--------------------------------------------------------------- 
include/asm-i386/string.h 

#ifdef __KERNEL__ 
#include <linux/config.h> 
Note:
config.h is included only when the __KERNEL__ macro is defined.
/* 
* On a 486 or Pentium, we are better off not using the 
* byte string operations. But on a 386 or a PPro the 
* byte string ops are faster than doing it by hand 
* (MUCH faster on a Pentium). 
*/ 
The following comment is important and worth reading:
/* 
* This string-include defines all string functions as inline 
* functions. Use gcc. It also assumes ds=es=data space, this should be
* normal. Most of the string-functions are rather heavily hand-optimized,
* see especially strsep,strstr,str[c]spn. They should work, but are not 
* very easy to understand. Everything is done entirely within the register 
* set, making the functions fast and clean. String instructions have been 
* used through-out, making for "slightly" unclear code :-) 

* NO Copyright (C) 1991, 1992 Linus Torvalds, 
* consider these trivial functions to be PD. 
*/ 

/* AK: in fact I bet it would be better to move this stuff all out of line. */ 
--------------------------------------------------------------- 
__HAVE_ARCH_STRCPY strcpy() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRCPY 
static inline char * strcpy(char * dest,const char *src)
{
	int d0, d1, d2;
	__asm__ __volatile__(
		"1:\tlodsb\n\t"
		"stosb\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b"
		: "=&S" (d0), "=&D" (d1), "=&a" (d2)
		:"0" (src),"1" (dest)
		: "memory");
	return dest;
}

Analysis:
1. The instructions written out more explicitly (DF = 0 is assumed, so the index registers count up):
1: ---> 1: 
lodsb ---> mov al,ds:[si] 
inc si 
stosb ---> mov es:[di],al 
inc di 
testb al,al ---> test al,al 
jne 1 ---> jne 1 
The loop ends on a zero byte. Once the terminating 0 has been read and stored, the test fails and the loop stops.

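2. Operand constraints:
S: si/esi
&: early-clobber. Normally gcc assumes that all inputs are consumed before any output is written, so it may put an input and an output in the same register. Prefixing an output with "&" tells gcc that the output may be written before the inputs are used up. gcc then gives that output a register that no other input uses, unless an input is explicitly tied to it with a matching constraint (the digits 0-9, below).

0-9: matching constraint, used only on inputs. The digit N says that the input must occupy the same location (register) as output operand N. That operand then serves as both input and output.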

int d0, d1, d2; 
"=&S" (d0), "=&D" (d1), "=&a" (d2) 
"0" (src),"1" (dest) 
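Analysis:
The matching constraints "0" and "1" tie src and dest to the outputs d0 and d1. Before the asm runs, src is therefore loaded into esi (the register of output 0, d0) and dest into edi (the register of output 1, d1). The loop advances esi and edi, so these registers no longer hold their initial values when the asm finishes.

Listing d0, d1 and d2 as outputs is how the code tells gcc that esi, edi and eax are modified. A register that carries an input cannot also be named in the clobber list, so dummy outputs that share the input registers are used instead. The "&" marks each of these outputs as early-clobber, so gcc will not place any other input in esi, edi or eax.

The values left in d0, d1 and d2 are never used. The function returns the original dest, which the C code still holds.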

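D: di/edi
a: ax/eax
"memory": the clobber list. It tells gcc that the asm reads and writes memory in ways the operands do not describe. gcc must therefore not keep memory values cached in registers across the asm statement, and must not move memory accesses across it.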

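3. The instructions:
lodsb: == mov al,ds:[esi]
inc esi (dec esi if DF = 1)
stosb: == mov es:[edi],al
inc edi (dec edi if DF = 1)
testb: == test oprd1,oprd2
Computes oprd1 & oprd2 without storing the result, sets ZF, SF and PF from it, and clears CF and OF.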
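The strcpy loop above can be modelled in portable C to see the register roles at a glance. This is a minimal sketch assuming DF = 0 and flat addressing. The name model_strcpy is hypothetical, and the registers become ordinary C variables, so it illustrates the lodsb/stosb/testb loop rather than reproducing the kernel's inline asm.

```c
/* Portable C model of the kernel strcpy loop (assumes DF = 0).
 * Registers are plain local variables; this is not the kernel code. */
char *model_strcpy(char *dest, const char *src)
{
    const char *esi = src;   /* "0"(src): esi starts at src   */
    char *edi = dest;        /* "1"(dest): edi starts at dest */
    char al;

    do {
        al = *esi++;         /* lodsb: al = [esi], esi++ */
        *edi++ = al;         /* stosb: [edi] = al, edi++ */
    } while (al != 0);       /* testb %al,%al ; jne 1b   */

    return dest;             /* final esi/edi are dropped, like d0/d1 */
}
```

Note that the terminating '\0' is stored before the test, which is why the copy always includes it.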

--------------------------------------------------------------- 
__HAVE_ARCH_STRNCPY strncpy() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRNCPY 
static inline char * strncpy(char * dest,const char *src,size_t count)
{
	int d0, d1, d2, d3;
	__asm__ __volatile__(
		"1:\tdecl %2\n\t"
		"js 2f\n\t"
		"lodsb\n\t"
		"stosb\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b\n\t"
		"rep\n\t"
		"stosb\n"
		"2:"
		: "=&S" (d0), "=&D" (d1), "=&c" (d2), "=&a" (d3)
		:"0" (src),"1" (dest),"2" (count)
		: "memory");
	return dest;
}

Instruction-by-instruction rewrite (DF = 0 assumed):
1: decl ecx ===> 1: dec ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi
stosb ===> mov es:[edi],al
inc edi
testb al,al ===> test al,al
jne 1 ===> jne 1

rep ===> rep
stosb ===> mov es:[edi],al
inc edi
2: ===> 2:

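Analysis:
The analysis splits into several cases.
Suppose memory holds: abcde\0
1) Copy 3 characters:
(1) Initially ECX == 3.
Each iteration decrements ECX, copies one character, then tests whether the copied character was 0:
3-->2: copy a
2-->1: copy b
1-->0: copy c
0-->-1 js 2
(2) So exactly 3 characters, "abc", are copied, and no '\0' is written.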

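2) Copy 5 characters:
(1) Initially ECX == 5.
Each iteration decrements ECX, copies one character, then tests whether the copied character was 0:
5-->4: copy a
4-->3: copy b
3-->2: copy c
2-->1: copy d
1-->0: copy e
0-->-1 js 2
(2) So 5 characters are copied: the 5 letters only, without a terminating '\0'.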

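3) Copy 6 characters:
(1) Initially ECX == 6.
Each iteration decrements ECX, copies one character, then tests whether the copied character was 0:
6-->5: copy a
5-->4: copy b
4-->3: copy c
3-->2: copy d
2-->1: copy e
1-->0: copy \0
test al,al ===> al == '\0', so ZF == 1.
jne 1 ===> not taken, no jump back to 1.

Execution continues with ECX == 0 and al == '\0'.
rep: ECX is already 0, so rep stosb stores nothing.
(2) So 6 bytes are copied: the 5 characters plus one '\0'.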

4) Copy 10 characters:
Initially ECX == 10.
Each iteration decrements ECX, copies one character, then tests whether the copied character was 0:
10-->9: copy a
9-->8: copy b
8-->7: copy c
7-->6: copy d
6-->5: copy e
5-->4: copy \0
test al,al ===> al == '\0', so ZF == 1.
jne 1 ===> not taken, no jump back to 1.

Execution continues with ECX == 4 and al == '\0':
rep : ECX==4, ECX!=0: store al == '\0', then ECX = ECX-1 == 3
Repeat:
rep : ECX==3, ECX!=0: store al == '\0', then ECX == 2
Repeat:
rep : ECX==2, ECX!=0: store al == '\0', then ECX == 1
Repeat:
rep : ECX==1, ECX!=0: store al == '\0', then ECX == 0
Repeat: rep: ECX==0, the loop ends.
(2) So 10 bytes are written: first 6 (the 5 characters plus one '\0'), then 4 more '\0' bytes of padding.

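5) Copy 0 characters:
(1) Initially ECX == 0.
0-->-1 js 2
(2) So nothing is copied.

6) Copy -1 characters:
(1) count is a size_t, so -1 arrives as ECX == 0xffffffff.
-1-->-2 (0xffffffff-->0xfffffffe, sign bit set) js 2
(2) So nothing is copied.
Note:
In strncpy(char * dest,const char *src,size_t count) the count is loaded into ecx, and js tests the sign bit of ecx after each decrement. The count therefore behaves as a signed 32-bit number.
A count of 0x80000001 or more already looks negative after the first decrement, so nothing is copied. The largest usable count is 0x80000000, i.e. about 2G bytes.
This raised a question at the time: since every count that looks <= 0 copies nothing, why not treat the count as unsigned and reach 4G? The author found a possible answer in the next function. Concatenating two strings also uses ECX as the counter, and a 32-bit ECX covers at most 4G-1. The guess is that each of the two strings gets half of that range, about 2G.
rep instruction:
Repeats the string instruction that follows it. Before each repetition it checks CX: if CX is 0 the loop ends; otherwise the instruction executes and CX is decremented.
It resembles loop, except that loop decrements CX first and then tests it for zero.
The decrement performed during the repetition does not affect any flags.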
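The six cases above follow from two details: "decl; js" tests the sign bit of a 32-bit counter, and "rep stosb" pads with al == 0. The C model below is a minimal sketch under the assumption that ecx is 32 bits wide, as on i386. model_strncpy is a hypothetical name, and the sign test is written as an explicit bit check.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable C model of the kernel strncpy (not the kernel code). */
char *model_strncpy(char *dest, const char *src, size_t count)
{
    const char *esi = src;
    char *edi = dest;
    uint32_t ecx = (uint32_t)count;   /* "2"(count): 32-bit ecx */
    char al;

    for (;;) {
        ecx--;                        /* 1: decl %2 */
        if (ecx & 0x80000000u)        /* js 2f: result looks negative */
            return dest;
        al = *esi++;                  /* lodsb */
        *edi++ = al;                  /* stosb */
        if (al == 0)                  /* testb %al,%al ; jne 1b */
            break;
    }
    while (ecx != 0) {                /* rep stosb: pad with al == 0 */
        *edi++ = al;
        ecx--;
    }
    return dest;                      /* 2: */
}
```

Calling it with "abcde" and counts 3, 10, 0 and (size_t)-1 reproduces cases 1), 4), 5) and 6).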
--------------------------------------------------------------- 
strcat() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRCAT 
static inline char * strcat(char * dest,const char * src)
{
	int d0, d1, d2, d3;
	__asm__ __volatile__(
		"repne\n\t"
		"scasb\n\t"
		"decl %1\n"
		"1:\tlodsb\n\t"
		"stosb\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b"
		: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
		: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
		:"memory");
	return dest;
}
Instruction-by-instruction rewrite:
repne ===> while(ECX != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}

decl %1 ===> dec edi
1: ===> 1:
lodsb ===> mov al, ds:[esi]
inc esi
stosb ===> mov es:[edi], al
inc edi
testb %%al,%%al ===> test al, al
jne 1 ===> jne 1

Initial values of the operands:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
: "0" (src), "1" (dest), "2" (0), "3" (0xffffffffu)
src ==> si/esi, here esi
dest ==> di/edi, here edi
0 ==> ax/eax, here eax
0xffffffffu ===> cx/ecx, here ecx
So esi and edi point to the start of the two strings, eax == 0 and ecx == 0xffffffffu.

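The general case:
Initial state:
esi--->'abc\0' (src)
edi--->'123\0' (dest)
al == 0
ecx == 0xffffffffu
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
This scans the string at edi for its terminating '\0'. Every step increments edi and decrements ECX. When the '\0' has been matched, edi points to the byte after it and the loop stops. For '123\0' four bytes are examined, so edi = edi+4 and ECX = 0xffffffff-4 = 0xfffffffb.

Note: the loop ends in one of two ways. Either a '\0' is found in the string at es:[edi], or ECX reaches 0. The second can only happen if that string is at least 0xffffffff (4G-1) bytes long, not counting a terminating '\0'.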

dec edi
edi = edi - 1, so edi now points at the terminating '\0' of the string at es:[edi].

Registers at this point:
esi--->'abc\0' (src)
edi--->the terminating '\0' of '123\0' (dest)
al == 0
ecx == 0xfffffffbu

1: 
mov al, ds:[esi] 
inc esi 
mov es:[edi], al 
inc edi 
test al, al 
jne 1 
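This copies the string at ds:[esi] to the end of the string at es:[edi]. Copying starts at dest's '\0', which is overwritten.

esi--->the '?' just past 'abc\0' (src)
edi--->the '?' just past '123abc\0' (dest)
al == 0. This 0 is src's terminating byte that was just copied, not the initial 0.

Function: strcat(char * dest,const char * src) appends the string src to the end of the string dest. dest's '\0' is overwritten, and copying src's own '\0' terminates the combined string.

Algorithm:
1. Scan the string at dest for its '\0'.
2. Copy src byte by byte, starting at dest's '\0', up to and including src's terminating '\0'. The function ends once that '\0' has been copied.
So both src and dest must be '\0'-terminated strings.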

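Special case 1:
Initial state:
esi--->'abc\0' (src)
edi--->'123456789... ...YX', at least 0xffffffff bytes long (dest)
Assume edi starts at offset 0 of the es segment.
That is, edi[0]=='1' and edi[0xffffffff]=='X'. edi is only 32 bits wide, so it ranges over 0x0--->0xffffffff, i.e. 4G bytes. Even if the string were longer than 4G, edi could not address the rest. The string effectively stops at edi[0xffffffff]=='X', and one more edi++ wraps edi back to 0.
The same applies to esi.
al == 0
ecx == 0xffffffffu
while(ECX != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ECX--;
}
The loop body runs 0xffffffff times.
The string at edi has no '\0' in its first 0xffffffff bytes, so the search runs ECX down to 0, which ends the loop. edi then points at offset 0xffffffff (ignoring segment-limit checks).
On exit, ECX == 0 and edi == 0xffffffff.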

dec edi
edi = edi - 1, so edi == 0xffffffff-1, i.e. edi[0xffffffff-1]=='Y'.

Registers at this point:
esi--->'abc\0' (src)
edi--->'123456......YX', edi==0xffffffff-1, pointing at the byte edi[0xffffffff-1]=='Y' (dest)
al == 0
ecx == 0x00000000u

1: 
mov al, ds:[esi] 
inc esi 
mov es:[edi], al 
inc edi 
test al, al 
jne 1 
This copies esi[0]=='a' of 'abc\0' to es:[edi] == es:edi[0xffffffff-1], overwriting the 'Y' with 'a'. That is: esi[0]=='a'--->edi[0xffffffff-1]
edi--->'123456......aX'.
Then esi++, so esi[1]=='b'; edi++, so edi points at edi[0xffffffff]=='X'.

Next, esi[1]=='b' is copied over edi[0xffffffff]=='X'.
edi--->'123.....ab'. edi++ wraps edi to 0x00000000, pointing at edi[0]=='1'.
esi++, esi[2]=='c': esi points at the 'c' of 'abc\0?' (src)

Next, esi[2]=='c' is copied over edi[0x00000000]=='1'.
esi++, esi[3]=='\0': esi points at the '\0' of 'abc\0?' (src)
edi--->'c23.....ab'. edi++, edi==0x00000001, pointing at edi[0x00000001]=='2'.

Finally, esi[3]=='\0' is copied over edi[0x00000001]=='2'.
esi++, esi[4]=='?': esi points at the '?' of 'abc\0?' (src)
edi--->'c\03.....ab'. edi++, edi==0x00000002, pointing at edi[0x00000002]=='3'.

So the string at offset 0 after the concatenation is "c\0". The copy has wrapped around and destroyed the start of dest.

The same happens when src is 4G long, or when both src and dest are 4G long.
As long as src and dest together hold no more than 4G-1 characters, leaving one byte for the '\0', the result is correct.

When src, dest or both are empty, the cases are simple:
dest empty, src non-empty: src is copied into dest together with its '\0'.
src empty, dest non-empty: dest is unchanged, except that src's '\0' is copied over dest's terminating '\0'.
both empty: src's '\0' is simply copied over dest's '\0'.
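The two phases of strcat (repne scasb with al == 0 and ecx == 0xffffffff, then decl edi and the strcpy loop) can be sketched in C as follows. This is a portable model under the assumption of a 32-bit ecx and DF = 0. model_strcat is a hypothetical name, and ZF is modelled as an int flag.

```c
#include <stdint.h>
#include <string.h>

/* Portable C model of the kernel strcat (not the kernel code). */
char *model_strcat(char *dest, const char *src)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx = 0xffffffffu;      /* "3"(0xffffffffu) */
    int zf = 0;
    char al;

    while (ecx != 0 && !zf) {        /* repne scasb with al == 0 */
        zf = (*edi == 0);
        edi++;
        ecx--;
    }
    edi--;                           /* decl %1: back onto dest's '\0' */
    do {
        al = *esi++;                 /* lodsb */
        *edi++ = al;                 /* stosb */
    } while (al != 0);               /* testb %al,%al ; jne 1b */
    return dest;
}
```

The empty-string cases listed above fall out directly: an empty dest makes the scan stop after one byte, and an empty src copies only its '\0'.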

Reference:
S:si/esi
D:di/edi
a:ax/eax
c:cx/ecx
&: early-clobber, and 0-9: matching constraint; see the explanations under strcpy() above.
--------------------------------------------------------------- 
strncat() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRNCAT 
static inline char * strncat(char * dest,const char * src,size_t count)
{
	int d0, d1, d2, d3;
	__asm__ __volatile__(
		"repne\n\t"
		"scasb\n\t"
		"decl %1\n\t"
		"movl %8,%3\n"
		"1:\tdecl %3\n\t"
		"js 2f\n\t"
		"lodsb\n\t"
		"stosb\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b\n"
		"2:\txorl %2,%2\n\t"
		"stosb"
		: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3)
		: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count)
		: "memory");
	return dest;
}
Instruction-by-instruction rewrite:
repne ===> while(ecx != 0 && ZF != 1)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl %1 ===> decl edi 
movl %8,%3 ===> movl count,ecx 
1: ===> 1:
decl %3 ===> decl ecx
js 2 ===> js 2
lodsb ===> mov al,ds:[esi]
inc esi 
stosb ===> mov es:[edi],al 
inc edi 
testb %%al,%%al ===> test al,al
jne 1 ===> jne 1
2: ===> 2:
xorl %2,%2 ===> xor eax,eax
stosb ===> mov es:[edi],al 
inc edi
Initial values of the operands:
: "=&S" (d0), "=&D" (d1), "=&a" (d2), "=&c" (d3) 
: "0" (src),"1" (dest),"2" (0),"3" (0xffffffffu), "g" (count) 
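esi: esi = src
edi: edi = dest
eax: eax = 0
ecx: ecx = 0xffffffff
"g": lets the compiler choose how to supply the operand (a general register, a memory operand or an immediate). Here it is count, which movl %8,%3 later loads into ecx.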

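Analysis:
while(ecx != 0 && ZF != 1)
{
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
decl edi
This finds the '\0' of the string at es:[edi] and then backs edi up onto it.
If the string is shorter than 4G-1 bytes, the scan stops normally at its '\0'. If the string is 4G long, the scan stops with ecx==0, and after the decrement edi points at edi[0xffffffff-1]. A string longer than 4G cannot be addressed at all.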

movl count,ecx 
1:
decl ecx
js 2
mov al,ds:[esi]
inc esi 
mov es:[edi],al 
inc edi 
test al,al
jne 1
2:
xor eax,eax
mov es:[edi],al 
inc edi

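Label 1: copies the string at esi to edi.
Label 2: after the copy, appends a '\0'.
Cases:
1) count is larger than the length of the string at ds:[esi]. The string is copied together with its '\0', which makes testb/jne fall through and leave loop 1. At 2:, a second '\0' is stored after it, edi++, and the function ends.

2) count is smaller than the length of the string at ds:[esi]. Only count characters are copied. ecx then decrements to -1, and js 2 leaves loop 1. At 2:, a '\0' is stored after them, edi++, and the function ends.

3) count equals the length of the string at ds:[esi]. After count characters are copied, ecx is 0. The decl at the top of the loop makes it -1, so js 2 leaves loop 1 before src's '\0' is read. At 2:, a '\0' is stored, edi++, and the function ends.

4) count is "negative": 0, or any value of 0x80000001 or more, whose sign bit is set after the first decrement. The first decl makes ecx negative, and js 2 leaves loop 1 immediately. At 2:, dest's existing '\0' is simply overwritten with '\0', edi++, and the function ends.

The scan could cover 4G bytes, but count is effectively treated as signed, so at most about 2G bytes of src are appended (plus the '\0'). The author's reading is that this assumes the string at es:[edi] may itself be up to 2G long. The code cannot know how long dest is, and although in practice dest is almost always short, the most general case is handled.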
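The four strncat cases can be checked against a small C model. This sketch assumes a 32-bit ecx and treats "js" as a sign-bit test after each decrement, as in the strncpy discussion. model_strncat is a hypothetical name, not a kernel function.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Portable C model of the kernel strncat (not the kernel code). */
char *model_strncat(char *dest, const char *src, size_t count)
{
    char *edi = dest;
    const char *esi = src;
    uint32_t ecx = 0xffffffffu;      /* "3"(0xffffffffu) */
    int zf = 0;
    char al;

    while (ecx != 0 && !zf) {        /* repne scasb, al == 0 */
        zf = (*edi == 0);
        edi++;
        ecx--;
    }
    edi--;                           /* decl %1: onto dest's '\0' */
    ecx = (uint32_t)count;           /* movl %8,%3 */
    for (;;) {
        ecx--;                       /* 1: decl %3 */
        if (ecx & 0x80000000u)       /* js 2f */
            break;
        al = *esi++;                 /* lodsb */
        *edi++ = al;                 /* stosb */
        if (al == 0)                 /* testb %al,%al ; jne 1b */
            break;
    }
    *edi = 0;                        /* 2: xorl %2,%2 ; stosb */
    return dest;
}
```

In case 1) the extra '\0' lands one byte after the copied terminator, which the test below checks at b[8].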
--------------------------------------------------------------- 
__HAVE_ARCH_STRCMP strcmp() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRCMP 
static inline int strcmp(const char * cs,const char * ct)
{
	int d0, d1;
	register int __res;
	__asm__ __volatile__(
		"1:\tlodsb\n\t"
		"scasb\n\t"
		"jne 2f\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b\n\t"
		"xorl %%eax,%%eax\n\t"
		"jmp 3f\n"
		"2:\tsbbl %%eax,%%eax\n\t"
		"orb $1,%%al\n"
		"3:"
		:"=a" (__res), "=&S" (d0), "=&D" (d1)
		:"1" (cs),"2" (ct)
		:"memory");
	return __res;
}

Initial values:
ax/eax:register int __res; 
si/esi:const char* cs; 
di/edi:const char* ct; 
ZF == 0 

Instruction-by-instruction rewrite:
1: lodsb ===> 1: mov al,ds:[esi]
inc esi 
scasb ===> if((al-es:[edi])==0) 
ZF = 1; 
edi++; 
jne 2 ===> jne 2; 
testb %%al,%%al ===> testb al,al 
jne 1 ===> jne 1 
xorl %%eax,%%eax ===> xorl eax,eax 
jmp 3 ===> jmp 3 
2: sbbl %%eax,%%eax ===> 2: sbb eax,eax
orb $1,%%al ===> or al,1
3: ===> 3: 

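1) Walkthrough:
This compares the strings at ds:[esi] and es:[edi], both of which should be '\0'-terminated. The result is returned in eax.

Each character of ds:[esi] is loaded into al and compared with the corresponding character of es:[edi]. If they are equal, ZF=1, and al is then tested for '\0'. If al is not '\0', the next pair is compared. If it is '\0', eax is cleared to 0 and the function ends, returning that 0.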

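2) Cases:
1. The strings at ds:[esi] and es:[edi] are equal: as above, eax returns 0.
2. The strings differ:
(1) The first differing character of ds:[esi] is lower (as an unsigned byte) than that of es:[edi]:
ds:[esi]=="abc\0"
es:[edi]=="xyz\0"
if((al-es:[edi])==0) ===>if( ('a'-'x')==0 )
ZF = 1; condition false, and 'a' - 'x' borrows: CF == 1
edi++; edi++; edi points at 'y'
jne 2 ; jne 2
2: sbb eax,eax eax = eax-eax-CF=-1=0xffffffff
or al,1 al = 0xff

Conclusion:
If the first differing character of the string at cs is below that of the string at ct, the return value is eax==0xffffffff==-1.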

(2) The first differing character of ds:[esi] is higher than that of es:[edi]:
ds:[esi]=="xyz\0"
es:[edi]=="abc\0"
if((al-es:[edi])==0) ===>if( ('x'-'a')==0 )
ZF = 1; condition false, no borrow: CF == 0
edi++; edi++; edi points at 'b'
jne 2 ; jne 2
2: sbb eax,eax eax = eax-eax-CF=0
or al,1 al = 0|1=1, so eax=0x00000001
Output: eax==0x00000001
Conclusion:
If the first differing character of the string at cs is above that of the string at ct, the return value is eax==0x00000001==1.

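(3) One string is a prefix of the other:
ds:[esi]=="abc\0"
es:[edi]=="abc123\0"
The loop ends when '\0' is compared with '1', and -1 is returned.
In the opposite situation:
ds:[esi]=="abc123\0"
es:[edi]=="abc\0"
The loop ends when '1' is compared with '\0', and 1 is returned.

(4) One string is unterminated ("infinitely long") and the other is terminated:
Either the two differ at some position, which exits as analyzed above, or one is effectively a prefix of the other, also as above.
So as long as one string is properly '\0'-terminated, the function terminates even if the other has no '\0'.

(5) Both strings are unterminated:
If they differ somewhere, the function exits as above.
If they are identical and unterminated, the comparison keeps going. esi and edi increase to 0xffffffff and then wrap to 0x00000000. If both strings start at 0x00000000, the comparison repeats forever, which is an infinite loop. If they start somewhere in the middle, a differing byte at or after address 0x00000000 ends the function.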
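The sbb/or trick can be made concrete in C. After scasb, CF is 1 exactly when the cs byte is below the ct byte (unsigned compare). "sbbl %eax,%eax" turns CF into 0 or -1, and "orb $1,%al" forces the low bit on, which yields -1 or +1. This is a minimal portable sketch; model_strcmp is a hypothetical name, and the borrow is computed with an explicit unsigned comparison.

```c
#include <stdint.h>

/* Portable C model of the kernel strcmp (not the kernel code). */
int model_strcmp(const char *cs, const char *ct)
{
    const unsigned char *esi = (const unsigned char *)cs;
    const unsigned char *edi = (const unsigned char *)ct;

    for (;;) {
        unsigned char al = *esi++;           /* lodsb */
        unsigned char m = *edi++;            /* byte that scasb compares */
        if (al != m) {                       /* scasb ; jne 2f */
            int32_t eax = (al < m) ? -1 : 0; /* 2: sbbl %eax,%eax = -CF */
            return (int)(eax | 1);           /* orb $1,%al */
        }
        if (al == 0)                         /* testb %al,%al ; jne 1b */
            return 0;                        /* xorl %eax,%eax */
    }
}
```

Unlike some C library versions, this returns exactly -1, 0 or 1. Callers should still only rely on the sign.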
--------------------------------------------------------------- 
__HAVE_ARCH_STRNCMP strncmp() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRNCMP 
static inline int strncmp(const char * cs,const char * ct,size_t count)
{
	register int __res;
	int d0, d1, d2;
	__asm__ __volatile__(
		"1:\tdecl %3\n\t"
		"js 2f\n\t"
		"lodsb\n\t"
		"scasb\n\t"
		"jne 3f\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b\n"
		"2:\txorl %%eax,%%eax\n\t"
		"jmp 4f\n"
		"3:\tsbbl %%eax,%%eax\n\t"
		"orb $1,%%al\n"
		"4:"
		:"=a" (__res), "=&S" (d0), "=&D" (d1), "=&c" (d2)
		:"1" (cs),"2" (ct),"3" (count)
		:"memory");
	return __res;
}
Initial values:
ax/eax:__res 
si/esi:const char * cs 
di/edi:const char * ct 
cx/ecx:count 

Instruction-by-instruction rewrite:
1: decl %3 ===> 1: decl ecx 
js 2 ===> js 2 
lodsb ===> mov al,ds:[esi]
inc esi 
scasb ===> if((al-es:[edi])==0) 
ZF = 1; 
edi++; 
jne 3 ===> jne 3 
testb %%al,%%al ===> testb al,al 
jne 1 ===> jne 1 
2: xorl %%eax,%%eax ===> 2: xorl eax,eax 
jmp 4 ===> jmp 4 
3: sbbl %%eax,%%eax ===> 3: sbb eax,eax
orb $1,%%al ===> or al,1
4: ===> 4: 

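The analysis is similar to strcmp():
1) count is smaller than both string lengths:
a: The first count characters are equal. ecx becomes -1, js 2 leaves the loop, and xorl eax,eax clears eax, which is returned.
b: The strings differ within the first count characters, and jne 3 leaves the loop:
b-1: If the first differing character of cs is above that of ct, eax==0x00000001==1 is returned.
b-2: If the first differing character of cs is below that of ct, eax==0xffffffff==-1 is returned.

2) count equals the length of both strings:
a: Equal. After count characters ecx is 0, and the next decl makes it -1. js 2 therefore leaves the loop before the '\0' is ever compared, and xorl eax,eax returns 0.
b: Different: as above.

3) count is larger than both string lengths:
a: Equal. When the '\0' is compared, testb al,al finds al == 0, jne 1 is not taken, and execution falls through to xorl eax,eax, which returns 0.
b: Different: as above.

4) count is "<= 0" (0, or a size_t large enough that its sign bit is set after the first decrement):
Control flow:
No characters are compared. 0 is returned immediately and the function ends.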
1: decl %3 ===> 1: decl ecx 
js 2 ===> js 2 
... ... 
2: xorl %%eax,%%eax ===> 2: xorl eax,eax 
jmp 4 ===> jmp 4 
... ... 
4: ===> 4: 
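A C model makes case 2a easy to confirm: with count equal to the length, the "js 2f" exit is taken before the '\0' is compared. This sketch assumes a 32-bit ecx with a sign-bit test after each decrement. model_strncmp is a hypothetical name.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable C model of the kernel strncmp (not the kernel code). */
int model_strncmp(const char *cs, const char *ct, size_t count)
{
    const unsigned char *esi = (const unsigned char *)cs;
    const unsigned char *edi = (const unsigned char *)ct;
    uint32_t ecx = (uint32_t)count;      /* "3"(count): 32-bit ecx */

    for (;;) {
        ecx--;                           /* 1: decl %3 */
        if (ecx & 0x80000000u)           /* js 2f */
            return 0;                    /* 2: xorl %eax,%eax */
        unsigned char al = *esi++;       /* lodsb */
        unsigned char m = *edi++;        /* scasb operand */
        if (al != m)                     /* jne 3f */
            return (al < m) ? -1 : 1;    /* 3: sbbl ; orb $1,%al */
        if (al == 0)                     /* testb %al,%al ; jne 1b */
            return 0;                    /* falls into 2: */
    }
}
```

The last test below passes (size_t)-1, which takes the "count <= 0" path of case 4).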

--------------------------------------------------------------- 
__HAVE_ARCH_STRCHR strchr() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRCHR 
static inline char * strchr(const char * s, int c)
{
	int d0;
	register char * __res;
	__asm__ __volatile__(
		"movb %%al,%%ah\n"
		"1:\tlodsb\n\t"
		"cmpb %%ah,%%al\n\t"
		"je 2f\n\t"
		"testb %%al,%%al\n\t"
		"jne 1b\n\t"
		"movl $1,%1\n"
		"2:\tmovl %1,%0\n\t"
		"decl %0"
		:"=a" (__res), "=&S" (d0)
		:"1" (s),"0" (c)
		:"memory");
	return __res;
}

Initial values:
ax/eax:int c 
si/esi:const char *s 

Instruction-by-instruction rewrite (AT&T on the left, Intel syntax on the right):
movb %%al,%%ah ===> mov ah,al
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmp al,ah
je 2 ===> je 2
testb %%al,%%al ===> test al,al
jne 1 ===> jne 1
movl $1,%1 ===> mov esi,1
2: movl %1,%0 ===> 2: mov eax,esi
decl %0 ===> dec eax

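Function:
Searches the '\0'-terminated string at ds:[esi], from the front, for the character c.
If c is found, a pointer to it is returned: esi - 1, just after the lodsb that read it. If c is not found, 0 (NULL) is returned; esi is set to 1 so that the final decrement yields 0.
Because the comparison with c comes before the test for '\0', searching for c == '\0' returns a pointer to the terminator.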

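Rewritten in C (flat 32-bit addressing is assumed, so ds:[esi] is just the address in esi; uintptr_t needs <stdint.h>):
char *strchr_model(const char *s, int c)
{
	uintptr_t esi = (uintptr_t)s;   /* si/esi: start of the string */
	char ah = (char)c;              /* movb %al,%ah: al holds c */
	char al;
loop:                               /* label 1 */
	al = *(const char *)esi;        /* lodsb */
	esi++;
	if (al == ah)                   /* cmpb %ah,%al ; je 2f */
		goto done;
	if (al != 0)                    /* testb %al,%al ; jne 1b */
		goto loop;
	esi = 1;                        /* movl $1,%1: makes the result 0 */
done:                               /* label 2 */
	return (char *)(esi - 1);       /* movl %1,%0 ; decl %0 */
}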

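Extreme case:
If the string at ds:[esi] is not '\0'-terminated, esi keeps incrementing up to 0xffffffff and then wraps to 0x00000000, so the search starts over from the bottom of memory. If neither c nor a '\0' occurs anywhere between there and the original ds:[esi], this becomes an infinite loop.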
--------------------------------------------------------------- 
__HAVE_ARCH_STRRCHR strrchr() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRRCHR 
static inline char * strrchr(const char * s, int c)
{
	int d0, d1;
	register char * __res;
	__asm__ __volatile__(
		"movb %%al,%%ah\n"
		"1:\tlodsb\n\t"
		"cmpb %%ah,%%al\n\t"
		"jne 2f\n\t"
		"leal -1(%%esi),%0\n"
		"2:\ttestb %%al,%%al\n\t"
		"jne 1b"
		:"=g" (__res), "=&S" (d0), "=&a" (d1)
		:"0" (0),"1" (s),"2" (c)
		:"memory");
	return __res;
}


Initial values:
__res : 0 
si/esi : const char * s 
ax/eax : c 

Instruction-by-instruction rewrite (AT&T on the left, Intel syntax on the right):
movb %%al,%%ah ===> mov ah,al
1: lodsb ===> 1: mov al,ds:[esi]
inc esi
cmpb %%ah,%%al ===> cmp al,ah
jne 2 ===> jne 2
leal -1(%%esi),%0 ===> lea __res,[esi-1]
2: testb %%al,%%al ===> 2: test al,al
jne 1 ===> jne 1
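This function is analyzed like strchr() above, except that it finds the last occurrence of c in the string at const char *s.
Each match overwrites __res with the match address (esi - 1), and the loop continues until the '\0'. If c occurs, the address of its last occurrence is returned; otherwise __res keeps its initial value 0.
Since the comparison comes before the '\0' test, strrchr(s, '\0') returns the address of the terminator.
strrchr - Find the last occurrence of a character in a string.

If s is a null pointer, the behavior is undefined.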
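The difference from strchr() is that a match only records an address and the loop continues to the terminator. This is a minimal C sketch of that behaviour. model_strrchr is a hypothetical name, and res stands in for the "=g" operand that starts at 0.

```c
#include <stddef.h>

/* Portable C model of the kernel strrchr (not the kernel code). */
char *model_strrchr(const char *s, int c)
{
    const char *esi = s;
    const char *res = NULL;      /* "0"(0): __res starts as 0 */
    char ah = (char)c;           /* movb %al,%ah */
    char al;

    do {
        al = *esi++;             /* lodsb */
        if (al == ah)            /* cmpb %ah,%al ; jne 2f */
            res = esi - 1;       /* leal -1(%esi),%0 */
    } while (al != 0);           /* 2: testb %al,%al ; jne 1b */
    return (char *)res;
}
```

Searching for '\0' returns the terminator's address because the compare happens before the loop-ending test.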
--------------------------------------------------------------- 
__HAVE_ARCH_STRLEN strlen() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRLEN 
static inline size_t strlen(const char * s)
{
	int d0;
	register int __res;
	__asm__ __volatile__(
		"repne\n\t"
		"scasb\n\t"
		"notl %0\n\t"
		"decl %0"
		:"=c" (__res), "=&D" (d0)
		:"1" (s),"a" (0), "0" (0xffffffffu)
		:"memory");
	return __res;
}

Initial values of the operands:
di/edi: const char * s
ax/eax: 0
cx/ecx: 0xffffffff
size_t ecx = 0xffffffff;
ZF = 0;
char * edi = s;
eax = 0;
Instruction-by-instruction rewrite:
repne ===> while(ecx != 0 && ZF == 0)
scasb ===> {
if((al-es:[edi])==0)
ZF = 1;
edi++;
ecx--;
}
notl %0 ===> ecx = ~ecx;
decl %0 ===> ecx--;

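The key step is ecx = ~ecx, a bitwise NOT rather than a logical !.
ecx counts down from 0xffffffff. For a string of n characters the scan examines n + 1 bytes (including the '\0'), leaving ecx = 0xffffffff - (n + 1).
For a 32-bit value ~x == 0xffffffff - x, so ~ecx == n + 1. The final decrement removes the extra count for the '\0', giving n.
In general, a down-counter started at 0xffffffff and an up-counter started at 0 are bitwise complements of each other, so a single not converts one into the other. Any extra step that was counted is then subtracted from the converted value.

The ordinary cases are simple and work as analyzed before.
In the extreme case (no '\0' within 0xffffffff bytes), edi++ and ecx-- continue until ecx goes from 0xffffffff down to 0x00000000, as discussed for strcat().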

Reference:
typedef unsigned int __kernel_size_t; 
typedef __kernel_size_t size_t; 
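The not/dec trick can be checked with a 32-bit counter in C. For "abc" the scan examines 4 bytes, so ecx ends at 0xfffffffb; ~0xfffffffb is 4, and the decrement gives 3. This is a portable sketch, not the kernel code; model_strlen is a hypothetical name.

```c
#include <stdint.h>

/* Portable C model of the kernel strlen (not the kernel code). */
uint32_t model_strlen(const char *s)
{
    const char *edi = s;             /* "1"(s) */
    uint32_t ecx = 0xffffffffu;      /* "0"(0xffffffffu) */
    int zf = 0;

    while (ecx != 0 && !zf) {        /* repne scasb with al == 0 */
        zf = (*edi == 0);
        edi++;
        ecx--;
    }
    /* ecx == 0xffffffff - (n + 1) for a string of n characters */
    ecx = ~ecx;                      /* notl %0: n + 1 */
    ecx--;                           /* decl %0: n */
    return ecx;
}
```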
--------------------------------------------------------------- 
__memcpy() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

static inline void * __memcpy(void * to, const void * from, size_t n)
{
	int d0, d1, d2;
	__asm__ __volatile__(
		"rep ; movsl\n\t"
		"movl %4,%%ecx\n\t"
		"andl $3,%%ecx\n\t"
#if 1	/* want to pay 2 byte penalty for a chance to skip microcoded rep? */
		"jz 1f\n\t"
#endif
		"rep ; movsb\n\t"
		"1:"
		: "=&c" (d0), "=&D" (d1), "=&S" (d2)
		: "0" (n/4), "g" (n), "1" ((long) to), "2" ((long) from)
		: "memory");
	return (to);
}

Initial operand values:
cx/ecx: n/4
di/edi: to
si/esi: from

Instruction-by-instruction rewrite: ecx = n/4;
rep ===> while( ecx-- != 0 )
movsl ===> { (long)es:[edi] = (long)ds:[esi]; edi += 4; esi += 4; }
movl %4,%%ecx ===> ecx = n;
andl $3,%%ecx ===> ecx &= 3; ZF = (ecx == 0);
#if 1
jz 1 ===> if(ZF == 1) goto 1;
#endif
rep ===> while( ecx-- != 0 )
movsb ===> { (char)es:[edi] = (char)ds:[esi]; edi++; esi++; }
1: ===> 1:

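Analysis:
1. First the copy is done in 4-byte units:
ecx = n/4, then the copy starts.
2. Then ecx = n % 4 (computed as n & 3), and the remaining 0-3 bytes are copied one at a time.
andl sets ZF when n & 3 == 0, and jz then skips rep movsb entirely. The #if 1 comment says this spends 2 bytes of code for the chance to avoid starting a microcoded rep that would do nothing.
That is the general case.

3. If 0 < n < 4:
ecx = n/4 == 0, so the while( ecx-- != 0 ) condition fails at once. There are no 4-byte copies, only byte copies.

4. If n = 0:
Neither loop runs (the jz skips the byte loop), so nothing is copied.

5. If n < 0:
size_t is unsigned, so n itself cannot be negative. A negative int passed as n is converted to a huge size_t (0xffffffff for -1). The function would then try to copy about 4G bytes and run far past both buffers.

Reference: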
typedef unsigned int __kernel_size_t; 
typedef __kernel_size_t size_t; 
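The two-phase structure of __memcpy() (n/4 long moves, then n & 3 byte moves, skipped when the remainder is 0) looks like this in portable C. This is a sketch only. model_memcpy is a hypothetical name, and the 4-byte move is written as a 4-byte memcpy to avoid alignment and aliasing problems that a real movsl does not have.

```c
#include <stddef.h>
#include <string.h>

/* Portable C model of the kernel __memcpy (not the kernel code). */
void *model_memcpy(void *to, const void *from, size_t n)
{
    unsigned char *edi = to;
    const unsigned char *esi = from;
    size_t ecx = n / 4;                  /* "0"(n/4) */

    while (ecx != 0) {                   /* rep ; movsl */
        memcpy(edi, esi, 4);             /* one 4-byte move */
        edi += 4;
        esi += 4;
        ecx--;
    }
    ecx = n & 3;                         /* movl %4,%ecx ; andl $3,%ecx */
    if (ecx == 0)                        /* jz 1f: skip the byte loop */
        return to;
    while (ecx != 0) {                   /* rep ; movsb */
        *edi++ = *esi++;
        ecx--;
    }
    return to;                           /* 1: */
}
```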
--------------------------------------------------------------- 
__constant_memcpy() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

/* 
* This looks ugly, but the compiler can optimize it totally, 
* as the count is constant. 
*/ 
static inline void * __constant_memcpy(void * to, const void * from, size_t n)
{
	long esi, edi;
	if (!n) return to;
#if 1	/* want to do small copies with non-string ops? */
	switch (n) {
		case 1: *(char*)to = *(char*)from; return to;
		case 2: *(short*)to = *(short*)from; return to;
		case 4: *(int*)to = *(int*)from; return to;
#if 1	/* including those doable with two moves? */
		case 3: *(short*)to = *(short*)from;
			*((char*)to+2) = *((char*)from+2); return to;
		case 5: *(int*)to = *(int*)from;
			*((char*)to+4) = *((char*)from+4); return to;
		case 6: *(int*)to = *(int*)from;
			*((short*)to+2) = *((short*)from+2); return to;
		case 8: *(int*)to = *(int*)from;
			*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
	}/* switch */
#endif/* 1 */
	esi = (long) from;
	edi = (long) to;
	if (n >= 5*4) {
		/* large block: use rep prefix */
		int ecx;
		__asm__ __volatile__(
			"rep ; movsl"
			: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
			: "0" (n/4), "1" (edi),"2" (esi)
			: "memory"
		);
	}/* if */
	else {
		/* small block: don't clobber ecx + smaller code */
		if (n >= 4*4) __asm__ __volatile__(
			"movsl"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");

		if (n >= 3*4) __asm__ __volatile__(
			"movsl"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");

		if (n >= 2*4) __asm__ __volatile__(
			"movsl"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");

		if (n >= 1*4) __asm__ __volatile__(
			"movsl"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");
	}/* else */

	switch (n % 4) {
		/* tail */
		case 0: return to;

		case 1: __asm__ __volatile__(
			"movsb"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");
			return to;

		case 2: __asm__ __volatile__(
			"movsw"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");
			return to;

		default: __asm__ __volatile__(
			"movsw\n\tmovsb"
			:"=&D"(edi),"=&S"(esi)
			:"0"(edi),"1"(esi)
			:"memory");
			return to;
	}/* switch */
}

Analysis:
1. Copies of 1 to 8 bytes (except 7) are done with ordinary assignments through pointers of different widths:
#if 1 /* want to do small copies with non-string ops? */
switch (n) {
	case 1: *(char*)to = *(char*)from; return to;
	case 2: *(short*)to = *(short*)from; return to;
	case 4: *(int*)to = *(int*)from; return to;
#if 1 /* including those doable with two moves? */
	case 3: *(short*)to = *(short*)from;
		*((char*)to+2) = *((char*)from+2); return to;
	case 5: *(int*)to = *(int*)from;
		*((char*)to+4) = *((char*)from+4); return to;
	case 6: *(int*)to = *(int*)from;
		*((short*)to+2) = *((short*)from+2); return to;
	case 8: *(int*)to = *(int*)from;
		*((int*)to+1) = *((int*)from+1); return to;
#endif/* 1 */
}/* switch */
#endif/* 1 */
This code runs when the byte count is between 1 and 8. For a count of:
1: one move through char *
2: one move through short *
4: one move through int *
3, 5, 6 and 8: two moves (short+char, int+char, int+short, int+int)
7 has no case label and falls through to the general code below.

2. Copies of n >= 20, 16-19, 12-15, 8-11 and 4-7 bytes (only values of n that the switch did not return for get here):
if (n >= 5*4) { //for n >= 20:
	/* large block: use rep prefix */
	int ecx;
	__asm__ __volatile__(
		"rep ; movsl"
		: "=&c" (ecx), "=&D" (edi), "=&S" (esi)
		: "0" (n/4), "1" (edi),"2" (esi)
		: "memory"
	);
}/* if */

Analysis: esi = (long) from;
edi = (long) to;
ecx = n/4;
rep ===> while( ecx-- != 0 )
movsl ===> {
(unsigned long)es:[edi] = ds:[esi];
edi += 4; esi += 4;
}
Execution then continues with the second switch:
switch (n % 4) {
	/* tail */
	case 0: return to;

	case 1: __asm__ __volatile__(
		"movsb"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");
		return to;

	case 2: __asm__ __volatile__(
		"movsw"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");
		return to;

	default: __asm__ __volatile__(
		"movsw\n\tmovsb"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");
		return to;
}/* switch */
The code is simple: it copies the remaining 0-3 bytes.
default covers n%4 == 3: one word is copied, then one byte, 3 bytes in total.
-------------------------------------------------------------- 
else { //4 <= n <= 19; in practice only 7 and 9-19 arrive here
	/* small block: don't clobber ecx + smaller code */
	//taken for n in [16,19]:
	if (n >= 4*4) __asm__ __volatile__(
		"movsl"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");

	//taken for n in [12,19]:
	if (n >= 3*4) __asm__ __volatile__(
		"movsl"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");

	//taken for n in [8,19]:
	if (n >= 2*4) __asm__ __volatile__(
		"movsl"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");

	//taken for n in [4,19]:
	if (n >= 1*4) __asm__ __volatile__(
		"movsl"
		:"=&D"(edi),"=&S"(esi)
		:"0"(edi),"1"(esi)
		:"memory");
}/* else */

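Analysis:
At first this looks wrong: ecx is never set to n/4. It does not need to be. These are single movsl instructions without a rep prefix, and movsl uses only esi and edi.
Each if that is taken moves one 4-byte word. Because the tests cascade, n >= 16 executes all four moves, n >= 12 three, and so on, which is exactly n/4 moves. Leaving ecx untouched is the point of the comment "don't clobber ecx + smaller code".
The four could be merged into a single rep movsl, which is just the large-block code again. The price is clobbering ecx and paying the rep start-up cost on a short copy:
if( n >= 1*4 ) { //n is 7 or in [9,19] here
	int ecx;
	__asm__ __volatile__(
		"rep ; movsl"
		:"=&c"(ecx),"=&D"(edi),"=&S"(esi)
		:"0"(n/4),"1"(edi),"2"(esi)
		:"memory");
}

Note:
__constant_memcpy() and __memcpy() take the same number and types of arguments and do the same job. The difference is when each is used. memcpy() picks __constant_memcpy() when n is a compile-time constant, so the compiler can evaluate the switch and the if tests at compile time and keep only the moves needed for that n.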
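To see which moves __constant_memcpy() ends up using for each n, the dispatch logic can be replayed in plain C. This is an illustration only. constant_memcpy_plan() and struct moves are hypothetical names, and they count moves instead of performing them.

```c
#include <stddef.h>

/* Hypothetical helper: counts the moves __constant_memcpy would use. */
struct moves { int plain; int rep_movsl; int movsl; int movsw; int movsb; };

struct moves constant_memcpy_plan(size_t n)
{
    struct moves m = {0, 0, 0, 0, 0};

    if (n == 0)
        return m;                            /* if (!n) return to; */
    switch (n) {                             /* small copies, C moves */
    case 1: case 2: case 4:
        m.plain = 1; return m;
    case 3: case 5: case 6: case 8:
        m.plain = 2; return m;
    }
    if (n >= 5 * 4)
        m.rep_movsl = (int)(n / 4);          /* rep ; movsl, ecx = n/4 */
    else
        m.movsl = (int)(n / 4);              /* cascade of single movsl */
    switch (n % 4) {                         /* tail */
    case 1: m.movsb = 1; break;
    case 2: m.movsw = 1; break;
    case 3: m.movsw = 1; m.movsb = 1; break; /* default: movsw + movsb */
    }
    return m;
}
```

For n = 7 this gives one movsl, one movsw and one movsb. For n = 20 it gives a single rep movsl with ecx = 5.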
--------------------------------------------------------------- 
__constant_memcpy3d() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_MEMCPY 
#ifdef CONFIG_X86_USE_3DNOW /* applies to __constant_memcpy3d(), __memcpy3d() and memcpy() below */
#include <asm/mmx.h> 
/* 
* This CPU favours 3DNow strongly (eg AMD Athlon) 
*/ 
static inline void * __constant_memcpy3d(void * to, const void * from, size_t len)
{
	if (len < 512)
		return __constant_memcpy(to, from, len);
	return _mmx_memcpy(to, from, len);
}

??? The definition of _mmx_memcpy() (declared in <asm/mmx.h>, included above) could not be found, so the analysis stops here.
--------------------------------------------------------------- 
__memcpy3d() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

static __inline__ void *__memcpy3d(void *to, const void *from, size_t len)
{
	if (len < 512)
		return __memcpy(to, from, len);
	return _mmx_memcpy(to, from, len);
}

??? The definition of _mmx_memcpy() could not be found, so the analysis stops here.
--------------------------------------------------------------- 
memcpy() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define memcpy(t, f, n) \ 
(__builtin_constant_p(n) ? \ 
__constant_memcpy3d((t),(f),(n)) : \ 
__memcpy3d((t),(f),(n))) 
#else/* CONFIG_X86_USE_3DNOW */ 
/* 
* No 3D Now! 
*/ 
#define memcpy(t, f, n) \ 
(__builtin_constant_p(n) ? \ 
__constant_memcpy((t),(f),(n)) : \ 
__memcpy((t),(f),(n))) 
#endif/* CONFIG_X86_USE_3DNOW */ 

Notes on int __builtin_constant_p(exp):
You can use the built-in function __builtin_constant_p to determine if a value is known to be constant at compile-time and hence that GCC can perform constant folding on expressions involving that value. The argument of the function is the value to test. The function returns the integer 1 if the argument is known to be a compile-time constant and 0 if it is not known to be a compile-time constant. A return of 0 does not indicate that the value is not a constant, but merely that GCC cannot prove it is a constant with the specified value of the '-O' option.
You would typically use this function in an embedded application where memory was a critical resource. If you have some complex calculation, you may want it to be folded if it involves constants, but need to call a function if it does not. For example: 

#define Scale_Value(X) \ 
(__builtin_constant_p (X) \ 
? ((X) * SCALE + OFFSET) : Scale (X)) 

You may use this built-in function in either a macro or an inline function. However, if you use it in an inlined function and pass an argument of the function as the argument to the built-in, GCC will never return 1 when you call the inline function with a string constant or compound literal and will not return 1 when you pass a constant numeric value to the inline function unless you specify the ‘-O’ option. 

__builtin_constant_p() should therefore be used together with gcc's -O option.

You may also use __builtin_constant_p in initializers for static data. For instance, you can write
static const int table[] = { 
__builtin_constant_p (EXPRESSION) ? (EXPRESSION) : -1, 
/* . . . */ 
}; 
This is an acceptable initializer even if EXPRESSION is not a constant expression. 
GCC must be more conservative about evaluating the built-in in this case, because it has no opportunity to perform optimization. Previous versions of GCC did not accept this built-in in data initializers. The earliest version where it is completely safe is 3.0.1.
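The memcpy() macro above has the same shape as Scale_Value: __builtin_constant_p chooses between two implementations at compile time. The sketch below reproduces that dispatch with hypothetical stand-ins (copy_const_n, copy_var_n, demo_memcpy) that only record which branch was taken. It requires GCC or Clang. Only the constant case is asserted, because whether a variable n is "known constant" depends on the optimization level.

```c
#include <stddef.h>
#include <string.h>

/* Which path the last call took: 1 = constant-size, 0 = variable-size. */
static int took_constant_path;

/* Hypothetical stand-ins for __constant_memcpy() and __memcpy(). */
static void *copy_const_n(void *t, const void *f, size_t n)
{
    took_constant_path = 1;
    return memcpy(t, f, n);
}

static void *copy_var_n(void *t, const void *f, size_t n)
{
    took_constant_path = 0;
    return memcpy(t, f, n);
}

/* Same shape as the kernel's memcpy() macro; with -O only the
 * selected call is kept. */
#define demo_memcpy(t, f, n)                                 \
    (__builtin_constant_p(n) ? copy_const_n((t), (f), (n))   \
                             : copy_var_n((t), (f), (n)))
```

A literal such as 4 is a constant expression, so __builtin_constant_p(4) is 1 at every optimization level.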

--------------------------------------------------------------- 
__HAVE_ARCH_MEMMOVE 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_MEMMOVE 
void *memmove(void * dest,const void * src, size_t n); 
memmove() is only declared here and is not inlined; the implementation in string.c is used.

#define memcmp __builtin_memcmp 
--------------------------------------------------------------- 
__HAVE_ARCH_MEMCHR memchr() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_MEMCHR 
static inline void * memchr(const void * cs,int c,size_t count)
{
	int d0;
	register void * __res;
	if (!count) return NULL;
	__asm__ __volatile__(
		"repne\n\t"
		"scasb\n\t"
		"je 1f\n\t"
		"movl $1,%0\n"
		"1:\tdecl %0"
		:"=D" (__res), "=&c" (d0)
		:"a" (c),"0" (cs),"1" (count)
		:"memory");
	return __res;
}

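Function: cs is the start of the memory area, count is the number of bytes to search, and c is the byte to look for. The search covers cs up to, but not including, cs+count. If c is found, a pointer to it is returned; otherwise 0 (NULL) is returned.

Initial operand values:
ax/eax: c
di/edi: const void * cs
cx/ecx: count
ZF = 0;
eax = c;
edi = cs;
ecx = count;
Instruction-by-instruction rewrite:
repne ===> while( ecx != 0 && ZF == 0)
{
scasb ===> ZF = ((al-es:[edi]) == 0); edi++; ecx--;
}
je 1 ===> if(ZF == 1) goto 1;
movl $1,%0 ===> edi = 1;
1: ===> 1:
decl %0 ===> edi--;
return edi;
Return value: a pointer to c if it was found, otherwise 0.
The general case is simple, so the analysis stops here.

Special cases:
1. count == 0: the asm is never reached, because the C code returns NULL first (if (!count) return NULL;). The check is needed. With ecx == 0, repne scasb executes nothing and leaves ZF with whatever value earlier code left, so je could be taken and cs - 1 returned.
2. count is huge, e.g. 0xffffffff: either a byte equal to c is found and its address returned, or the scan stops when ecx reaches 0 with ZF == 0 and 0 is returned. If the very last byte examined matches, ZF == 1 and its address is returned even though ecx is 0.
3. size_t is unsigned, so a negative count is not a concern. Since this is a memory function, '\0' bytes are compared like any other byte.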
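The special cases above, including a match on the very last byte and '\0' being an ordinary search target, can be exercised with a small C model. This sketch assumes a 32-bit ecx as on i386. model_memchr is a hypothetical name, and ZF is an int flag.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable C model of the kernel memchr (not the kernel code). */
void *model_memchr(const void *cs, int c, size_t count)
{
    const unsigned char *edi = cs;
    unsigned char al = (unsigned char)c;  /* "a"(c): only al is compared */
    uint32_t ecx = (uint32_t)count;       /* "1"(count): 32-bit ecx */
    int zf = 0;

    if (!count)                           /* done in C before the asm */
        return NULL;
    while (ecx != 0 && !zf) {             /* repne scasb */
        zf = (al == *edi);
        edi++;
        ecx--;
    }
    if (!zf)                              /* je 1f not taken:         */
        return NULL;                      /* movl $1,%0 ; decl -> 0   */
    return (void *)(edi - 1);             /* 1: decl %0               */
}
```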
--------------------------------------------------------------- 
__memset_generic() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

static inline void * __memset_generic(void * s, char c,size_t count)
{
	int d0, d1;
	__asm__ __volatile__(
		"rep\n\t"
		"stosb"
		: "=&c" (d0), "=&D" (d1)
		:"a" (c),"1" (s),"0" (count)
		:"memory");
	return s;
}
eax = c;
edi = s;
ecx = count;
rep ====> while( ecx != 0 )
{
stosb ====> es:[edi] = al; edi++; ecx--;
}
return s;
--------------------------------------------------------------- 
__constant_count_memset() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

/* we might want to write optimized versions of these later */ 
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count)) 
--------------------------------------------------------------- 
__constant_c_memset() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

/* 
* memset(x,0,y) is a reasonably common thing to do, so we want to fill 
* things 32 bits at a time even when we don't know the size of the 
* area at compile-time.. 
*/ 
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
	int d0, d1;
	__asm__ __volatile__(
		"rep ; stosl\n\t"
		"testb $2,%b3\n\t"
		"je 1f\n\t"
		"stosw\n"
		"1:\ttestb $1,%b3\n\t"
		"je 2f\n\t"
		"stosb\n"
		"2:"
		:"=&c" (d0), "=&D" (d1)
		:"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
		:"memory");
	return (s);
}
Initial register values:
ax/eax: c
cx/ecx: count/4
di/edi: void *s
%3: count, in a byte-addressable register picked by the compiler ("q"); %b3 is its low byte

Instructions rewritten as pseudocode:
rep ====> while( ecx-- != 0 ) {
stosl ====>     (long)es:[edi] = eax;
                edi += 4;
          }
testb $2,%b3 ====> if( (0x02 & (char)count) == 0 )
je 1 ====> goto 1;
stosw ====> (short)es:[edi] = ax;
edi += 2;
1: testb $1,%b3 ====> 1: if( (0x01 & (char)count) == 0)
je 2 ====> goto 2;
stosb ====> (char)es:[edi] = al;
2: ====> 2:
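Analysis:
The area is first filled 4 bytes at a time. Bits 1 and 0 of count are then tested to find out whether 3, 2, 1 or 0 bytes remain. If bit 1 is set (3 or 2 bytes remain), one word is stored, leaving 1 or 0 bytes; the bit 0 test then stores a final byte if one is left. So after the word store, the 3-byte and 2-byte cases are handled exactly like the 1-byte and 0-byte cases.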

Special case:
If count == 0, then count/4 == 0 and both bit tests fail, so nothing is written.
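The following C sketch, under the hypothetical name constant_c_memset_model(), mirrors __constant_c_memset(): count/4 dword stores, then a word store if bit 1 of count is set and a byte store if bit 0 is set. It assumes c already holds the fill byte replicated four times, and memcpy() stands in for the 4-, 2- and 1-byte stores so that unaligned destinations are fine.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * constant_c_memset_model() is a hypothetical name: a C model of
 * __constant_c_memset(). `c` must hold one byte replicated 4 times,
 * so storing the low 1, 2 or 4 bytes of eax always writes that byte.
 */
static void *constant_c_memset_model(void *s, unsigned long c, size_t count)
{
    unsigned char *edi = s;
    uint32_t eax = (uint32_t)c;   /* 32-bit register value            */
    size_t ecx = count / 4;       /* "0" (count/4)                    */

    while (ecx != 0) {            /* rep ; stosl                      */
        memcpy(edi, &eax, 4);
        edi += 4;
        ecx--;
    }
    if (count & 2) {              /* testb $2,%b3 ; je 1f ; stosw     */
        memcpy(edi, &eax, 2);
        edi += 2;
    }
    if (count & 1)                /* 1: testb $1,%b3 ; je 2f ; stosb  */
        memcpy(edi, &eax, 1);
    return s;
}
```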
--------------------------------------------------------------- 
__HAVE_ARCH_STRNLEN strnlen() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

/* Added by Gertjan van Wingerde to make minix and sysv module work */ 
#define __HAVE_ARCH_STRNLEN 
static inline size_t strnlen(const char * s, size_t count)
{
    int d0;
    register int __res;
    __asm__ __volatile__(
        "movl %2,%0\n\t"
        "jmp 2f\n"
        "1:\tcmpb $0,(%0)\n\t"
        "je 3f\n\t"
        "incl %0\n"
        "2:\tdecl %1\n\t"
        "cmpl $-1,%1\n\t"
        "jne 1b\n"
        "3:\tsubl %2,%0"
        :"=a" (__res), "=&d" (d0)
        :"c" (s),"1" (count)
        :"memory");
    return __res;
}
/* end of additional stuff */ 

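Initial register values:
cx/ecx: const char * s
dx/edx: count
ax/eax: __res

Instructions rewritten as pseudocode:
size_t edx;
edx = count;
char *eax, *ecx;
ecx = s;

movl %2,%0 ====> eax = s; //ecx = eax = s;
jmp 2 ====> goto 2;

1: cmpb $0,(%0) ====> 1: if( ((char)(ds:[eax]))==0 )
je 3 ====> goto 3;
incl %0 ====> eax++;

2: decl %1 ====> 2: edx--;
cmpl $-1,%1 ====> if( edx != 0xffffffff )
jne 1 ====> goto 1;

3: subl %2,%0 ====> 3: eax -= ecx;
return eax;

Note that cmpl $-1,%1 compares edx with -1 (0xffffffff); it is not a test for zero. The loop therefore keeps going after edx has been decremented to 0 and stops only when edx wraps around to 0xffffffff. Every decrement that does not wrap lets one more character be examined, so at most count characters are read.

Case analysis:
1. String length (excluding '\0') < count:
s==>"abcd\0?"
count == 5: edx goes 4, 3, 2, 1, 0 while 'a', 'b', 'c', 'd' and then s[4] are examined. s[4] is '\0', so je 3 leaves the loop with eax == s+4, and after eax -= ecx the function returns 4, the length of the string (excluding '\0').

count == 6: the same happens, except that edx == 1 when the '\0' is found; the function returns 4.

2. String length (excluding '\0') == count:
s==>"abcd\0?"
count == 4: 'a', 'b', 'c', 'd' are examined while edx goes 3, 2, 1, 0. The next decl turns edx into 0xffffffff, the loop ends with eax == s+4, and the function returns 4, i.e. count. The '\0' itself is never read.

3. String length (excluding '\0') > count:
s==>"abcd\0?"
count == 3: 'a', 'b', 'c' are examined; then edx becomes 0xffffffff and the loop ends with eax pointing at 'd', which is not examined. The function returns 3, i.e. count.

4. String length (excluding '\0') == 0:
s==>"\0?"
count == 4: the first byte examined is '\0', so 0 is returned.

5. count == 1
s==>"abcd\0?"
count == 1: only 'a' is examined; 1 is returned.

6. count == 0
s==>"abcd\0?" : the first decl turns edx into 0xffffffff, the cmpl matches at once, and the function returns 0 without reading any byte.

Function summary:
s is the start address of a string and count is a maximum length. The function examines at most count bytes of s. If the length of the string (excluding '\0') is less than count, it returns that length; otherwise it returns count. In short, it returns min(strlen(s), count), and for count == 0 it returns 0 without reading memory.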
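cmpl $-1,%1 compares edx with -1 rather than testing it for zero. To make that concrete, here is a C sketch, under the hypothetical name strnlen_asm_model(), that keeps the asm's jump structure with goto. edx is modelled as a 32-bit unsigned counter, as on i386, so decrementing it from 0 wraps to 0xffffffff exactly like decl.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * strnlen_asm_model() is a hypothetical name: a C model of the
 * strnlen() asm with the same labels. It is not the kernel code.
 * count is truncated to 32 bits, as edx is a 32-bit register on i386.
 */
static size_t strnlen_asm_model(const char *s, size_t count)
{
    const char *ecx = s;               /* "c" (s)                     */
    const char *eax = ecx;             /* movl %2,%0                  */
    uint32_t edx = (uint32_t)count;    /* "1" (count)                 */

    goto label2;                       /* jmp 2f                      */
label1:
    if (*eax == '\0')                  /* 1: cmpb $0,(%0)             */
        goto label3;                   /*    je 3f                    */
    eax++;                             /*    incl %0                  */
label2:
    edx--;                             /* 2: decl %1 (0 wraps to -1)  */
    if (edx != (uint32_t)0xffffffffu)  /*    cmpl $-1,%1 ; jne 1b     */
        goto label1;
label3:
    return (size_t)(eax - ecx);        /* 3: subl %2,%0               */
}
```

For s = "abcd" it returns 0, 1, 3, 4, 4, 4 for count = 0, 1, 3, 4, 5, 6, which is min(strlen(s), count).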

--------------------------------------------------------------- 
__HAVE_ARCH_STRSTR strstr() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_STRSTR 
extern char *strstr(const char *cs, const char *ct); 
Only the declaration is here; this presumably refers to the out-of-line strstr() defined in string.c.
--------------------------------------------------------------- 
__constant_c_and_count_memset() 
--------------------------------------------------------------- 
include/asm-i386/string.h 
/* 
* This looks horribly ugly, but the compiler can optimize it totally, 
* as we by now know that both pattern and count is constant.. 
*/ 
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
    switch (count) {
        case 0:
            return s;
        case 1:
            *(unsigned char *)s = pattern;
            return s;
        case 2:
            *(unsigned short *)s = pattern;
            return s;
        case 3:
            *(unsigned short *)s = pattern;
            *(2+(unsigned char *)s) = pattern;
            return s;
        case 4:
            *(unsigned long *)s = pattern;
            return s;
    }
#define COMMON(x) \
__asm__ __volatile__( \
    "rep ; stosl" \
    x \
    : "=&c" (d0), "=&D" (d1) \
    : "a" (pattern),"0" (count/4),"1" ((long) s) \
    : "memory")
    {
        int d0, d1;
        switch (count % 4) {
            case 0: COMMON(""); return s;
            case 1: COMMON("\n\tstosb"); return s;
            case 2: COMMON("\n\tstosw"); return s;
            default: COMMON("\n\tstosw\n\tstosb"); return s;
        }
    }

#undef COMMON
}
Analysis:
1. count in [0,4]:
switch (count) {
    case 0:
        return s;
    case 1:
        *(unsigned char *)s = pattern;
        return s;
    case 2:
        *(unsigned short *)s = pattern;
        return s;
    case 3:
        *(unsigned short *)s = pattern;
        *(2+(unsigned char *)s) = pattern;
        return s;
    case 4:
        *(unsigned long *)s = pattern;
        return s;
}

2. count > 4:
#define COMMON(x) \
__asm__ __volatile__( \
    "rep ; stosl" \
    x \
    : "=&c" (d0), "=&D" (d1) \
    : "a" (pattern),"0" (count/4),"1" ((long) s) \
    : "memory")
{
    int d0, d1;
    switch (count % 4) {
        case 0: COMMON(""); return s;
        case 1: COMMON("\n\tstosb"); return s;
        case 2: COMMON("\n\tstosw"); return s;
        default: COMMON("\n\tstosw\n\tstosb"); return s;
    }
}

#undef COMMON

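a) Note this way of using a macro inside a function:
1) define the macro with #define;
2) put the code that uses it in a pair of {} (here a nested block that declares d0 and d1, which the macro refers to);
3) cancel the macro with #undef afterwards.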
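The same idiom in a self-contained form: a hypothetical function sum_scaled() whose helper macro ADD_SCALED refers to locals that exist only inside the nested block, followed by #undef so the macro name does not leak into the rest of the file. The names are made up for illustration; only the #define / {} / #undef pattern comes from the kernel code.

```c
#include <stddef.h>

/*
 * Hypothetical example: ADD_SCALED uses the locals `total` and `scale`,
 * which are declared in the nested { } block, just as COMMON(x) uses d0
 * and d1 in __constant_c_and_count_memset().
 */
static int sum_scaled(const int *v, size_t n)
{
#define ADD_SCALED(i) (total += scale * v[(i)])
    {
        int total = 0;
        int scale = 3;

        for (size_t i = 0; i < n; i++)
            ADD_SCALED(i);
        return total;
    }
#undef ADD_SCALED
}

/* After the #undef the macro is gone, so this check compiles cleanly. */
#ifdef ADD_SCALED
#error "ADD_SCALED should not be visible here"
#endif
```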

b):#define COMMON(x) \ 
__asm__ __volatile__( \ 
"rep ; stosl" \ 
x \ 
: "=&c" (d0), "=&D" (d1) \ 
: "a" (pattern),"0" (count/4),"1" ((long) s) \ 
: "memory") 

Initial register values:
ax/eax: pattern 
cx/ecx: count/4 
di/edi: s 

Instructions rewritten as pseudocode:
COMMON("") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 ) {
stosl ===>     es:[edi] = eax;
               edi += 4;
         }
return s;

COMMON("\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 ) {
stosl ===>     es:[edi] = eax;
               edi += 4;
         }
x ===> stosb ===> es:[edi] = al;
edi += 1;
return s;

COMMON("\n\tstosw") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 ) {
stosl ===>     es:[edi] = eax;
               edi += 4;
         }
x ===> stosw ===> es:[edi] = ax;
edi += 2;
return s;

COMMON("\n\tstosw\n\tstosb") expands to:
eax = pattern;
edi = s;
ecx = count/4;
rep ===> while( ecx-- != 0 ) {
stosl ===>     es:[edi] = eax;
               edi += 4;
         }
x => stosw;stosb => es:[edi] = ax;
edi += 2;
es:[edi] = al;
edi += 1;

return s;

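c) Further analysis:

int d0, d1;
switch (count % 4) {
    case 0: COMMON(""); return s;
    case 1: COMMON("\n\tstosb"); return s;
    case 2: COMMON("\n\tstosw"); return s;
    default: COMMON("\n\tstosw\n\tstosb"); return s;
}

After rep ; stosl has filled count/4 dwords, count % 4 selects the extra stores that fill the remaining 0 to 3 bytes. Because count is a compile-time constant here, the compiler can keep just the one case that applies.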
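A C sketch of the count > 4 path, under the hypothetical name cc_memset_model(): count/4 dword stores, then the trailing stores that the COMMON(x) argument adds for count % 4. pattern is assumed to hold one byte replicated four times, and memcpy() stands in for the 4-, 2- and 1-byte stores.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * cc_memset_model() is a hypothetical name: a C model of the count > 4
 * path of __constant_c_and_count_memset(). It is not the kernel code.
 */
static void *cc_memset_model(void *s, unsigned long pattern, size_t count)
{
    unsigned char *edi = s;
    uint32_t eax = (uint32_t)pattern;

    for (size_t ecx = count / 4; ecx != 0; ecx--) {  /* rep ; stosl */
        memcpy(edi, &eax, 4);
        edi += 4;
    }
    switch (count % 4) {
    case 0:                          /* COMMON("")                   */
        break;
    case 1:                          /* COMMON("\n\tstosb")          */
        memcpy(edi, &eax, 1);
        break;
    case 2:                          /* COMMON("\n\tstosw")          */
        memcpy(edi, &eax, 2);
        break;
    default:                         /* COMMON("\n\tstosw\n\tstosb") */
        memcpy(edi, &eax, 2);
        memcpy(edi + 2, &eax, 1);
        break;
    }
    return s;
}
```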

--------------------------------------------------------------- 
__constant_c_x_memset()
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __constant_c_x_memset(s, c, count) \ 
(__builtin_constant_p(count) ? \ 
__constant_c_and_count_memset((s),(c),(count)) : \ 
__constant_c_memset((s),(c),(count))) 

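Function: fill count bytes starting at s with c. Here c is already the 32-bit pattern; when count is a compile-time constant, __constant_c_and_count_memset() is used, otherwise __constant_c_memset().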
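__builtin_constant_p() is a GCC extension (Clang supports it too) that evaluates to 1 when the compiler knows its argument is a compile-time constant. A sketch of the same dispatch shape, with hypothetical helpers that record which branch ran and then delegate to the C library memset():

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical: records which helper the dispatch macro picked. */
static int last_path;  /* 1 = constant count, 2 = runtime count */

static void *fill_const_count(void *s, int c, size_t n)
{
    last_path = 1;     /* stands in for __constant_c_and_count_memset() */
    return memset(s, c, n);
}

static void *fill_any_count(void *s, int c, size_t n)
{
    last_path = 2;     /* stands in for __constant_c_memset() */
    return memset(s, c, n);
}

/* Same shape as __constant_c_x_memset(): the choice is made at compile time. */
#define fill_dispatch(s, c, count)                 \
    (__builtin_constant_p(count)                   \
        ? fill_const_count((s), (c), (count))      \
        : fill_any_count((s), (c), (count)))
```

With a literal such as 16 the first helper is chosen; with a volatile variable the second is. For an ordinary variable the result can depend on optimisation level and inlining, which is why both paths must produce the same bytes.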

For reference:
1.__constant_c_and_count_memset(): 
static inline void * __constant_c_and_count_memset(void * s, unsigned long pattern, size_t count)
{
    switch (count) {
        case 0:
            return s;
        case 1:
            *(unsigned char *)s = pattern;
            return s;
        case 2:
            *(unsigned short *)s = pattern;
            return s;
        case 3:
            *(unsigned short *)s = pattern;
            *(2+(unsigned char *)s) = pattern;
            return s;
        case 4:
            *(unsigned long *)s = pattern;
            return s;
    }
#define COMMON(x) \
__asm__ __volatile__( \
    "rep ; stosl" \
    x \
    : "=&c" (d0), "=&D" (d1) \
    : "a" (pattern),"0" (count/4),"1" ((long) s) \
    : "memory")
    {
        int d0, d1;
        switch (count % 4) {
            case 0: COMMON(""); return s;
            case 1: COMMON("\n\tstosb"); return s;
            case 2: COMMON("\n\tstosw"); return s;
            default: COMMON("\n\tstosw\n\tstosb"); return s;
        }
    }

#undef COMMON
}

2.__constant_c_memset(): 
static inline void * __constant_c_memset(void * s, unsigned long c, size_t count)
{
    int d0, d1;
    __asm__ __volatile__(
        "rep ; stosl\n\t"
        "testb $2,%b3\n\t"
        "je 1f\n\t"
        "stosw\n"
        "1:\ttestb $1,%b3\n\t"
        "je 2f\n\t"
        "stosb\n"
        "2:"
        :"=&c" (d0), "=&D" (d1)
        :"a" (c), "q" (count), "0" (count/4), "1" ((long) s)
        :"memory");
    return (s);
}
--------------------------------------------------------------- 
__memset() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __memset(s, c, count) \ 
(__builtin_constant_p(count) ? \ 
__constant_count_memset((s),(c),(count)) : \ 
__memset_generic((s),(c),(count))) 

Function: fill count bytes of the memory area starting at s with the byte c.

For reference:
1.__constant_count_memset(): 
#define __constant_count_memset(s,c,count) __memset_generic((s),(c),(count)) 

2.__memset_generic(): 
static inline void * __memset_generic(void * s, char c,size_t count)
{
    int d0, d1;
    __asm__ __volatile__(
        "rep\n\t"
        "stosb"
        : "=&c" (d0), "=&D" (d1)
        :"a" (c),"1" (s),"0" (count)
        :"memory");
    return s;
}
--------------------------------------------------------------- 
__HAVE_ARCH_MEMSET memset() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

#define __HAVE_ARCH_MEMSET 
#define memset(s, c, count) \ 
(__builtin_constant_p(c) ? \ 
__constant_c_x_memset((s),(0x01010101UL*(unsigned char)(c)),(count)) : \ 
__memset((s),(c),(count))) 

Function: same as above, i.e. fill count bytes starting at s with c.

For reference:
1.__constant_c_x_memset(): 
#define __constant_c_x_memset(s, c, count) \ 
(__builtin_constant_p(count) ? \ 
__constant_c_and_count_memset((s),(c),(count)) : \ 
__constant_c_memset((s),(c),(count))) 

2. __memset(): same as above.

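What does (0x01010101UL*(unsigned char)(c)) mean? It replicates the byte c into all four bytes of a 32-bit word (unsigned long is 32 bits on i386): for example, c = 0xAB gives 0xABABABAB. The (unsigned char) cast keeps c in the range 0..255, so a negative char cannot corrupt the pattern. This pattern is what lets the stosl paths store four copies of c at once, and stosw and stosb store two copies and one copy.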
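A small C sketch of that multiplication, with hypothetical helper names. uint32_t stands in for i386's 32-bit unsigned long so the result is the same on 64-bit hosts, and storing the word with memcpy() shows that every byte in memory equals c regardless of byte order.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: the byte-replication trick used by memset(). */
static uint32_t replicate_byte(int c)
{
    /* (unsigned char) keeps only the low 8 bits, so the factor is 0..255 */
    return (uint32_t)0x01010101u * (unsigned char)c;
}

/* Hypothetical helper: store the pattern, as one stosl would. */
static void store_pattern(unsigned char out[4], int c)
{
    uint32_t word = replicate_byte(c);
    memcpy(out, &word, sizeof word);   /* all 4 bytes now equal (unsigned char)c */
}
```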
--------------------------------------------------------------- 
__HAVE_ARCH_MEMSCAN memscan() 
--------------------------------------------------------------- 
include/asm-i386/string.h 

/* 
* find the first occurrence of byte 'c', or 1 past the area if none 
*/ 
#define __HAVE_ARCH_MEMSCAN 
static inline void * memscan(void * addr, int c, size_t size)
{
    if (!size) return addr;
    __asm__("repnz; scasb\n\t"
        "jnz 1f\n\t"
        "dec %%edi\n"
        "1:"
        : "=D" (addr), "=c" (size)
        : "0" (addr), "1" (size), "a" (c)
        : "memory");
    return addr;
}

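Instructions rewritten as pseudocode:
edi = addr;
ecx = size;
eax = c;
ZF = 0;
repnz ====> while( ecx != 0 ) {
scasb ====>     ZF = ((al - es:[edi]) == 0); edi++;
                ecx--;
                if( ZF == 1 ) break;
            }
jnz 1 ====> if( ZF == 0 ) goto 1;
dec %%edi ====> edi--;
1: ====> 1:

The assembly is simple enough that it needs no further explanation.
The function scans memory linearly. If it finds a byte equal to c, it returns the address of the first one (scasb has already moved edi one past the match, so dec %%edi steps back onto it). If there is no match, it returns addr + size, one past the last byte compared, as the kernel comment says. If size == 0 it returns addr.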
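A C sketch of memscan(), under the hypothetical name memscan_model(). Note that jnz jumps when ZF == 0, i.e. when the scan ended without a match, so the decrement only happens after a match.

```c
#include <stddef.h>

/*
 * memscan_model() is a hypothetical name: a C model of memscan().
 * Returns the first byte equal to c, or addr + size if there is none.
 */
static void *memscan_model(void *addr, int c, size_t size)
{
    unsigned char *edi = addr;
    size_t ecx = size;
    unsigned char al = (unsigned char)c;
    int zf = 0;

    if (!size)
        return addr;
    while (ecx != 0) {      /* repnz ; scasb                        */
        zf = (al == *edi);
        edi++;              /* scasb always advances edi            */
        ecx--;
        if (zf)
            break;
    }
    if (zf)                 /* jnz 1f not taken: a match was found  */
        edi--;              /* dec %%edi: step back onto the match  */
    return edi;             /* 1:                                   */
}
```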
#endif /* __KERNEL__ */ 

#endif /* !_I386_STRING_H_ */ 
*************************************************************** 
Finally finished working through the string functions written in assembly!!!

posted @ 2012-02-05 13:36  taek