x86 calling conventions
http://zh.wikipedia.org/wiki/X86%E8%B0%83%E7%94%A8%E7%BA%A6%E5%AE%9A
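This section describes the calling conventions used on the x86 architecture. A calling convention describes the interface of called code:
- the order in which atomic (scalar) parameters, or the individual parts of a complex parameter, are allocated;
- how parameters are passed (pushed on the stack, placed in registers, or a mix of both);
- which registers the callee must preserve for the caller;
- how the work of preparing the stack for a function call, and restoring it afterwards, is divided between caller and callee.
This is closely related to how a programming language assigns sizes and formats to its types. Another closely related topic is name mangling (name decoration), which determines how symbol names in the code map to the symbol names used by the linker.
Calling conventions, type representations, and name mangling together form what is known as an application binary interface (ABI).
Compilers often differ subtly in how they implement these conventions, so it is often difficult to link together code compiled by different compilers.
On the other hand, conventions that serve as an API standard (such as stdcall) are implemented fairly uniformly across compilers.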
Caller clean-up: cdecl, syscall, optlink
In these conventions the caller cleans the arguments from the stack, which allows variable argument lists, as in printf().
cdecl
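cdecl (C declaration) is a calling convention that originates from the C programming language and is used by many C compilers for the x86 architecture.
In cdecl, subroutine arguments are passed on the stack. Integer values and memory addresses are returned in the EAX register, floating-point values in the ST0 x87 register.
Registers EAX, ECX, and EDX are caller-saved; the rest (EBX, EBP, ESI, EDI) are callee-saved.
The x87 floating-point registers ST0 to ST7 must be empty (popped or freed) when calling a new function, and ST1 to ST7 must be empty on exiting a function.
In C, function arguments are pushed on the stack in reverse order (right to left). On GNU/Linux, GCC sets the de facto standard for this convention.
Since GCC version 4.5, the stack must be aligned to a 16-byte boundary when calling a function (previous versions required only 4-byte alignment).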
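The 16-byte requirement comes down to a little arithmetic on the number of argument bytes. The sketch below is only an illustration: the helper name `padding_for_alignment` is hypothetical, and it assumes the stack pointer is already aligned before the arguments are pushed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: bytes of padding a caller would reserve so that
 * arg_bytes of pushed arguments plus the padding is a multiple of
 * alignment. The alignment must be a power of two (4 or 16 here). */
size_t padding_for_alignment(size_t arg_bytes, size_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (alignment - (arg_bytes & (alignment - 1))) & (alignment - 1);
}
```

For three 4-byte arguments (12 bytes), a 16-byte alignment needs 4 bytes of padding, while the older 4-byte alignment needs none.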
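cdecl is usually the default calling convention for x86 C compilers, and many compilers provide options to change the default calling convention.
To declare a function as cdecl manually, some compilers support the following syntax:
void _cdecl funct();
The _cdecl modifier must appear in the function prototype and in the function declaration, where it overrides any other setting that may be in place.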
syscall
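Similar to cdecl: arguments are pushed right to left. EAX, ECX, and EDX are not preserved. The size of the parameter list, in doublewords, is passed in AL.
syscall is the standard calling convention for the 32-bit OS/2 API.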
optlink
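Arguments are again pushed right to left. The three leftmost arguments are passed in EAX, EDX, and ECX, and up to four floating-point arguments are passed in ST(0) through ST(3),
although space for these arguments is still reserved in the argument list on the stack. Results are returned in EAX or ST(0). Registers EBP, EBX, ESI, and EDI are preserved.
optlink is used by the IBM VisualAge compilers.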
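Callee clean-up: pascal, register, stdcall, fastcall (Microsoft, Borland)
When the callee cleans the arguments from the stack, the number of bytes to remove must be known at compile time. These calling conventions are therefore not compatible with variable argument lists such as printf().
They may, however, be more space efficient, because the code that unwinds the stack does not have to be generated at every call site.
Functions that use these conventions are easy to recognize in assembly code because they unwind the stack before returning.
The x86 ret instruction accepts an optional 16-bit operand that gives the number of stack bytes to release when returning to the caller. Such code looks like this: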
ret 12
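The operand of ret is simply the total size of the stack arguments. The following sketch shows the arithmetic under the assumption that every argument occupies a whole number of 4-byte stack slots, as on 32-bit x86; the function names are hypothetical and chosen only for this illustration.

```c
#include <stddef.h>

/* Round one argument size up to a whole number of 4-byte stack slots. */
size_t stack_slot_bytes(size_t arg_size)
{
    return (arg_size + 3u) & ~(size_t)3u;
}

/* Total bytes a callee-clean-up function releases, that is, the
 * operand N of `ret N`. arg_sizes holds the size of each argument. */
size_t callee_cleanup_bytes(const size_t *arg_sizes, size_t count)
{
    size_t total = 0;
    for (size_t i = 0; i < count; ++i)
        total += stack_slot_bytes(arg_sizes[i]);
    return total;
}
```

Three int arguments give 12, which matches ret 12; two char arguments still take one slot each and give 8.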
pascal
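A calling convention based on the Pascal language: parameters are pushed left to right (the opposite of cdecl), and the callee is responsible for cleaning the stack before returning. This convention was common in the following 16-bit APIs: OS/2 1.x, Microsoft Windows 3.x, and Borland Delphi version 1.x.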
register
Simply an alias for Borland fastcall.
stdcall
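A variation on the Pascal calling convention: the callee is still responsible for cleaning the stack, but parameters are pushed right to left, as in cdecl.
Registers EAX, ECX, and EDX are designated for use within the function; the return value is stored in EAX.
stdcall is the standard calling convention for the Microsoft Win32 API and for Open Watcom C++.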
fastcall
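Conventions called fastcall have not been standardized and are implemented differently by different compilers. Typically, a fastcall convention passes one or more arguments in registers, which reduces the number of memory accesses needed for the call.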
Microsoft fastcall
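The Microsoft or GCC __fastcall convention (also known as __msfastcall) passes the first two arguments (taken left to right) in ECX and EDX; the remaining arguments are pushed onto the stack from right to left.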
Borland fastcall
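The first three arguments, taken left to right, are passed in EAX, EDX, and ECX. The remaining arguments are pushed onto the stack, also left to right.
This is the default calling convention of the 32-bit Embarcadero Delphi compiler, where it is known as register. Some versions of Linux on i386 also use this convention.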
Either caller or callee clean-up: thiscall
thiscall
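This convention is used for calling C++ non-static member functions. There are two primary versions of thiscall, depending on the compiler and on whether the function uses variable arguments.
For GCC, thiscall is almost identical to cdecl: the caller cleans the stack and the parameters are passed right to left. The difference is the this pointer, which is pushed onto the stack last, as if it were the first parameter in the function prototype.
With the Microsoft Visual C++ compiler, the this pointer is passed in ECX and the callee cleans the stack, mirroring the stdcall convention used in C for this compiler and in Windows API functions.
When a function takes a variable number of arguments, the caller cleans the stack (compare cdecl). The thiscall convention can be specified explicitly only in Microsoft Visual C++ 2005 and later.
With other compilers, thiscall is not a keyword (disassemblers such as IDA nevertheless use __thiscall).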
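x86-64 calling conventions
The x86-64 calling conventions take advantage of the additional registers to pass more arguments in registers. The number of incompatible calling conventions has also been reduced, although two are still in common use.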
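Microsoft x64 calling convention
The Microsoft x64 calling convention uses registers RCX, RDX, R8, and R9 for the first four integer or pointer arguments (left to right),
and XMM0, XMM1, XMM2, and XMM3 for floating-point arguments.
Additional arguments are pushed onto the stack (right to left).
Integer return values are returned in RAX, floating-point return values in XMM0.
Parameters shorter than 64 bits are not zero-extended; the high bits contain garbage.
When compiling for x64 on Windows there is only one calling convention, the one described here; the various 32-bit conventions are unified into one on 64-bit.
In the Microsoft x64 calling convention, it is the caller's responsibility to allocate 32 bytes of "shadow space" on the stack right before calling the function (regardless of the number of parameters actually used), and to pop it after the call.
The shadow space is used to spill RCX, RDX, R8, and R9, and it must be provided for every function, even one with fewer than four parameters.
For example, for a function taking five integer arguments, the first four are passed in registers and the fifth is pushed on top of the shadow space.
So when the called function is entered, the stack holds, in ascending address order, the return address, the 32 bytes of shadow space, and then the fifth parameter.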
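The layout just described can be written down as two small lookups. This is a sketch for integer and pointer arguments only, and both function names are hypothetical; it encodes nothing beyond the register order and the stack layout stated above.

```c
#include <stddef.h>

/* Register that carries integer or pointer argument `index` (0-based),
 * or NULL when the argument is passed on the stack. */
const char *ms_x64_int_arg_register(size_t index)
{
    static const char *const regs[] = { "RCX", "RDX", "R8", "R9" };
    return index < 4 ? regs[index] : NULL;
}

/* Offset from RSP, at function entry, of the 8-byte slot that belongs
 * to argument `index`: the return address sits at [RSP], the four
 * shadow-space slots follow, and stack arguments come after them. */
size_t ms_x64_arg_slot_offset(size_t index)
{
    return 8 + 8 * index;
}
```

For the five-argument example, the fifth argument (index 4) has no register and sits at [RSP+40] on entry, directly above the 32 bytes of shadow space.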
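On x86-64, Visual Studio 2008 stores floating-point numbers in XMM6 and XMM7 (as well as XMM8 through XMM15).
Consequently, user-written assembly language routines for x86-64 must preserve XMM6 and XMM7 (on x86 these two registers did not need to be preserved).
In other words, an assembly routine ported from x86 to x86-64 must save XMM6 and XMM7 on entry and restore them before returning if it uses them.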
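System V AMD64 ABI
This convention is followed on Solaris, GNU/Linux, FreeBSD, and other non-Microsoft operating systems.
The first six integer or pointer arguments are passed in registers RDI, RSI, RDX, RCX, R8, and R9, while XMM0 through XMM7 are used for floating-point arguments.
For system calls, R10 is used instead of RCX. As in the Microsoft x64 convention, additional arguments are passed on the stack and the return value is stored in RAX.
Unlike the Microsoft convention, no shadow space is provided. On function entry, the return address is adjacent to the seventh integer argument on the stack.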
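A rough sketch of how the two register files fill up independently is shown below. It is a simplification: it handles only scalar integer, pointer, and floating-point arguments and ignores structs and other classes in the ABI. The function name and the 'i'/'f' encoding of argument kinds are inventions of this sketch.

```c
#include <stddef.h>

/* Assign System V AMD64 argument locations for scalar arguments.
 * kinds[i] is 'f' for a floating-point argument and anything else for
 * an integer or pointer argument. locations[i] receives a register
 * name, or "stack" once the matching register set is exhausted. */
void sysv_assign(const char *kinds, const char **locations)
{
    static const char *const int_regs[] = {
        "RDI", "RSI", "RDX", "RCX", "R8", "R9"
    };
    static const char *const sse_regs[] = {
        "XMM0", "XMM1", "XMM2", "XMM3", "XMM4", "XMM5", "XMM6", "XMM7"
    };
    size_t next_int = 0, next_sse = 0;

    for (size_t i = 0; kinds[i] != '\0'; ++i) {
        if (kinds[i] == 'f')
            locations[i] = next_sse < 8 ? sse_regs[next_sse++] : "stack";
        else
            locations[i] = next_int < 6 ? int_regs[next_int++] : "stack";
    }
}
```

With the kinds string "ifiiiiii", the floating-point argument takes XMM0 without consuming an integer register, and the seventh integer argument is the first one to land on the stack.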
Differences between calling conventions (pascal, fastcall, stdcall, thiscall, cdecl)
http://blog.csdn.net/maotoula/article/details/6762062
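I. Function calling conventions
A function calling convention is an agreement between the caller of a function and the function body about how parameters are passed, how the return value is passed, who cleans up the stack, and how registers are used.
It is a strong contract that requires binary-level compatibility: if the caller and the function body use different calling conventions, the program may execute incorrectly, so the convention must be treated as part of the function declaration.
II. Common function calling conventions
Function calling conventions in VC6:
Convention    Stack cleanup    Parameter passing
__cdecl       caller           right to left, on the stack
__stdcall     function body    right to left, on the stack
__fastcall    function body    first two arguments in registers (ECX, EDX), the rest right to left on the stack
thiscall      function body    this pointer passed in ECX by default, other parameters pushed right to left
__cdecl is the default calling convention for C/C++. VC has no thiscall keyword; thiscall is the default calling convention for class member functions.
The calling convention of the main (or wmain) function in C/C++ must be __cdecl and cannot be changed.
The default calling convention can usually be changed through compiler settings; if your code depends on a particular calling convention, specify it explicitly.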
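Function calling conventions in Delphi 6:
Convention    Stack cleanup    Parameter passing
register      function body    left to right, registers first (EAX, EDX, ECX), then the stack
pascal        function body    left to right, on the stack
cdecl         caller           right to left, on the stack (compatible with the C/C++ default convention)
stdcall       function body    right to left, on the stack (compatible with __stdcall in VC)
safecall      function body    right to left, on the stack (same as stdcall)
The default calling convention in Delphi is register, which in my view is also the most efficient one, while cdecl is in my view the least efficient overall.
__fastcall in VC is generally slightly less efficient than register.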
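Function calling conventions in C++Builder 6:
Convention    Stack cleanup    Parameter passing
__fastcall    function body    left to right, registers first (EAX, EDX, ECX), then the stack (compatible with Delphi's register)
register      function body    left to right, registers first (EAX, EDX, ECX), then the stack (compatible with Delphi's register)
__pascal      function body    left to right, on the stack
__cdecl       caller           right to left, on the stack (compatible with the C/C++ default convention)
__stdcall     function body    right to left, on the stack (compatible with __stdcall in VC)
__msfastcall  function body    right to left, registers first (ECX, EDX), then the stack (compatible with __fastcall in VC)
Among the common calling conventions, only cdecl requires the caller to clean up the stack.
Functions in C/C++ may take a variable number of arguments, as printf does. Because the function body does not know how many arguments the caller pushed onto the stack,
it cannot easily know how to clean the stack, so the best approach is to leave stack cleanup to the caller. This is presumably why the cdecl calling convention exists.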
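The point about variable argument lists can be seen in a minimal variadic function. In this sketch, `sum_ints` is a hypothetical example function: the callee learns how many arguments to read only from its first parameter at run time, so no fixed clean-up size could be compiled into it.

```c
#include <stdarg.h>

/* Sums `count` int arguments. Only the caller knows at compile time
 * how many arguments each call passes, which is why the caller is the
 * natural party to remove them from the stack. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);
    for (int i = 0; i < count; ++i)
        total += va_arg(args, int);
    va_end(args);
    return total;
}
```

A call such as sum_ints(3, 1, 2, 3) passes four values, while sum_ints(0) passes one; the same function body serves both.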
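VB generally uses the stdcall calling convention. (PS: is there a stronger guarantee?)
The Windows API generally uses the stdcall convention. (PS: is there a stronger guarantee?)
For calls between different languages (for example through a DLL), stdcall is recommended because it has the best cross-language support.
III. How function return values are passed
The return value can be thought of as an out parameter of the function call; how it is passed is also part of the calling convention.
When a function returns a value, int, pointers, and other 32-bit values (including 32-bit structs) are generally returned in eax (bool and char in al, short in ax);
__int64 and other 64-bit values, including 64-bit structs, are returned in the register pair edx:eax (similarly, a 32-bit integer in a 16-bit environment is returned in dx:ax).
A struct of any other size is returned in memory, with its address returned in eax (so returning a type whose size is not 1, 2, 4, or 8 bytes may be less efficient).
In parameter and return value passing, reference types can be treated the same as pointers.
float and double (and extended in Delphi) are returned in the floating-point register st(0).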
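The size rule above can be summarized as a small lookup. This sketch merely restates the rule of thumb from these notes for 32-bit x86; the enum and function names are hypothetical, and floating-point results (returned in st(0)) are deliberately left out.

```c
#include <stddef.h>

enum return_location {
    RET_AL,                 /* 1-byte values: bool, char            */
    RET_AX,                 /* 2-byte values: short                 */
    RET_EAX,                /* 4-byte values: int, pointers         */
    RET_EDX_EAX,            /* 8-byte values: __int64               */
    RET_MEMORY_ADDR_IN_EAX  /* other sizes: address returned in eax */
};

/* Where an integer-like or struct value of `size` bytes is returned,
 * following the rule of thumb stated in the notes. */
enum return_location x86_return_location(size_t size)
{
    switch (size) {
    case 1:  return RET_AL;
    case 2:  return RET_AX;
    case 4:  return RET_EAX;
    case 8:  return RET_EDX_EAX;
    default: return RET_MEMORY_ADDR_IN_EAX;
    }
}
```

A 12-byte struct, for example, falls into the last case, which is the less efficient path mentioned in the notes.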
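1. __cdecl
The so-called C calling convention. Arguments are pushed right to left, and the caller pops them off the stack. Remember: the stack memory used to pass arguments is maintained by the caller.
The return value is in EAX. Functions with variable arguments, such as printf, must therefore use this convention. When the compiler generates the decorated name for a function that uses this convention, it only adds an underscore prefix to the function name, in the form _functionname.
2. __stdcall
Arguments are pushed right to left, and the callee pops them off the stack. __stdcall is a variant of the Pascal calling convention and is commonly used in the Win32 API. Remember: the function itself clears the stack when it exits, and the return value is in EAX.
The __stdcall convention adds an underscore prefix to the function name, followed by an "@" sign and the number of bytes of its parameters, in the form _functionname@number. For example, the decorated name of int func(int a, double b) is _func@12.
3. __fastcall
The main feature of __fastcall is speed, because it passes arguments in registers (in practice, ECX and EDX carry the first two arguments of DWORD size or smaller, the remaining arguments are still pushed right to left, and the called function cleans the argument stack before returning). The __fastcall convention adds an "@" prefix to the function name, followed by another "@" and the number of bytes of its parameters, in the form @functionname@number.
It is very similar to __stdcall; the only difference is that the first two arguments are passed in registers. Note that the two register arguments are assigned left to right, that is, the first argument goes in ECX and the second in EDX, while the other arguments are pushed onto the stack right to left. The return value is still in EAX.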
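The three decoration schemes described above can be reproduced with a few format strings. The sketch below is illustrative only: `decorate_name` and the enum are hypothetical names, and the caller is assumed to supply the total parameter byte count (including arguments that travel in registers for __fastcall).

```c
#include <stdio.h>
#include <string.h>

enum convention { CONV_CDECL, CONV_STDCALL, CONV_FASTCALL };

/* Writes the decorated form of `name` into `out` and returns the
 * snprintf result, or -1 for an empty name or unknown convention. */
int decorate_name(enum convention conv, const char *name,
                  unsigned arg_bytes, char *out, size_t out_size)
{
    if (name == NULL || strlen(name) == 0)
        return -1;

    switch (conv) {
    case CONV_CDECL:    /* _functionname */
        return snprintf(out, out_size, "_%s", name);
    case CONV_STDCALL:  /* _functionname@number */
        return snprintf(out, out_size, "_%s@%u", name, arg_bytes);
    case CONV_FASTCALL: /* @functionname@number */
        return snprintf(out, out_size, "@%s@%u", name, arg_bytes);
    }
    return -1;
}
```

For int func(int a, double b) under __stdcall, the parameter bytes are 4 + 8 = 12, so the helper yields _func@12, the same name given in the text.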
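4. __pascal
This convention passes arguments left to right, returns the value in EAX, and has the callee clean the stack.
5. __thiscall
Applies only to C++ member functions. The this pointer is stored in the ECX register and arguments are pushed right to left. In VC6, thiscall is not a keyword, so the programmer cannot specify it.
The calling convention can be chosen in the project settings under Setting...\C/C++ \Code Generation; the default is __cdecl.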
Function calling methods: Stdcall Cdecl Fastcall WINAPI CALLBACK PASCAL Thiscall Fortran Syscall Declspec(Naked)
http://www.cnitblog.com/textbox/archive/2010/03/10/64575.html
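Functions in modern programming languages have a surprising number of calling methods. They are easiest to understand by looking at the assembly code, and each has its own characteristics.
The differences between these calling methods come down to the following aspects:
1. How parameters are handled (passing order, and whether they are stored on the stack or in registers)
2. How the function ends (cleanup, that is, who restores the stack: the function itself, or the caller after the call)
Theory first:
__cdecl: the caller cleans the stack, and arguments are pushed right to left. It is the default calling method for C and C++ programs. Every call site of such a function contains code to clear the stack,
so the resulting executable is larger than one that calls __stdcall functions. VC adds an underscore prefix to the function name
when it compiles the function. It is the default calling convention for MFC.
__stdcall, WINAPI, CALLBACK, PASCAL: the callee cleans the stack, and arguments are pushed right to left. __stdcall is a variant of the Pascal calling convention
and is commonly used in the Win32 API; the function clears the stack itself when it exits. VC adds an underscore prefix to the function name
and appends "@" and the number of bytes of its parameters.
__fastcall: the callee cleans the stack; arguments go into registers first and the rest are pushed. True to its name, its main feature is speed, because it passes arguments in registers
(in practice, ECX and EDX carry the first two arguments of DWORD size or smaller, the remaining arguments are still pushed right to left, and the called function
cleans the argument stack before returning). Its name decoration also differs from the previous two:
VC adds an "@" prefix to the function name and appends "@" and the number of bytes of its parameters.
__thiscall: the callee cleans the stack, arguments are pushed onto the stack, and the this pointer is assigned to the ecx register. Applies only to C++ member functions. The this pointer is stored in the ECX register and arguments are pushed right
to left. In VC6, thiscall is not a keyword, so the programmer cannot specify it.
__declspec(naked): a rarely used attribute, not a calling convention in its own right, that programmers are generally advised to avoid. The compiler adds no prologue or epilogue code to such a function;
more unusually, you cannot use return to return a value and must return the result with inline assembly instead. It is typically used in real-mode driver design.
In practice: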
#include <stdio.h>
#include <stdarg.h>

int __stdcall test_stdcall(char para1, char para2)
{
    para1 = para2;
    return 0;
}

int __cdecl test_cdecl(char para, ...)
{
    char p = '\n';
    va_list marker;
    va_start( marker, para );
    while( p != '\0' )
    {
        /* Kept as compiled for the listing that follows. A char is promoted
           to int in a variable argument list, so portable code should read
           it with va_arg( marker, int ) and convert the result. */
        p = va_arg( marker, char);
        printf("%c\n", p);
    }
    va_end( marker );
    return 0;
}

int pascal test_pascal(char para1, char para2)
{
    return 0;
}

int __fastcall test_fastcall(char para1, char para2, char para3, char para4)
{
    para1 = (char)1;
    para2 = (char)2;
    para3 = (char)3;
    para4 = (char)4;
    return 0;
}

__declspec(naked) void __stdcall test_naked(char para1, char para2)
{
    __asm
    {
        push ebp
        mov ebp, esp
        push eax
        mov al,byte ptr [ebp + 0Ch]
        xchg byte ptr [ebp + 8],al
        pop eax
        pop ebp
        ret 8
    }
    // return ;
}

int main( int argc, char* argv[ ] )
{
    test_stdcall( 'a', 'b' );
    test_cdecl( 'c', 'd', 'e', 'f', 'g', 'h', '\0' );
    test_pascal( 'e', 'f' );
    test_fastcall( 'g', 'h', 'i', 'j' );
    test_naked( 'k', 'l' );
    return 0;
}
The assembly code is as follows:
int main(int argc, char* argv[])
{
00411350 push ebp
00411351 mov ebp,esp
00411353 sub esp,0C0h
00411359 push ebx
0041135A push esi
0041135B push edi
0041135C lea edi,[ebp-0C0h]
00411362 mov ecx,30h
00411367 mov eax,0CCCCCCCCh
0041136C rep stos dword ptr es:[edi]

test_stdcall( 'a', 'b' );
0041136E push 62h
00411370 push 61h
00411372 call _test_stdcall@8

test_cdecl( 'c','d','e','f','g' ,'h' ,'\0');
00411377 push 0
00411379 push 68h
0041137B push 67h
0041137D push 66h
0041137F push 65h
00411381 push 64h
00411383 push 63h
00411385 call _test_cdecl
0041138A add esp,1Ch ; restores the stack to its state before the _test_cdecl arguments were pushed: add esp,n*4 with n=7, the number of arguments

test_fastcall( 'g', 'h', 'i', 'j' );
0041138D push 6Ah
0041138F push 69h
00411391 mov dl,68h
00411393 mov cl,67h
00411395 call test_fastcall

test_naked( 'k', 'l');
0041139A push 6Ch
0041139C push 6Bh
0041139E call _test_naked

return 0;
004113A3 xor eax,eax
}

int __stdcall test_stdcall(char para1, char para2)
{
004111F0 push ebp
004111F1 mov ebp,esp
004111F3 sub esp,0C0h
004111F9 push ebx
004111FA push esi
004111FB push edi
004111FC lea edi,[ebp-0C0h]
00411202 mov ecx,30h
00411207 mov eax,0CCCCCCCCh
0041120C rep stos dword ptr es:[edi] ; initialize the memory at edi
para1 = para2;
0041120E mov al,byte ptr [para2] ; mov al,byte ptr [ebp+0Ch]
00411211 mov byte ptr [para1],al ; mov byte ptr [ebp+8],al
return 0;
00411214 xor eax,eax
00411216 pop edi
00411217 pop esi
00411218 pop ebx
00411219 mov esp,ebp
0041121B pop ebp
0041121C ret 8 ; restores the stack to its state before the arguments were pushed; with two arguments, ret 8 is equivalent to pop eip followed by esp+8
}

int __cdecl test_cdecl(char para,... )
{
00411230 push ebp
00411231 mov ebp,esp
00411233 sub esp,0D8h
0041123C lea edi,[ebp-0D8h]
00411242 mov ecx,36h
00411247 mov eax,0CCCCCCCCh
0041124C rep stos dword ptr es:[edi]
char p = '\n';
0041124E mov byte ptr [p],0Ah
va_list marker;
va_start( marker, para );
00411252 lea eax,[ebp+0Ch]
00411255 mov dword ptr [marker],eax
while( p != '\0' )
00411258 movsx eax,byte ptr [p]
0041125C test eax,eax
0041125E je test_cdecl+60h (411290h)
{
p = va_arg( marker, char);
00411260 mov eax,dword ptr [marker]
00411263 add eax,4
00411266 mov dword ptr [marker],eax
00411269 mov ecx,dword ptr [marker]
0041126C mov dl,byte ptr [ecx-4]
0041126F mov byte ptr [p],dl
printf("%c\n", p);
00411272 movsx eax,byte ptr [p]
00411276 mov esi,esp
00411278 push eax
00411279 push offset string "%c\n" (41401Ch)
0041127E call dword ptr [__imp__printf (416180h)]
00411284 add esp,8
0041128E jmp test_cdecl+28h (411258h)
}
va_end( marker );
00411290 mov dword ptr [marker],0
return 0;
00411297 xor eax,eax
004112A9 mov esp,ebp
004112AB pop ebp
004112AC ret
}

int __fastcall test_fastcall(char para1, char para2, char para3, char para4)
{
004112D0 push ebp
004112D1 mov ebp,esp
004112D3 sub esp,0D8h
004112DD lea edi,[ebp-0D8h]
004112E3 mov ecx,36h
004112E8 mov eax,0CCCCCCCCh
004112ED rep stos dword ptr es:[edi]
004112EF pop ecx
004112F0 mov byte ptr [ebp-14h],dl
004112F3 mov byte ptr [ebp-8],cl
para1 = (char)1;
004112F6 mov byte ptr [para1],1
para2 = (char)2;
004112FA mov byte ptr [para2],2
para3 = (char)3;
004112FE mov byte ptr [para3],3
para4 = (char)4;
00411302 mov byte ptr [para4],4
return 0;
00411306 xor eax,eax
0041130B mov esp,ebp
0041130D pop ebp
0041130E ret 8 ; ecx and edx carry two of the four arguments, so only two were pushed; hence ret 4*2
}

__declspec(naked) void __stdcall test_naked(char para1, char para2)
{
00411330 push ebp ; the compiler adds no prologue or stack cleanup instructions here; it copies your code exactly as written
00411331 mov ebp,esp
00411333 push eax
00411334 mov al,byte ptr [para2]
00411337 xchg al,byte ptr [para1]
0041133A pop eax
0041133B pop ebp
0041133C ret 8
}
http://securityetalii.es/2013/01/20/calling-conventions-hunting/
Calling Conventions Hunting
When trying to understand a binary, it’s key to be able to identify functions, and with them, their parameters and local variables. This helps the reverser figure out APIs, data structures, etc., and in short gain a deep understanding of the software. When dealing with functions, it’s essential to be able to identify the calling convention in use, as that often allows the reverser to make educated guesses about the arguments and local variables used by the function. I’ll try to describe here a couple of points that may aid in identifying the calling convention of any given function and the number and ordering of its parameters.
Calling Conventions
A calling convention defines how functions are called in a program. They influence how data (arguments/variables) is laid on the stack when the function call takes place. A comprehensive definition of calling conventions is beyond the scope of this blog, nonetheless the most common ones are briefly described below.
cdecl
Description: Standard C/C++ calling convention. Allows functions to receive a dynamic number of parameters.
Cleans the stack: The caller is responsible for restoring the stack after making a function call.
Arguments passed: On the stack. Arguments are pushed in reverse order (i.e. from right to left): the last argument is pushed onto the stack first, and the first argument is pushed last.
void _cdecl fun();
fastcall
Description: A calling convention with slightly better performance.
Cleans the stack: The callee is responsible for restoring the stack before returning.
Arguments passed: First two arguments are passed in registers (ECX and EDX). The rest are passed through the stack.
void __fastcall func();
stdcall
Description: Very common in Windows (used by most APIs).
Cleans the stack: The callee is responsible for cleaning up the stack before returning. Usually by means of a RETN #N instruction.
Arguments passed: On the stack. Arguments are pushed from right to left (the same order as cdecl), so the first argument is pushed last.
void __stdcall fun();
thiscall
Description: Used when a C++ method with a static number of parameters is called. Designed specifically to improve the performance of OO languages (VC++ reserves ECX for the this pointer; GCC pushes the this pointer onto the stack last). When a dynamic number of parameters is required, compilers usually fall back to cdecl and pass the this pointer as the first parameter on the stack.
Cleans the stack: In GCC, caller cleans the stack. In Microsoft VC++ the callee is responsible for cleaning up.
Arguments passed: From right to left (as cdecl). The last argument is pushed first, and the first argument is pushed last.
void __thiscall func();
Let the small table below serve as a quick reminder.
http://www.cs.virginia.edu/~evans/cs216/guides/x86.html
Calling Convention
To allow separate programmers to share code and develop libraries for use by many programs,
and to simplify the use of subroutines in general, programmers typically adopt a common calling convention.
The calling convention is a protocol about how to call and return from routines.
For example, given a set of calling convention rules, a programmer need not examine the definition of a subroutine to determine
how parameters should be passed to that subroutine.
Furthermore, given a set of calling convention rules, high-level language compilers can be made to follow the rules,
thus allowing hand-coded assembly language routines and high-level language routines to call one another.
In practice, many calling conventions are possible.
We will use the widely used C language calling convention.
Following this convention will allow you to write assembly language subroutines that are safely callable from C (and C++) code,
and will also enable you to call C library functions from your assembly language code.
The C calling convention is based heavily on the use of the hardware-supported stack.
It is based on the push, pop, call, and ret instructions.
Subroutine parameters are passed on the stack.
Registers are saved on the stack, and local variables used by subroutines are placed in memory on the stack.
The vast majority of high-level procedural languages implemented on most processors have used similar calling conventions.
The calling convention is broken into two sets of rules.
The first set of rules is employed by the caller of the subroutine, and the second set of rules is observed by the writer of the subroutine (the callee).
It should be emphasized that mistakes in the observance of these rules quickly result in fatal program errors
since the stack will be left in an inconsistent state; thus meticulous care should be used when implementing the call convention in your own subroutines.
A good way to visualize the operation of the calling convention is to draw the contents of the nearby region of the stack during subroutine execution. The image above depicts the contents of the stack during the execution of a subroutine with three parameters and three local variables. The cells depicted in the stack are 32-bit wide memory locations, thus the memory addresses of the cells are 4 bytes apart. The first parameter resides at an offset of 8 bytes from the base pointer. Above the parameters on the stack (and below the base pointer), the call instruction placed the return address, thus leading to an extra 4 bytes of offset from the base pointer to the first parameter. When the ret instruction is used to return from the subroutine, it will jump to the return address stored on the stack.
Caller Rules
To make a subroutine call, the caller should:
- Before calling a subroutine, the caller should save the contents of certain registers that are designated caller-saved. The caller-saved registers are EAX, ECX, EDX. Since the called subroutine is allowed to modify these registers, if the caller relies on their values after the subroutine returns, the caller must push the values in these registers onto the stack (so they can be restored after the subroutine returns).
- To pass parameters to the subroutine, push them onto the stack before the call. The parameters should be pushed in inverted order (i.e. last parameter first). Since the stack grows down, the first parameter will be stored at the lowest address (this inversion of parameters was historically used to allow functions to be passed a variable number of parameters).
- To call the subroutine, use the call instruction. This instruction places the return address on top of the parameters on the stack, and branches to the subroutine code. This invokes the subroutine, which should follow the callee rules below.
After the subroutine returns (immediately following the call instruction), the caller can expect to find the return value of the subroutine in the register EAX. To restore the machine state, the caller should:
- Remove the parameters from stack. This restores the stack to its state before the call was performed.
- Restore the contents of caller-saved registers (EAX, ECX, EDX) by popping them off of the stack. The caller can assume that no other registers were modified by the subroutine.
Example
The code below shows a function call that follows the caller rules. The caller is calling a function _myFunc that takes three integer parameters. First parameter is in EAX, the second parameter is the constant 216; the third parameter is in memory location var.
push [var]      ; Push last parameter first
push 216        ; Push the second parameter
push eax        ; Push first parameter last
call _myFunc    ; Call the function (assume C naming)
add esp, 12
Note that after the call returns, the caller cleans up the stack using the add instruction.
We have 12 bytes (3 parameters * 4 bytes each) on the stack, and the stack grows down.
Thus, to get rid of the parameters, we can simply add 12 to the stack pointer.
The result produced by _myFunc is now available for use in the register EAX.
The values of the caller-saved registers (ECX and EDX), may have been changed.
If the caller uses them after the call, it would have needed to save them on the stack before the call and restore them after it.
Callee Rules
The definition of the subroutine should adhere to the following rules at the beginning of the subroutine:
- Push the value of EBP onto the stack, and then copy the value of ESP into EBP using the following instructions:
push ebp
mov ebp, esp
When a subroutine is executing, the base pointer holds a copy of the stack pointer value from when the subroutine started executing.
Parameters and local variables will always be located at known, constant offsets away from the base pointer value.
We push the old base pointer value at the beginning of the subroutine so that we can later restore the appropriate base pointer value for the caller when the subroutine returns.
Remember, the caller is not expecting the subroutine to change the value of the base pointer.
We then move the stack pointer into EBP to obtain our point of reference for accessing parameters and local variables.
- Next, allocate local variables by making space on the stack. Recall, the stack grows down, so to make space on the top of the stack, the stack pointer should be decremented.
The amount by which the stack pointer is decremented depends on the number and size of local variables needed.
For example, if 3 local integers (4 bytes each) were required, the stack pointer would need to be decremented by 12 to make space for these local variables
(i.e., sub esp, 12). As with parameters, local variables will be located at known offsets from the base pointer.
- Next, save the values of the callee-saved registers that will be used by the function.
To save registers, push them onto the stack. The callee-saved registers are EBX, EDI, and ESI
(ESP and EBP will also be preserved by the calling convention, but need not be pushed on the stack during this step).
After these three actions are performed, the body of the subroutine may proceed. When the subroutine returns, it must follow these steps:
- Leave the return value in EAX.
- Restore the old values of any callee-saved registers (EBX, EDI, and ESI) that were modified.
The register contents are restored by popping them from the stack. The registers should be popped in the inverse order that they were pushed.
- Deallocate local variables.
The obvious way to do this might be to add the appropriate value to the stack pointer (since the space was allocated by subtracting the needed amount from the stack pointer).
In practice, a less error-prone way to deallocate the variables is to move the value in the base pointer into the stack pointer:
mov esp, ebp.
This works because the base pointer always contains the value that the stack pointer contained immediately prior to the allocation of the local variables.
- Immediately before returning, restore the caller's base pointer value by popping EBP off the stack.
Recall that the first thing we did on entry to the subroutine was to push the base pointer to save its old value.
- Finally, return to the caller by executing a ret instruction. This instruction will find and remove the appropriate return address from the stack.
Note that the callee's rules fall cleanly into two halves that are basically mirror images of one another.
The first half of the rules apply to the beginning of the function, and are commonly said to define the prologue to the function.
The latter half of the rules apply to the end of the function, and are thus commonly said to define the epilogue of the function.
Example
Here is an example function definition that follows the callee rules:
.486
.MODEL FLAT
.CODE
PUBLIC _myFunc
_myFunc PROC
  ; Subroutine Prologue
  push ebp           ; Save the old base pointer value.
  mov ebp, esp       ; Set the new base pointer value.
  sub esp, 4         ; Make room for one 4-byte local variable.
  push edi           ; Save the values of registers that the function
  push esi           ; will modify. This function uses EDI and ESI.
                     ; (no need to save EBX, EBP, or ESP)

  ; Subroutine Body
  mov eax, [ebp+8]   ; Move value of parameter 1 into EAX
  mov esi, [ebp+12]  ; Move value of parameter 2 into ESI
  mov edi, [ebp+16]  ; Move value of parameter 3 into EDI

  mov [ebp-4], edi   ; Move EDI into the local variable
  add [ebp-4], esi   ; Add ESI into the local variable
  add eax, [ebp-4]   ; Add the contents of the local variable
                     ; into EAX (final result)

  ; Subroutine Epilogue
  pop esi            ; Recover register values
  pop edi
  mov esp, ebp       ; Deallocate local variables
  pop ebp            ; Restore the caller's base pointer value
  ret
_myFunc ENDP
END
The subroutine prologue performs the standard actions of saving a snapshot of the stack pointer in EBP (the base pointer),
allocating local variables by decrementing the stack pointer, and saving register values on the stack.
In the body of the subroutine we can see the use of the base pointer.
Both parameters and local variables are located at constant offsets from the base pointer for the duration of the subroutine's execution.
In particular, we notice that since parameters were placed onto the stack before the subroutine was called, they are always located below the base pointer (i.e. at higher addresses) on the stack.
The first parameter to the subroutine can always be found at memory location [EBP+8], the second at [EBP+12], the third at [EBP+16].
Similarly, since local variables are allocated after the base pointer is set, they always reside above the base pointer (i.e. at lower addresses) on the stack.
In particular, the first local variable is always located at [EBP-4], the second at [EBP-8], and so on.
This conventional use of the base pointer allows us to quickly identify the use of local variables and parameters within a function body.
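The offsets quoted above follow a simple pattern that can be sketched as two formulas. The helper names are hypothetical, and the sketch assumes that every parameter and local variable is 4 bytes wide and that the standard prologue (push ebp, then mov ebp, esp) was used.

```c
/* Offset from EBP of parameter number n (1-based). The saved EBP sits
 * at [EBP] and the return address at [EBP+4], so parameters start at
 * [EBP+8]. */
int param_offset(int n)
{
    return 4 + 4 * n;
}

/* Offset from EBP of local variable number n (1-based). Locals are
 * allocated below the saved EBP, so their offsets are negative. */
int local_offset(int n)
{
    return -4 * n;
}
```

With these formulas the three parameters of _myFunc land at [EBP+8], [EBP+12], and [EBP+16], and its single local variable at [EBP-4].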
The function epilogue is basically a mirror image of the function prologue.
The caller's register values are recovered from the stack, the local variables are deallocated by resetting the stack pointer,
the caller's base pointer value is recovered, and the ret instruction is used to return to the appropriate code location in the caller.