The Art of Picking Intel Registers

 

https://www.swansontec.com/sregisters.html

I wrote this article for an online magazine called Scene Zine. Scene Zine caters to the Demo Scene, a digital art community dedicated to pushing the limits of computers through a mix of music, art, and computer programming. One particular category of demoscene productions, 4K intros, focuses on the final production's raw file size. The goal is to put as much high-quality music, graphics, and animation as possible into only 4096 bytes. Doing this requires highly specialized size-optimization techniques, since 4096 bytes is less space than two pages of typed text or a true-color Windows XP icon. This article discusses some of these techniques.

Some people have commented that they want to see more expert programming articles in Scene Zine. To remedy the situation, this article is for all assembly language programmers out there. It discusses the fine art of picking which registers to use in your code. This information should simplify your coding and help you write smaller routines.

When the engineers at Intel designed the original 8086 processor, they had a special purpose in mind for each register. As they designed the instruction set, they created many optimizations and special instructions based on the function they expected each register to perform. Using registers according to Intel's original plan allows the code to take full advantage of these optimizations. Unfortunately, this seems to be a lost art. Few coders are aware of Intel's overall design, and most compilers are either too simplistic or too focused on execution speed to use the registers properly. Understanding how the registers and instruction set fit together, however, is an important step on the road to effortless size-coding.

Using the registers consistently has other advantages besides size optimization. Like using good variable names, using consistent registers makes code more readable. When they are used properly, the registers have meanings almost as clear as the loop counter, i, in higher-level languages. In fact, I occasionally name my variables in C after x86 registers because the register names are so descriptive. With proper register use, x86 assembler can be almost as self-documenting as a high-level language.

Another benefit that consistent register use brings is better compression. In productions which use a compressor to pack the final build, such as 4K intros, creating more redundant code leads to smaller packed sizes. When code uses registers consistently, the same instruction sequences begin to appear over and over. This, in turn, improves the compression ratio.

As a review, all 32-bit x86-family CPUs have 8 general-purpose registers. The registers are 32 bits wide, although their 16-bit halves are also accessible with a special one-byte instruction prefix (the operand-size prefix, 66h). In 16-bit mode, the situation is reversed: the lower 16 bits are accessible by default, and the full registers are accessible only with the prefix byte.
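As a rough illustration of that prefix cost, the Python snippet below tabulates two hand-assembled IA-32 encodings for 32-bit code. The table and its names are only a teaching aid, not output captured from a particular assembler:

```python
# Hand-assembled IA-32 encodings for 32-bit code (illustrative table).
# 0x66 is the operand-size override prefix.
ENCODINGS_32BIT_MODE = {
    "mov eax, ebx": bytes([0x89, 0xD8]),        # full 32-bit registers: no prefix
    "mov ax, bx":   bytes([0x66, 0x89, 0xD8]),  # 16-bit halves: one extra prefix byte
}

# In 16-bit code the meanings swap: 89 D8 is "mov ax, bx" and
# 66 89 D8 is "mov eax, ebx".
for text, code in ENCODINGS_32BIT_MODE.items():
    print(f"{text:<14} {code.hex(' ')}  ({len(code)} bytes)")
```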

Each register name is really an acronym. This is true even for the "alphabetical" registers EAX, EBX, ECX, and EDX. The following list shows the register names and their meanings:

  • EAX - Accumulator Register
  • EBX - Base Register
  • ECX - Counter Register
  • EDX - Data Register
  • ESI - Source Index
  • EDI - Destination Index
  • EBP - Base Pointer
  • ESP - Stack Pointer

In addition to the full-sized general registers, the x86 processor also has eight byte-sized registers. Since these registers map directly into EAX, EBX, ECX, and EDX, most people view them as parts of the larger registers. From the instruction set point of view, however, the 8-bit registers are separate entities. For example, CH shares none of the ECX register's useful properties, and CL keeps only one of them: it holds the count for variable shifts and rotates. Apart from AL, AH, and that use of CL, none of the 8-bit registers have any special significance in the instruction set, so this article does not mention them further.
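The register overlap itself can be sketched as a small Python model. The `Reg32` class is hypothetical; it captures only the bit layout (AX is bits 0-15 of EAX, AH bits 8-15, AL bits 0-7), not any instruction behavior:

```python
class Reg32:
    """Bit layout of a 32-bit register such as EAX and its AX/AH/AL aliases."""

    def __init__(self, value=0):
        self.value = value & 0xFFFFFFFF

    @property
    def x16(self):            # AX: bits 0-15
        return self.value & 0xFFFF

    @property
    def low8(self):           # AL: bits 0-7
        return self.value & 0xFF

    @property
    def high8(self):          # AH: bits 8-15
        return (self.value >> 8) & 0xFF

    @high8.setter
    def high8(self, byte):    # writing AH changes only bits 8-15 of EAX
        self.value = (self.value & 0xFFFF00FF) | ((byte & 0xFF) << 8)


eax = Reg32(0x12345678)
print(hex(eax.x16), hex(eax.high8), hex(eax.low8))   # 0x5678 0x56 0x78
eax.high8 = 0xAB
print(hex(eax.value))                                  # 0x1234ab78
```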

EAX: The Accumulator

There are three major processor architectures: register, stack, and accumulator. In a register architecture, operations such as addition or subtraction can occur between any two arbitrary registers. In a stack architecture, operations occur between the top of the stack and other items on the stack. In an accumulator architecture, the processor has a single calculation register called the accumulator. All calculations occur in the accumulator, and the other registers act as simple data storage locations.

Obviously, the x86 processor does not have an accumulator architecture. It does, however, have an accumulator-like register: EAX / AL. Although most calculations can occur between any two registers, the instruction set gives the accumulator special preference as a calculation register. For example, all nine basic operations (ADD, ADC, AND, CMP, OR, SBB, SUB, TEST, and XOR) have special one-byte opcodes for operations between the accumulator and a constant. Specialized operations, such as the one-operand forms of multiplication and division, the CBW/CWDE/CDQ family of sign extensions, and BCD correction, can only occur in the accumulator.
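To make the size difference concrete, here is a Python table of hand-assembled 32-bit encodings comparing the accumulator's short forms with the generic forms other registers must use. One caveat: for constants between -128 and 127, the sign-extended imm8 form (opcode 83h, 3 bytes) works for every register, EAX included, so the accumulator's advantage shows up mainly with larger constants, with TEST (which has no imm8 form), and with byte operations on AL:

```python
# Hand-assembled IA-32 encodings (32-bit code): accumulator short forms
# versus the generic ModR/M forms used by every other register.
PAIRS = [
    # ((accumulator form, bytes), (other-register form, bytes))
    (("add eax, 0x12345678", "05 78 56 34 12"), ("add ebx, 0x12345678", "81 c3 78 56 34 12")),
    (("test eax, 0x100", "a9 00 01 00 00"), ("test ebx, 0x100", "f7 c3 00 01 00 00")),
    (("cmp al, 10", "3c 0a"), ("cmp bl, 10", "80 fb 0a")),
]

def size(hex_bytes):
    """Number of bytes in a space-separated hex string."""
    return len(bytes.fromhex(hex_bytes))

for (acc_text, acc_hex), (reg_text, reg_hex) in PAIRS:
    print(f"{acc_text:<20} {size(acc_hex)} bytes   vs   {reg_text:<20} {size(reg_hex)} bytes")
```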

Since most calculations occur in the accumulator, the x86 architecture contains many optimized instructions for moving data in and out of this register. To start, the processor has one-byte XCHG opcodes (90h-97h) for swapping the accumulator with any other general-purpose register; 90h, the swap of the accumulator with itself, doubles as NOP. These aren't terribly useful, but they show how strongly the Intel engineers preferred the accumulator over the other registers. For them, it was better to swap data into the accumulator than to work with it where it was. Other instructions built around the accumulator include LODS, STOS, SCAS, IN, OUT, and XLAT. Finally, the MOV instruction has special one-byte opcodes, with no ModR/M byte, for moving data between the accumulator and a fixed memory address.
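A similar hand-assembled comparison, again as an illustrative Python table for 32-bit code, shows the one-byte XCHG form and the ModR/M-free MOV form. The address 0x402000 is an arbitrary placeholder:

```python
# Hand-assembled IA-32 encodings (32-bit code). 0x402000 is a placeholder address.
EXAMPLES = {
    "xchg eax, ebx":       "93",                 # one-byte form: 0x90 + register number
    "xchg ebx, ecx":       "87 cb",              # generic XCHG r/m32, r32 form
    "mov eax, [0x402000]": "a1 00 20 40 00",     # accumulator-only form, no ModR/M byte
    "mov ebx, [0x402000]": "8b 1d 00 20 40 00",  # generic MOV r32, r/m32 with disp32
}

for text, hex_bytes in EXAMPLES.items():
    print(f"{text:<20} {hex_bytes:<18} {len(bytes.fromhex(hex_bytes))} bytes")
```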

In your code, try to perform as much work in the accumulator as possible. As you will see, the remaining seven general-purpose registers exist primarily to support the calculation occurring in the accumulator.

EDX: The Data Register

Of the seven remaining general-purpose registers, the data register, EDX, is most closely tied to the accumulator. Instructions that deal with oversized data items, such as multiplication, division, CWD, and CDQ, store the most significant bits in the data register and the least significant bits in the accumulator. In a sense, the data register is the 64-bit extension of the accumulator. The data register also plays a part in I/O instructions. In this case, the accumulator holds the data read from or written to the port, and the data register holds the port address (unless the port number is given as an immediate byte).
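The EDX:EAX convention can be modeled in a few lines of Python. This is a sketch of the unsigned 32-bit MUL and DIV forms plus CDQ only; the function names are made up for illustration, and IMUL/IDIV are not modeled:

```python
MASK32 = 0xFFFFFFFF

def mul(eax, src):
    """MUL src: EDX:EAX = EAX * src (unsigned). Returns (edx, eax)."""
    product = (eax & MASK32) * (src & MASK32)
    return product >> 32, product & MASK32

def cdq(eax):
    """CDQ: fill EDX with copies of EAX's sign bit. Returns edx."""
    return MASK32 if eax & 0x80000000 else 0

def div(edx, eax, src):
    """DIV src: divide EDX:EAX by src. Returns (edx, eax) = (remainder, quotient).

    The CPU raises a divide error (#DE) for src == 0 or a quotient that does
    not fit in 32 bits; this model raises ZeroDivisionError / OverflowError.
    """
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, src)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return remainder, quotient

edx, eax = mul(0x80000000, 4)
print(hex(edx), hex(eax))     # 0x2 0x0: the high bits of the product landed in EDX
print(div(edx, eax, 3))       # (2, 2863311530): remainder in EDX, quotient in EAX
```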

In your code, the data register is most useful for storing data related to the accumulator's calculation. In my experience, most calculations need only these two registers for storage if they are written properly.

ECX: The Count Register

The count register, ECX, is the x86 equivalent of the ubiquitous variable i. Every counting-related instruction in the x86 instruction set uses ECX. The most obvious counting instructions are LOOP, LOOPZ, and LOOPNZ. Another counter-based instruction is JCXZ (JECXZ in 32-bit code), which, as the name implies, jumps when the counter is 0. The count register also appears in bit-shift and rotate operations, where its low byte, CL, holds the number of positions to shift. Finally, the count register controls the string instructions through the REP, REPE, and REPNE prefixes. In this case, the count register determines the maximum number of times the operation will repeat.
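The control flow of LOOP and JECXZ can be sketched as a Python model. The `run_loop` helper is hypothetical; it mimics a `jecxz done` guard followed by a `loop my_loop` body. For size-coders, LOOP is also compact: it encodes in 2 bytes (E2h plus an 8-bit offset), versus 3 bytes for `dec ecx` / `jnz` in 32-bit code:

```python
# A minimal model of "jecxz done" followed by "my_loop: ...; loop my_loop".
MASK32 = 0xFFFFFFFF

def run_loop(ecx, body):
    """Call body(ecx) once per iteration; return how many times it ran."""
    if ecx == 0:                   # jecxz done: without this guard, LOOP would
        return 0                   # wrap ECX to 0xFFFFFFFF and run 2**32 times
    runs = 0
    while True:
        body(ecx)                  # the body sees ECX = n, n-1, ..., 1
        runs += 1
        ecx = (ecx - 1) & MASK32   # LOOP decrements ECX (flags are untouched)...
        if ecx == 0:               # ...and falls through once ECX reaches 0
            return runs

seen = []
print(run_loop(4, seen.append), seen)   # 4 [4, 3, 2, 1]
print(run_loop(0, seen.append))         # 0
```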

Particularly in demos, most calculations occur in a loop. In these situations, ECX is the logical choice for the loop counter, since no other register has so many branching operations built around it. The only problem is that this register counts downward instead of upward as in high-level languages. Designing a downward-counting loop is not hard, however, so this is only a minor difficulty.
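A downward count still reaches every element if the loop indexes with ECX - 1, which in assembly might look like `mov eax, [table + ecx*4 - 4]` for a hypothetical DWORD array `table`. The Python sketch below lists the indices such a loop visits; when the processing order matters, the routine outlined at the end of this article shows the usual alternative of letting ESI and EDI walk upward while ECX only counts:

```python
def downward_indices(n):
    """Zero-based indices visited by a loop whose counter ECX runs n, n-1, ..., 1."""
    visited = []
    ecx = n
    while ecx:                     # plays the role of the JECXZ guard plus LOOP
        visited.append(ecx - 1)    # ECX - 1 turns the count into an index
        ecx -= 1
    return visited

table = [10, 20, 30, 40]           # hypothetical DWORD array
print(downward_indices(len(table)))                          # [3, 2, 1, 0]
print(sum(table[i] for i in downward_indices(len(table))))   # 100
```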

EDI: The Destination Index

Every loop that generates data must store the result in memory, and doing so requires a moving pointer. The destination index, EDI, is that pointer. The destination index holds the implied write address of all string operations. The most useful string instruction, remarkably enough, is the seldom-used STOS. STOS copies data from the accumulator into memory and then advances the destination index by the size of the data (1, 2, or 4 bytes, assuming the direction flag is clear). This one-byte instruction is perfect, since the final result of any calculation should be in the accumulator anyhow, and storing results in a moving memory address is a common task.
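Here is a minimal Python model of STOSD, assuming a flat address space and a clear direction flag (the `stosd` function and buffer layout are illustrative). For comparison, the one-byte STOSD (opcode ABh) replaces `mov [edi], eax` plus `add edi, 4`, which take 5 bytes in 32-bit code:

```python
import struct

def stosd(memory, edi, eax):
    """STOSD with DF = 0: store EAX little-endian at [EDI], return EDI + 4."""
    struct.pack_into("<I", memory, edi, eax & 0xFFFFFFFF)
    return edi + 4

memory = bytearray(12)             # hypothetical output buffer
edi = 0
for eax in (0x11111111, 0x22222222, 0x33333333):
    edi = stosd(memory, edi, eax)  # each result is written right after the last
print(memory.hex(" "), edi)
```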

Many coders treat the destination index as no more than extra storage space. This is a mistake. All routines must store data, and some register must serve as the storage pointer. Since the destination index is designed for this job, using it for extra storage is a waste. Use the stack or some other register for storage, and use EDI as your global write pointer.

ESI: The Source Index

The source index, ESI, has the same properties as the destination index. The only difference is that the source index is for reading instead of writing. Although all data-processing routines write, not all read, so the source index is not as universally useful. When the time comes to use it, however, the source index is just as powerful as the destination index, and has the same type of instructions.
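On the read side, the savings mirror those of STOS. The Python table below lists hand-assembled 32-bit encodings for the one-byte LODS forms and their longhand equivalents; it is an illustration only, and note that the longhand versions also modify the flags, which LODS does not:

```python
# Hand-assembled IA-32 encodings (32-bit code, DF = 0): one-byte LODS versus
# the equivalent load-and-advance sequence. Unlike INC/ADD, LODS leaves the flags alone.
READ_SIDE = {
    "lodsb":                       "ac",              # AL = [ESI], ESI += 1
    "mov al, [esi] / inc esi":     "8a 06 46",
    "lodsd":                       "ad",              # EAX = [ESI], ESI += 4
    "mov eax, [esi] / add esi, 4": "8b 06 83 c6 04",
}

for text, hex_bytes in READ_SIDE.items():
    print(f"{text:<28} {len(bytes.fromhex(hex_bytes))} bytes")
```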

In situations where your code does not read any sort of data, of course, using the source index for convenient storage space is acceptable.

ESP and EBP: The Stack Pointer and the Base Pointer

Of the eight general-purpose registers, only the stack pointer, ESP, and the base pointer, EBP, are widely used for their original purpose. These two registers are the heart of the x86 function-call mechanism. When a block of code calls a function, it pushes the parameters on the stack, and the CALL instruction pushes the return address. Once inside, the function saves the old base pointer, sets the base pointer equal to the stack pointer, and then places its own internal variables on the stack. From that point on, the function refers to its parameters and variables relative to the base pointer rather than the stack pointer. Why not the stack pointer? For some reason, the stack pointer has lousy addressing modes. In 16-bit mode, it cannot appear in a square-bracket memory operand at all. In 32-bit mode, it can appear in square brackets only by adding an extra SIB byte to the instruction.

In your code, there is never a reason to use the stack pointer for anything other than the stack. The base pointer, however, is up for grabs. If your routines pass parameters by register instead of by stack (they should), there is no reason to copy the stack pointer into the base pointer. The base pointer becomes a free register for whatever you need.
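The cost of the stack pointer's addressing modes, and of the frame bookkeeping that register-passing routines can skip, can be tallied the same way. These are hand-assembled 32-bit encodings in an illustrative Python table:

```python
# Hand-assembled IA-32 encodings (32-bit code). Both loads fetch the first
# stack argument: [esp+4] on entry, [ebp+8] after "push ebp / mov ebp, esp".
FRAME_ACCESS = {
    "mov eax, [ebp+8]": "8b 45 08",     # ModR/M + disp8
    "mov eax, [esp+4]": "8b 44 24 04",  # ModR/M + SIB byte (0x24) + disp8
}
# Frame bookkeeping that register-passing routines can drop entirely.
FRAME_SETUP = {
    "push ebp":     "55",
    "mov ebp, esp": "89 e5",
    "pop ebp":      "5d",
}

def size(hex_bytes):
    """Number of bytes in a space-separated hex string."""
    return len(bytes.fromhex(hex_bytes))

for text, hex_bytes in FRAME_ACCESS.items():
    print(f"{text:<18} {size(hex_bytes)} bytes")
print("frame setup and teardown:", sum(size(h) for h in FRAME_SETUP.values()), "bytes")
```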

EBX: The Base Register

In 16-bit mode, the base register, EBX, acts as a general-purpose pointer. Besides the specialized ESI, EDI, and EBP registers, it is the only general-purpose register that can appear in a square-bracket memory access (For example, MOV [BX], AX). In the 32-bit world, however, any register may serve as a memory offset, so the base register is no longer special.
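The 16-bit restriction comes straight from the ModR/M encoding, which offers only eight register combinations for memory operands. The Python table below transcribes them (the dictionary layout is just for illustration):

```python
# The eight r/m encodings of 16-bit addressing (ModR/M "rm" field).
# With mod = 00, rm = 110 means a bare disp16 instead of [BP].
RM16 = {
    0b000: ("bx", "si"), 0b001: ("bx", "di"),
    0b010: ("bp", "si"), 0b011: ("bp", "di"),
    0b100: ("si",),      0b101: ("di",),
    0b110: ("bp",),      0b111: ("bx",),
}

usable = sorted({reg for regs in RM16.values() for reg in regs})
print("registers allowed inside [] in 16-bit code:", usable)
for reg in ("ax", "cx", "dx", "sp"):
    print(reg, "usable as a pointer:", reg in usable)
```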

The base register gets its name from the XLAT instruction. XLAT looks up a value in a table using AL as the index and EBX as the base. In effect, XLAT performs MOV AL, [EBX+AL] (BX in 16-bit mode), a form that cannot be written directly because AL cannot serve as an index register. This is sometimes useful if you need to replace one 8-bit value with another from a table (think of color look-up).
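As a sketch, XLAT's lookup can be modeled in Python; the `xlat` function and the inverted 256-entry palette table at offset 0x100 are hypothetical examples of a color remap:

```python
def xlat(memory, ebx, al):
    """XLAT in 32-bit code: return the byte at [EBX + AL], AL taken as unsigned 0-255."""
    return memory[(ebx + (al & 0xFF)) & 0xFFFFFFFF]

memory = bytearray(0x200)          # hypothetical flat memory
table_base = 0x100                 # hypothetical address of a 256-entry remap table
for i in range(256):
    memory[table_base + i] = 255 - i   # example remap: invert every color index

print(xlat(memory, table_base, 0))     # 255
print(xlat(memory, table_base, 200))   # 55
```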

So, of all the general-purpose registers, EBX is the only register without an important dedicated purpose. It is a good place to store an extra pointer or calculation step, but not much more.

Conclusion

The eight general-purpose registers in the x86 processor family each have a unique purpose. Each register has special instructions and opcodes which make fulfilling this purpose more convenient or efficient. The registers and their uses are shown briefly below:

  • EAX - All major calculations take place in EAX, making it similar to a dedicated accumulator register.
  • EDX - The data register is an extension to the accumulator. It is most useful for storing data related to the accumulator's current calculation.
  • ECX - Like the variable i in high-level languages, the count register is the universal loop counter.
  • EDI - Every loop must store its result somewhere, and the destination index points to that place. With a single-byte STOS instruction to write data out of the accumulator, this register makes data operations much more size-efficient.
  • ESI - In loops that process data, the source index holds the location of the input data stream. Like the destination index, ESI has a convenient one-byte instruction (LODS) for loading data out of memory into the accumulator.
  • ESP - ESP is the sacred stack pointer. With the important PUSH, POP, CALL, and RET instructions requiring its value, there is never a good reason to use the stack pointer for anything else.
  • EBP - In functions that store parameters or variables on the stack, the base pointer holds the location of the current stack frame. In other situations, however, EBP is a free data-storage register.
  • EBX - In 16-bit mode, the base register was useful as a pointer. Now it is completely free for extra storage space.

As an example of how these registers fit together, here is an outline of a typical routine:

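                mov     esi, source_address      ; ESI -> input data
                mov     edi, destination_address ; EDI -> output data
                mov     ecx, loop_count          ; ECX = number of DWORDs (must be nonzero)
my_loop:        lodsd                            ; EAX = [ESI], then ESI += 4 (DF clear)

                ;Do some calculations with eax here.

                stosd                            ; [EDI] = EAX, then EDI += 4
                loop    my_loop                  ; ECX -= 1, repeat while ECX != 0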

In this example, ECX is the loop counter, ESI points to the input data, and EDI points to the output data. Some calculation, such as a blur, a filter, or perhaps a color look-up, occurs in the loop using EAX as its working variable. This example is a bit simplistic, but hopefully it shows the general idea. A real routine would probably deal with much more complicated data than DWORDs, and would probably involve a fair amount of floating-point math as well.
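To tie the pieces together, here is a Python model of the routine above, assuming a clear direction flag and a nonzero `loop_count`. The `routine` function and its `calculate` callback are hypothetical; the callback stands in for the ";Do some calculations with eax here." step, and memory is a flat `bytearray`:

```python
import struct

def routine(memory, source_address, destination_address, loop_count, calculate):
    """Model of lodsd / calculate / stosd / loop. Returns the final (ESI, EDI)."""
    esi, edi, ecx = source_address, destination_address, loop_count
    while True:
        (eax,) = struct.unpack_from("<I", memory, esi)   # lodsd: EAX = [ESI]
        esi += 4                                         #        ESI += 4
        eax = calculate(eax) & 0xFFFFFFFF                # the calculation step
        struct.pack_into("<I", memory, edi, eax)         # stosd: [EDI] = EAX
        edi += 4                                         #        EDI += 4
        ecx -= 1                                         # loop my_loop
        if ecx == 0:
            return esi, edi

memory = bytearray(32)
struct.pack_into("<4I", memory, 0, 1, 2, 3, 4)           # input DWORDs at offset 0
routine(memory, 0, 16, 4, lambda eax: eax * 10)          # output DWORDs at offset 16
print(struct.unpack_from("<4I", memory, 16))             # (10, 20, 30, 40)
```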

In conclusion, using the registers as Intel intended has several advantages. First, it allows your code to take advantage of many optimizations and special instructions. It also makes the code more readable, since registers perform predictable functions. Finally, using the registers consistently leads to better compression by promoting more repetitive instruction sequences.

Spanish Translation

Ukrainian Translation

Polish Translation

Chinese Translation (translated April 2018 by 溪边九节, liyue李月, Tocy, 周其, and kevinlinkai)

posted @ 2019-11-14 09:27  papering