x86-64 processor (aka amd64, x64); x86 (IA-32, i386, Intel 386); AArch64 processor (aka arm64); AMD64

Summary:

1.

The new 386 design, available at 12.5 MHz and 16 MHz clock speeds, allowed the chip to process information more than twice as fast as the 286, and spawned a new revolution in software design.

2.

【80386 x86 】
Windows NT stopped supporting the Intel 80386 processor with Windows NT 4.0, which raised the minimum requirements to an Intel 80486. Therefore, the Intel 80386 technically falls into the category of “processor that Windows once supported but no longer does.” This series focuses on the portion of the x86 instruction set available on an 80386, although I will make notes about future extensions in a special chapter.

The Intel 80386 has eight integer registers, each 32 bits wide.

Register	Meaning	Preserved?
`eax`	accumulator	No
`ebx`	base register	Yes
`ecx`	count register	No
`edx`	data register	No
`esi`	source index	Yes
`edi`	destination index	Yes
`ebp`	base pointer	Yes
`esp`	stack pointer	Sort of

【 x86-64 processor amd64, x64】
I figure I’d tidy up the processor overview series by covering the last¹ processor on my list of “processors Windows has supported in its history,” namely, the x86-64. Other names for this architecture are amd64 (because AMD invented it) and x64 (which is super-confusing because it doesn’t correspond with x86, a common nickname for the x86-32).

This is going to be a quick overview because the x86-64 is a natural extension of the i386, which we covered some time ago. I’ll just highlight the differences.

Each existing 32-bit general-purpose register has been extended from 32 bits to 64. The name of the 64-bit register is based on the name of the 32-bit register, but with the leading e changed to a leading r. Eight new 64-bit registers were introduced, bringing the total to 16. Instead of giving quirky names to the new registers, they are just numbered: r8 through r15. To match the existing classic registers, the new registers also have aliases for referring to partial registers, and partial register aliases were invented for some of the classic registers that lacked them.

Register	Bits 31:0	Bits 15:0	Bits 15:8	Bits 7:0	Preserved?	Notes
`rax`	`eax`	`ax`	`ah`	`al`	No	Return value
`rbx`	`ebx`	`bx`	`bh`	`bl`	Yes
`rcx`	`ecx`	`cx`	`ch`	`cl`	No	Parameter 1
`rdx`	`edx`	`dx`	`dh`	`dl`	No	Parameter 2
`rsi`	`esi`	`si`		`sil`	Yes
`rdi`	`edi`	`di`		`dil`	Yes
`rsp`	`esp`	`sp`		`spl`	Yes	Stack pointer
`rbp`	`ebp`	`bp`		`bpl`	Yes	Frame pointer
`r8`	`r8d`	`r8w`		`r8b`	No	Parameter 3
`r9`	`r9d`	`r9w`		`r9b`	No	Parameter 4
`r10`	`r10d`	`r10w`		`r10b`	No
`r11`	`r11d`	`r11w`		`r11b`	No
`r12`	`r12d`	`r12w`		`r12b`	Yes
`r13`	`r13d`	`r13w`		`r13b`	Yes
`r14`	`r14d`	`r14w`		`r14b`	Yes
`r15`	`r15d`	`r15w`		`r15b`	Yes

The eip and eflags registers are correspondingly expanded to 64-bit registers rip and rflags.

【 AArch64 processor (aka arm64)】
The 64-bit version of the ARM architecture is formally known as AArch64. It is the 64-bit version of classic 32-bit ARM, which has been retroactively renamed AArch32.

Even though the architecture formally goes by the name AArch64, many people (including Windows) call it arm64. Even more confusing, the instruction set is called A64. (The 32-bit ARM instruction sets have also been retroactively renamed: Classic ARM is now called A32, and Thumb-2 is now called T32.)

AArch64 differs from AArch32 so much that I’m going to cover it fresh rather than treating it as an extension of AArch32. That said, I will nevertheless call out notable points of difference from AArch32.

【AMD64 backward-compatible extension of the industry-standard (legacy) x86 architecture】

AMD64 Architecture Programmer’s Manual, Volume 1: Application Programming https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/programmer-references/24592.pdf

The AMD64 architecture is a simple yet powerful 64-bit, backward-compatible extension of the industry-standard (legacy) x86 architecture. It adds 64-bit addressing and expands register resources to support higher performance for recompiled 64-bit programs, while supporting legacy 16-bit and 32-bit applications and operating systems without modification or recompilation. It is the architectural basis on which new processors can provide seamless, high-performance support for both the vast body of existing software and 64-bit software required for higher-performance applications.

The need for a 64-bit x86 architecture is driven by applications that address large amounts of virtual and physical memory, such as high-performance servers, database management systems, and CAD tools. These applications benefit from both 64-bit addresses and an increased number of registers. The small number of registers available in the legacy x86 architecture limits performance in computation-intensive applications. Increasing the number of registers provides a performance boost to many such applications.

The AArch64 processor (aka arm64), part 1: Introduction - The Old New Thing https://devblogs.microsoft.com/oldnewthing/20220726-00/?p=106898

Raymond Chen

July 26th, 2022

The 64-bit version of the ARM architecture is formally known as AArch64. It is the 64-bit version of classic 32-bit ARM, which has been retroactively renamed AArch32.

No more Thumb mode

AArch64 is an extension of the classic ARM instruction set, not an extension of Thumb-2. So we’re back to fixed-size 32-bit instructions (aligned on 4-byte boundaries). No more gymnastics with low registers and high registers, or using non-intuitive instructions to avoid a 32-bit encoding, or remembering to set the bottom bit on code addresses to avoid accidentally switching into classic mode.

A note for those familiar with the classic ARM instruction set: One thing that did not get carried forward was arbitrary predication. The answers to this StackOverflow question dig into the reasons why predication was removed. Short version: Predication is rarely used, it consumes a lot of opcode space, it doesn’t interact well with out-of-order execution, and branch prediction is almost as good.

Data sizes

The architectural terms for data sizes are the same as AArch32.

Term	Size
byte	8 bits
halfword	16 bits
word	32 bits
doubleword	64 bits

The processor supports both big-endian and little-endian operation. Windows uses it exclusively in little-endian mode. AArch64 lost the AArch32 SETEND instruction for switching endianness from user mode. Not that Windows supported it anyway.

Registers

Everything has doubled. The general-purpose registers are now 64 bits wide instead of 32. And the number of such registers has doubled from 16 to 32. Okay, just 31. The encoding that would correspond to register 31 has been reused for other purposes. So not quite doubled.

Register	Preserved?	Notes
`x0`	No	Parameter 1, return value
`x1`	No	Parameter 2
`x2`	No	Parameter 3
`x3`	No	Parameter 4
`x4`	No	Parameter 5
`x5`	No	Parameter 6
`x6`	No	Parameter 7
`x7`	No	Parameter 8
`x8`	No
`x9`	No
`x10`	No
`x11`	No
`x12`	No
`x13`	No
`x14`	No
`x15`	No
`x16` (`xip0`)	Volatile	Intra-procedure call scratch register
`x17` (`xip1`)	Volatile	Intra-procedure call scratch register
`x18` (`xpr`)	read-only	TEB
`x19`	Yes
`x20`	Yes
`x21`	Yes
`x22`	Yes
`x23`	Yes
`x24`	Yes
`x25`	Yes
`x26`	Yes
`x27`	Yes
`x28`	Yes
`x29` (`fp`)	Yes	frame pointer
`x30` (`lr`)	No	link register
register “31” usually represents `sp` or `zr`, depending on instruction

The link register is architectural; the rest are convention.

You can refer to the least significant 32 bits of each 64-bit register by changing the leading x to a w, so we have w0 through w30. If an instruction targets a w register, the result is zero-extended to fill the x register.¹

Particularly notable is that the stack pointer sp and program counter pc are no longer general-purpose registers, like they were in AArch32. The registers still exist, but they are treated as special registers rather than being encoded in the same way as the other general-purpose registers.

In AArch64, the pc special register reads as the address of the instruction being executed, rather than being four bytes ahead, as it was in AArch32. The extra +4 in AArch32 was an artifact of the internal pipelining of the original ARM and became a backward compatibility constraint even as the pipeline depth changed.

Windows requires that the stack remain 16-byte aligned, and it enables hardware enforcement of this requirement. The 32-bit subregister of sp is called wsp, although it is of no practical use. (The 64-bit register is still called sp, not xsp. Go figure.)

There is a 16-byte red zone below the stack pointer, but it’s reserved for code analysis. Intrusive profilers inject assembly language fragments into compiled code to update profiling information, and they need some space to store two registers so they can free up some registers to do their profiling work.

The xip0 and xip1 registers are volatile because they are used to assist with branch instructions that try to branch to an address that is out of range. We’ll see later that these registers are also used by function prologues and epilogues.

There is a new xzr pseudo-register (and its 32-bit alias wzr) which reads as zero, and writes are ignored. As I noted in the above table, if an instruction encodes a register number of 31, then a special behavior kicks in, typically by treating mythical register 31 as an alias for sp or zr. Generally speaking, when being used as a base address register, imaginary register 31 represents sp, but when used for arithmetic or as a destination register, it represents zr.²
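
The rule of thumb above can be sketched as a small lookup. This is only a simplified model: the role labels ("base", "arithmetic", "destination") and the function name are hypothetical, and individual instructions can deviate from the general pattern.

```python
# Simplified model of how an encoded AArch64 register number is named.
# The role labels are hypothetical names for this sketch, not architectural terms.

def register_name(number: int, role: str, width: int = 64) -> str:
    """Return the assembly name for a 5-bit encoded register number."""
    if not 0 <= number <= 31:
        raise ValueError("register numbers are 5 bits: 0 through 31")
    if width not in (32, 64):
        raise ValueError("width must be 32 or 64")
    prefix = "x" if width == 64 else "w"
    if number < 31:
        return f"{prefix}{number}"
    # Register 31: stack pointer when used as a base address,
    # zero register for arithmetic operands and destinations.
    if role == "base":
        return "sp" if width == 64 else "wsp"
    return "xzr" if width == 64 else "wzr"
```

For example, `register_name(31, "base")` yields `sp`, while `register_name(31, "destination", 32)` yields `wzr`.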

In instruction descriptions, I will use these shorthands:

Shorthand	Meaning
`Xn`	Any `x#` register
`Xn/zr`	Any `x#` register or `xzr`
`Xn/sp`	Any `x#` register or `sp`
`Wn`	Any `w#` register
`Wn/zr`	Any `w#` register or `wzr`
`Wn/sp`	Any `w#` register or `wsp`
`Rn`	Any `x#` or `w#` register
`Rn/zr`	Any `x#` register, `w#` register, `xzr` or `wzr`

The floating point registers have been reorganized. They have doubled in size (to 128 bits) as well as in number, and the single-precision registers are no longer paired up.

Register	Preserved?	Notes
`v0`	No	Parameter 1, return value
`v1`	No	Parameter 2
`v2`	No	Parameter 3
`v3`	No	Parameter 4
`v4`	No	Parameter 5
`v5`	No	Parameter 6
`v6`	No	Parameter 7
`v7`	No	Parameter 8
`v8` through `v15`	Low 64 bits only	Upper 64 bits are not preserved
`v16` through `v31`	No

Each floating point register can be viewed in multiple ways. The partial registers are stored in the least significant bits of the full register.

Name	Meaning	Notes
`v#`	SIMD vector
`q#`	128-bit value	quad precision
`d#`	64-bit value	double precision
`s#`	32-bit value	single precision
`h#`	16-bit value	half precision
`b#`	8-bit value
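
As a rough illustration of "stored in the least significant bits", the views can be modeled as masks over a 128-bit integer. The function name and the sample register contents are made up for this sketch.

```python
# Each narrower view reads the low bits of the same 128-bit v register.
VIEW_BITS = {"q": 128, "d": 64, "s": 32, "h": 16, "b": 8}

def read_view(v_value: int, view: str) -> int:
    """Return the raw bits seen through a q/d/s/h/b view of a v register."""
    bits = VIEW_BITS[view]
    return v_value & ((1 << bits) - 1)

# Hypothetical register contents: byte i of the register holds the value i.
v0 = 0x0F0E0D0C0B0A09080706050403020100
```

With that sample value, the `d` view is `0x0706050403020100` and the `h` view is `0x0100`.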

The flags register is formally known as the Application Program Status Register (APSR). The flags available to user mode are the same as in AArch32:

Mnemonic	Meaning	Notes
N	Negative	Set if the result is negative
Z	Zero	Set if the result is zero
C	Carry	Multiple purposes
V	Overflow	Signed overflow
Q	Saturation	Accumulated overflow
GE[n]	Greater than or equal to	4 flags (SIMD)

The overflow flag records whether the most recent operation resulted in signed overflow. The saturation flag is used by multimedia instructions to accumulate whether any overflow occurred since it was last cleared. The GE flags record the result of SIMD operations. By convention, flags are not preserved across calls.
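
To make the N, Z, C, and V flags concrete, here is a minimal sketch of how they could be derived for a plain addition. It assumes the usual two's complement definitions (carry is the unsigned carry-out, overflow is signed overflow) and does not model Q or GE; the function name is illustrative.

```python
def add_with_flags(a: int, b: int, bits: int = 32):
    """Add two values of the given width and derive the N, Z, C, V flags."""
    mask = (1 << bits) - 1
    sign = 1 << (bits - 1)
    a &= mask
    b &= mask
    full = a + b
    result = full & mask
    flags = {
        "N": bool(result & sign),   # result is negative when viewed as signed
        "Z": result == 0,           # result is zero
        "C": full > mask,           # unsigned carry out of the top bit
        # Signed overflow: operands have the same sign, result has the other.
        "V": bool(~(a ^ b) & (a ^ result) & sign),
    }
    return result, flags
```

Adding 1 to `0x7FFFFFFF` sets N and V (signed overflow into the sign bit), while adding 1 to `0xFFFFFFFF` sets Z and C (unsigned wraparound to zero).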

There are a number of AArch64 features that you are extremely unlikely to see in Windows code, such as tagged pointers, tagged memory, and pointer authentication, so I won’t cover them here. I also won’t cover floating point instructions or SIMD instructions.

Next time, we’ll look at some of the weird transformations that can be performed inside an instruction.

Additional references:

Code in ARM Assembly: Registers explained. An analogous series looking at AArch64 from the Apple point of view rather than Windows.
Writing ARM64 Code for Apple Platforms: The Apple ABI specification for AArch64.

¹ The Windows debugger isn’t quite sure which name to use for these registers. The disassembler calls the registers xip0, xip1, and xpr, but the expression evaluator doesn’t understand those names; you have to call them @x16, @x17, and @x18. On the other hand, the expression evaluator does understand @fp and @lr and refuses to acknowledge the existence of the names @x29 and @x30. Furthermore, the expression evaluator doesn’t understand any of the w aliases.

² AArch64’s register 31 is similar to PowerPC’s register 0, which changes meaning depending on the instruction. In PowerPC assembly, it was on you to keep track of which encodings treat register 0 as a value register, and which treat it as a zero register. At least AArch64 expresses the two cases differently: If an encoding uses pseudo-register 31 to mean sp, then you really must write sp. If you write xzr, you get an error.

PowerPC on the other hand would happily let you specify r0 even if the instruction treats it as zero. Which was one of the jokes from the short-lived parody twitter account that mocked PowerPC.

The x86-64 processor (aka amd64, x64): Whirlwind tour - The Old New Thing https://devblogs.microsoft.com/oldnewthing/20220831-00/?p=107077

Raymond Chen

August 31st, 2022

I figure I’d tidy up the processor overview series by covering the last¹ processor on my list of “processors Windows has supported in its history,” namely, the x86-64. Other names for this architecture are amd64 (because AMD invented it) and x64 (which is super-confusing because it doesn’t correspond with x86, a common nickname for the x86-32).

This is going to be a quick overview because the x86-64 is a natural extension of the i386, which we covered some time ago. I’ll just highlight the differences.

Each existing 32-bit general-purpose register has been extended from 32 bits to 64. The name of the 64-bit register is based on the name of the 32-bit register, but with the leading e changed to a leading r. Eight new 64-bit registers were introduced, bringing the total to 16. Instead of giving quirky names to the new registers, they are just numbered: r8 through r15. To match the existing classic registers, the new registers also have aliases for referring to partial registers, and partial register aliases were invented for some of the classic registers that lacked them.

Register	Bits 31:0	Bits 15:0	Bits 15:8	Bits 7:0	Preserved?	Notes
`rax`	`eax`	`ax`	`ah`	`al`	No	Return value
`rbx`	`ebx`	`bx`	`bh`	`bl`	Yes
`rcx`	`ecx`	`cx`	`ch`	`cl`	No	Parameter 1
`rdx`	`edx`	`dx`	`dh`	`dl`	No	Parameter 2 and upper 64 bits of return value
`rsi`	`esi`	`si`		`sil`	Yes
`rdi`	`edi`	`di`		`dil`	Yes
`rsp`	`esp`	`sp`		`spl`	Yes	Stack pointer
`rbp`	`ebp`	`bp`		`bpl`	Yes	Frame pointer
`r8`	`r8d`	`r8w`		`r8b`	No	Parameter 3
`r9`	`r9d`	`r9w`		`r9b`	No	Parameter 4
`r10`	`r10d`	`r10w`		`r10b`	No
`r11`	`r11d`	`r11w`		`r11b`	No
`r12`	`r12d`	`r12w`		`r12b`	Yes
`r13`	`r13d`	`r13w`		`r13b`	Yes
`r14`	`r14d`	`r14w`		`r14b`	Yes
`r15`	`r15d`	`r15w`		`r15b`	Yes

The eip and eflags registers are correspondingly expanded to 64-bit registers rip and rflags.

Additional restrictions have been imposed on the use of the ah, bh, ch, and dh registers. The details aren’t important for reading code, so I won’t bother digging into them.

Windows requires that the stack be 16-byte aligned at function call boundaries, and there is no red zone. Calling a function pushes the 8-byte return address onto the stack, so on entry to a function, the stack is misaligned. Functions typically realign the stack in their prologue.
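
The arithmetic behind "misaligned on entry" can be sketched as follows. This assumes the caller's stack pointer is 16-byte aligned at the call instruction; the helper names and the frame-size formula are illustrative, not taken from any compiler.

```python
def rsp_at_entry(rsp_before_call: int) -> int:
    """The call instruction pushes the 8-byte return address."""
    return rsp_before_call - 8

def frame_size_to_realign(needed_bytes: int) -> int:
    """Smallest allocation of at least needed_bytes that, subtracted from
    the entry-time stack pointer, restores 16-byte alignment."""
    return ((needed_bytes + 8 + 15) & ~15) - 8

caller_rsp = 0x1000                      # hypothetical aligned value
entry_rsp = rsp_at_entry(caller_rsp)     # ends in 8, so not a multiple of 16
frame = frame_size_to_realign(32)        # room for 32 bytes of outgoing home space
```

With these sample values the frame size comes out as 40 bytes (0x28), and subtracting it from the entry-time stack pointer gives a multiple of 16 again.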

The old 8087-based floating point registers are not used.² Instead, the SIMD XMM registers are used for floating point calculations. These registers are 128 bits wide and can be viewed as four single-precision floating point values or as two double-precision floating point values. When used to pass parameters or return floating point values, only the bottom lane is used.
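
The two views of a 128-bit register can be illustrated with Python's standard `struct` module, treating the register as 16 little-endian bytes. The helper names are made up for this sketch.

```python
import struct

def as_singles(xmm_bytes: bytes):
    """View 16 bytes as four single-precision lanes (lane 0 first)."""
    return struct.unpack("<4f", xmm_bytes)

def as_doubles(xmm_bytes: bytes):
    """View the same 16 bytes as two double-precision lanes (lane 0 first)."""
    return struct.unpack("<2d", xmm_bytes)

# Hypothetical register contents: two doubles, 1.0 in the bottom lane.
reg = struct.pack("<2d", 1.0, 2.0)
```

`as_doubles(reg)[0]` is the bottom lane, which is the one that carries a scalar parameter or return value. Reading the same bytes with `as_singles` shows that the views share storage rather than converting values.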

Eight more XMM registers have been added, bringing the total to 16.

Register	Preserved?	Notes
`XMM0`	No	Parameter 1 and return value
`XMM1`	No	Parameter 2 and second return value
`XMM2`	No	Parameter 3
`XMM3`	No	Parameter 4
`XMM4`	No
`XMM5`	No
`XMM6`	Yes
`XMM7`	Yes
`XMM8`	Yes
`XMM9`	Yes
`XMM10`	Yes
`XMM11`	Yes
`XMM12`	Yes
`XMM13`	Yes
`XMM14`	Yes
`XMM15`	Yes

Calling convention

The calling convention is register-based for the first four parameters, with remaining parameters on the stack. In practice, the stack-based parameters are not push’d, but rather the values are mov’d into the preallocated stack space.

For register-based parameters, integer parameters go into the general-purpose registers and floating point parameters go into the floating point registers. When a register is used to hold a parameter, its counterpart register goes unused. For example, a function that takes an integer and a double will pass the integer in rcx and the double in xmm1.
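
A minimal sketch of this position-based assignment, using the parameter registers from the tables above. The function name and the "int"/"float" labels are hypothetical, and the model ignores structures and variadic functions.

```python
INT_REGS = ["rcx", "rdx", "r8", "r9"]
FLOAT_REGS = ["xmm0", "xmm1", "xmm2", "xmm3"]

def assign_parameters(kinds):
    """Map each parameter kind ("int" or "float") to a register or the stack.

    The register is chosen by parameter position, so the counterpart
    register in the other bank simply goes unused."""
    locations = []
    for position, kind in enumerate(kinds):
        if position >= 4:
            locations.append("stack")
        elif kind == "float":
            locations.append(FLOAT_REGS[position])
        else:
            locations.append(INT_REGS[position])
    return locations
```

`assign_parameters(["int", "float"])` returns `["rcx", "xmm1"]`, matching the integer-and-double example in the text.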

There are always 4 × 8 = 32 bytes of home space for the register-based parameters, even if the function has fewer than four formal parameters. (If this bothers you, then you can reinterpret the home space as a 32-byte red zone that resides above the return address.)
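
One way to picture the home space is as a table of offsets from rsp at the moment the function is entered, before its prologue runs. The offsets below reflect my understanding of the Windows x64 convention (return address at the top of the stack, home space directly above it, stack-based parameters above that); the function name is illustrative.

```python
HOME_REGS = ["rcx", "rdx", "r8", "r9"]

def entry_stack_layout(stack_param_count: int = 0):
    """Offsets from rsp at function entry, mapped to what lives there."""
    layout = {0x00: "return address"}
    for i, reg in enumerate(HOME_REGS):
        layout[0x08 + 8 * i] = f"home space for {reg}"
    # Stack-based parameters start right after the 32 bytes of home space.
    for i in range(stack_param_count):
        layout[0x28 + 8 * i] = f"parameter {5 + i}"
    return layout
```

Under these assumptions, `entry_stack_layout(1)` places the fifth parameter at offset 0x28.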

Integer return values up to 64 bits go into rax. If the return value is a 128-bit value, then the rdx register holds the upper 64 bits. Floating point return values are returned in xmm0.

The caller is responsible for cleaning the stack. In practice, the caller does not clean the stack after every call, but rather preallocates the stack space in the prologue, reuses the stack space for multiple calls, and then cleans it all up in the epilogue.

Exception handling is done by unwind tables, not by threading exception handlers through the stack at runtime.

Partial registers

When a 32-bit partial register is the destination of an operation, the upper 32 bits are set to zero. For example, consider

    add     eax, ecx

On the 32-bit 80386, this adds the value of ecx to eax and puts the result back into eax. On x86-64, this performs the same calculation, but since the destination is the 32-bit partial register eax, the operation also zeroes out the upper 32 bits of rax.

Another way of looking at this is that writes to 32-bit partial registers are zero-extended to 64-bit values.³

Note, however, that operations on 16-bit and 8-bit partial registers leave the unused bits unchanged.
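
The different behaviors by operand width can be modeled with plain integer masking. This is a sketch of the rules described above, with a made-up function name; `high_byte=True` stands for an ah-style register.

```python
MASK64 = (1 << 64) - 1

def write_partial(reg64: int, value: int, width: int, high_byte: bool = False) -> int:
    """Return the new 64-bit register value after writing a partial register."""
    if width == 64:
        return value & MASK64
    if width == 32:
        # 32-bit destination: the upper 32 bits become zero.
        return value & 0xFFFFFFFF
    if width == 16:
        # 16-bit destination: bits 63:16 are left unchanged.
        return (reg64 & ~0xFFFF & MASK64) | (value & 0xFFFF)
    if width == 8:
        shift = 8 if high_byte else 0
        # 8-bit destination: only the addressed byte changes.
        return (reg64 & ~(0xFF << shift) & MASK64) | ((value & 0xFF) << shift)
    raise ValueError("width must be 8, 16, 32, or 64")
```

Starting from `0x1122334455667788`, a 32-bit write of `0xAABBCCDD` yields `0x00000000AABBCCDD`, whereas a 16-bit write of `0xAABB` yields `0x112233445566AABB`.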

Addressing modes

The 32-bit addressing modes carry over to 64-bit, with these exceptions:

Absolute addressing mode has been removed.
There is a new rip-relative addressing mode.

The offsets in the memory addressing modes are 32-bit signed values, for a reach of ±2GB.

The rip-relative addressing mode greatly reduces the number of fixups required to relocate a module. The enormous ±2GB reach means that any reasonably-sized module can use it to access all of its static data, be it a read-only table embedded in the code segment or read-write data in the data segment.

The disassembler automatically performs the necessary calculations to convert the rip-relative address to an absolute one at disassembly time, so you are unlikely even to realize that anything has changed.
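
The calculation the disassembler performs can be sketched as below. The displacement is applied to the address of the next instruction, which is why the instruction length is needed; the function name and the sample addresses are hypothetical.

```python
def rip_relative_target(instruction_address: int, instruction_length: int, disp32: int) -> int:
    """Absolute address referenced by a rip-relative memory operand."""
    if not -(1 << 31) <= disp32 < (1 << 31):
        raise ValueError("displacement must fit in a signed 32-bit value")
    # rip already points past the current instruction when the operand is resolved.
    next_instruction = instruction_address + instruction_length
    return (next_instruction + disp32) & ((1 << 64) - 1)
```

For a 7-byte instruction at `0x140001000` with displacement `0x2000`, the target works out to `0x140003007`.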

Immediates

In general, immediates are capped at 32 bits. The exception is that you can use a 64-bit immediate in the mov reg, imm64 instruction.
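
A quick way to see which constants need the 64-bit mov form: in 64-bit operations, a 32-bit immediate is generally sign-extended, so only values in the signed 32-bit range can be encoded that way. The helper below is a sketch with a made-up name, taking the constant as a signed Python integer.

```python
def fits_in_imm32(value: int) -> bool:
    """True if a 64-bit operand can be expressed as a sign-extended
    32-bit immediate; otherwise mov reg, imm64 is needed."""
    return -(1 << 31) <= value < (1 << 31)
```

Note that `0x80000000` does not fit: sign-extending its 32-bit pattern would produce `0xFFFFFFFF80000000` rather than the intended value.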

Segments

Segmentation is architecturally dead. The processor is always in flat mode. The fs and gs selectors have been repurposed as two additional registers that add an operating-system-defined value to the effective address.

    mov   rax, qword ptr gs:[rcx*8+1480h]

The base address assigned to the gs register is added to the effective address rcx * 8 + 0x1480, producing a final address that is the target of the memory operation.
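
The address computation for that instruction can be written out as a small function. The gs base value used in the example is invented; the real value is chosen by the operating system.

```python
MASK64 = (1 << 64) - 1

def effective_address(segment_base: int, index: int, scale: int, displacement: int) -> int:
    """segment base + index * scale + displacement, wrapped to 64 bits."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scale must be 1, 2, 4, or 8")
    return (segment_base + index * scale + displacement) & MASK64

# Hypothetical values: gs base 0x7FF000000000, rcx = 3.
address = effective_address(0x7FF000000000, 3, 8, 0x1480)
```

With rcx = 3, the offset part is 3 * 8 + 0x1480 = 0x1498, so the final address is `0x7FF000001498`.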

Windows sets the gs register’s base address to a block of per-thread data. During context switches, the base address of the gs register is updated to point to the per-thread data of the incoming thread. The fs register has not yet been assigned a meaning and should not be used.⁴ The Windows ABI forbids modifying either of these segment registers.

Instruction set changes

Some rarely-used instructions have been removed, primarily the binary-coded decimal instructions, BOUND, and PUSHAD/POPAD instructions.

New instructions for dealing with 64-bit registers:

    ; sign-extend 32-bit to 64-bit
    movsxd  r64, r32/m32

There is no need for a zero-extend instruction because operations on 32-bit registers automatically zero-extend to 64-bit values, so if the value was the result of a calculation, you probably got the zero-extended value anyway. If you want to wipe out the top 32 bits of an existing 64-bit value, you could do

    ; zero-extend 32-bit to 64-bit
    mov     r32, r32

This can result in some odd-looking instructions like

    mov     eax, eax        ; zero-extend eax to rax

On its face, the instruction looks pointless, but we’re performing it for the zero-extending side effect.⁵
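
The difference between the two extensions is easy to see on raw bit patterns. The functions below are a sketch named after the instructions they imitate; they operate on plain integers, not on real registers.

```python
def movsxd(value32: int) -> int:
    """Sign-extend a 32-bit value to 64 bits, like movsxd r64, r32."""
    value32 &= 0xFFFFFFFF
    if value32 & 0x80000000:
        return value32 | 0xFFFFFFFF00000000   # copy the sign bit upward
    return value32

def mov_r32_r32(value64: int) -> int:
    """Model mov eax, eax: keep the low 32 bits and zero the upper 32."""
    return value64 & 0xFFFFFFFF
```

The 32-bit pattern `0xFFFFFFFE` (that is, -2) sign-extends to `0xFFFFFFFFFFFFFFFE` but zero-extends to `0x00000000FFFFFFFE`.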

There are also specialized instructions for certain sign-extending scenarios:

    cdqe                    ; sign-extend eax to rax
    cqo                     ; sign-extend rax to rdx:rax

Lightweight leaf functions and exception handling

A lightweight leaf function is one which can perform all of its work using only non-preserved registers, the inbound parameter home space, and stack space occupied by stack-based inbound parameters (if any). Preserved registers and the stack pointer must remain unchanged for the entire lifetime of the function, and the return address must remain at the top of the stack.

The inability to move the stack pointer means that the stack pointer is not at a multiple of 16 for the lifetime of a lightweight leaf function.

The x86-64 ABI abandons the stack-based exception handling model of its 32-bit older brother and joins the RISC crowd by using table-based exception handling. With the exception of lightweight leaf functions, all functions must declare unwind codes that allow the exception unwinder to restore registers from the stack and find the return address. Any function that does not have unwind codes is assumed to be a lightweight leaf function.

Annotated disassembly

I’ll defer to the existing documentation (which I wrote).

Encoding notes

Instructions that operate on the classic 32-bit or 8-bit registers tend to have the most compact encodings. Using any of the new registers (r8 through r15, or xmm8 through xmm15, or the new aliases sil, dil, spl or bpl) typically requires a one-byte prefix. An instruction that operates on word-sized data typically incurs an additional byte encoding. And fancy addressing modes (involving scaling or multiple registers contributing to the effective address) also require yet another byte for the encoding.
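
To see where the extra bytes come from, here is a minimal encoder for the register-to-register form of `add` only. It uses the standard `ADD r/m, r` opcode (01) with a ModRM byte, the 66 operand-size prefix for 16-bit operands, and a REX prefix when a 64-bit operand size or one of r8 through r15 is involved. The function name is hypothetical, and memory operands (where the scaled-index byte would appear) are not modeled.

```python
def encode_add(dst: int, src: int, width: int) -> bytes:
    """Encode `add dst, src` for two registers given by number (0-15).

    Register numbers follow the hardware order: 0=ax, 1=cx, 2=dx, 3=bx,
    4=sp, 5=bp, 6=si, 7=di, 8-15 = r8-r15. Width is 16, 32, or 64."""
    if width not in (16, 32, 64):
        raise ValueError("width must be 16, 32, or 64")
    if not (0 <= dst <= 15 and 0 <= src <= 15):
        raise ValueError("register numbers must be 0 through 15")
    prefix = b""
    if width == 16:
        prefix += b"\x66"             # operand-size prefix for 16-bit data
    rex = 0x40
    if width == 64:
        rex |= 0x08                   # REX.W: 64-bit operand size
    if src >= 8:
        rex |= 0x04                   # REX.R: extends the ModRM reg field
    if dst >= 8:
        rex |= 0x01                   # REX.B: extends the ModRM r/m field
    if rex != 0x40:
        prefix += bytes([rex])        # REX goes immediately before the opcode
    modrm = 0xC0 | ((src & 7) << 3) | (dst & 7)   # register-direct mode
    return prefix + bytes([0x01, modrm])
```

`add eax, ecx` encodes as `01 C8` (two bytes), while `add r8d, ecx`, `add ax, cx`, and `add rax, rcx` each need one more byte: `41 01 C8`, `66 01 C8`, and `48 01 C8` respectively.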

I’m not sure how aggressively the compiler allocates registers and chooses instructions which have compact encodings. It certainly didn’t stand out to me.

Bonus reading: x64 software conventions.

Bonus chatter: Now that I’ve exhausted my list of processors that Windows has supported over the years, I’ll have to start branching out into other processors. I’m open to suggestions. Though I probably won’t be as detailed as these processor overviews have been, since the original goal of these overviews was to give you enough information to get started debugging on Windows. For other processors, I’ll probably just focus on the one or two things that make them interesting, like SPARC register windows, or 68000’s separate data and address registers.

¹ Early versions of Windows CE allegedly supported the StrongARM and possibly even M32R and other architectures, but I can’t find any binaries for those versions, so I have nothing to investigate.

² They are still physically present and usable, but in practice, nobody uses them, and they are not part of the calling convention. The legacy floating point registers are overlaid on top of the SIMD registers, and switching between legacy mode and SIMD mode requires the use of the very slow EMMS instruction.

³ I strongly suspect this design decision was made to avoid introducing spurious register dependencies due to partial register operations.

⁴ On x86-32, the fs register is used to access the per-thread data. Why did Windows switch to using gs on x86-64? One theory is that there is a special instruction on x86-64 called SWAPGS that lets the kernel exchange the gs base address with another internal register. This instruction is used on transitions to and from user mode, so the kernel can quickly switch from user-mode thread data to kernel-mode thread data on entry and to switch it back on exit. No such courtesy instruction exists for the fs register. Another theory is that fs is reserved for the 32-bit emulation layer.

⁵ It also means that the x86-32 pun of interpreting nop as xchg eax, eax does not work in x86-64. The self-exchange zeroes out the high 32 bits as a side effect. The Windows debugger doesn’t realize this, and if you ask it to assemble xchg eax, eax, it encodes it as 90, using the one-byte encoding of xchg eax, r32, unaware that this doesn’t work if the other register is also eax. The correct encoding of xchg eax, eax is 87 c0, using the larger two-byte encoding.

80486 - Intel - WikiChip https://en.wikichip.org/wiki/intel/80486

The 80486, also i486 and 486, (pronounced eighty-four-eighty-six) was a family of 32-bit 4th-generation x86 microprocessors introduced by Intel in 1989 as a successor to the 80386. 486 introduced a number of enhancements to 386 including a new level 1 cache, better IPC performance, and an integrated FPU. The 486 became the first x86 chip family to exceed one million transistors.

The Intel 80386, part 1: Introduction - The Old New Thing https://devblogs.microsoft.com/oldnewthing/20190120-00/?p=100745

Raymond Chen

January 20th, 2019

Windows NT stopped supporting the Intel 80386 processor with Windows NT 4.0, which raised the minimum requirements to an Intel 80486. Therefore, the Intel 80386 technically falls into the category of “processor that Windows once supported but no longer does.” This series focuses on the portion of the x86 instruction set available on an 80386, although I will make notes about future extensions in a special chapter.

The Intel 80386 is the next step in the evolution of the processor series that started with the Intel 8086 (which was itself inspired by the Intel 8080, which was in turn inspired by the Intel 8008). Even at this early stage, it had a long history, which helps to explain many of its strange corners.

As with all the processor retrospective series, I’m going to focus on how Windows NT used the Intel 80386 in user mode because the original audience for all of these discussions was user-mode developers trying to get up to speed debugging their programs. Normally, this means that I omit instructions that you are unlikely to see in compiler-generated code. However, I’ll set aside a day to cover some of the legacy instructions that are functional but not used in practice.

The Intel 80386 has eight integer registers, each 32 bits wide.

Register	Meaning	Preserved?
`eax`	accumulator	No
`ebx`	base register	Yes
`ecx`	count register	No
`edx`	data register	No
`esi`	source index	Yes
`edi`	destination index	Yes
`ebp`	base pointer	Yes
`esp`	stack pointer	Sort of

The register names are rather unusual due to the history of the processor line. That history also explains why the instruction encoding uses the non-alphabetical-order eax, ecx, edx, ebx.

For historical reasons, there are also names for selected partial registers.

Register	Meaning
`ax`	Lower 16 bits of `eax`
`bx`	Lower 16 bits of `ebx`
`cx`	Lower 16 bits of `ecx`
`dx`	Lower 16 bits of `edx`
`si`	Lower 16 bits of `esi`
`di`	Lower 16 bits of `edi`
`bp`	Lower 16 bits of `ebp`
`sp`	Lower 16 bits of `esp`
`ah`	Upper 8 bits of `ax`
`al`	Lower 8 bits of `ax`
`bh`	Upper 8 bits of `bx`
`bl`	Lower 8 bits of `bx`
`ch`	Upper 8 bits of `cx`
`cl`	Lower 8 bits of `cx`
`dh`	Upper 8 bits of `dx`
`dl`	Lower 8 bits of `dx`

Operations on these register fragments affect only the indicated bits; the other bits of the 32-bit register remain unaffected. For example, storing a value into the ax register leaves the most-significant 16 bits of the eax register unchanged.¹

Windows NT requires that the stack be kept on a 4-byte boundary. There is no red zone.

The 80386 also has eight 80-bit extended precision floating point registers named st0 through st7. The floating point system is rather unusual: In addition to the fact that the registers are extended precision, the programming model for the floating point registers is as a stack. Values are pushed onto the floating point stack, operations are performed on the stack, and results are popped off.
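
The stack-style programming model can be imitated with a short class. The method names borrow the x87 mnemonics `fld` (push), `faddp` (add and pop), and `fstp` (store and pop), but this is only a toy model: it uses Python floats instead of 80-bit values, and raising an exception on overflow is a simplification of what the hardware does.

```python
class X87Stack:
    """Toy model of the eight-entry floating point register stack."""

    def __init__(self):
        self._values = []              # the last element is st0

    def st(self, i: int) -> float:
        """Read st(i), where st(0) is the top of the stack."""
        return self._values[-1 - i]

    def fld(self, value: float) -> None:
        """Push a value; the old st0 becomes st1, and so on."""
        if len(self._values) == 8:
            raise OverflowError("floating point stack overflow")
        self._values.append(value)

    def faddp(self) -> None:
        """Add st0 to st1, then pop, leaving the sum in the new st0."""
        top = self._values.pop()
        self._values[-1] += top

    def fstp(self) -> float:
        """Pop st0 and return it."""
        return self._values.pop()
```

Computing 1.5 + 2.25 then reads as push, push, add-and-pop, pop: `fld(1.5)`, `fld(2.25)`, `faddp()`, `fstp()`, which produces 3.75.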

Floating point support is optional and is provided by the 80387 coprocessor chip, which runs concurrently with the main CPU. If a floating point instruction is executed on a system that lacks a floating point coprocessor, the floating point instruction traps, and the kernel emulates the instruction.

There are also some non-integer registers which are difficult/impossible to get to, but which still participate in user-mode instructions.

Register	Meaning	Notes
`eip`	instruction pointer	program counter
`eflags`	flags
`cs`	code segment	Don’t worry about it
`ds`	data segment	Don’t worry about it
`es`	extra segment	Don’t worry about it
`fs`	F segment	For TEB access
`gs`	G segment	Not used

Windows NT uses the 80386 in flat mode, which means that applications see a contiguous 32-bit address space. The segment registers largely don’t come into play when in flat mode, with the exception of the fs register, which we’ll learn about more when we get to the TEB.

The flags register is updated by many instructions. We’ll learn more about flags when we study conditionals.

The 80386 is unusual in that it supports multiple calling conventions. Common to all the calling conventions are the register preservation rules and the return value rules: The function return value is placed in eax. If the return value is a 64-bit value, then the most significant 32 bits are returned in edx. If the return value is a floating point value, it is returned in st0, and possibly st1 (for complex numbers).
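
The split of a 64-bit return value across the two registers is simple bit arithmetic. The function names here are illustrative.

```python
def split_return64(value: int):
    """Split a 64-bit return value into its edx (high) and eax (low) halves."""
    value &= (1 << 64) - 1
    return {"eax": value & 0xFFFFFFFF, "edx": value >> 32}

def join_return64(eax: int, edx: int) -> int:
    """Reassemble the 64-bit value the caller sees from edx:eax."""
    return ((edx & 0xFFFFFFFF) << 32) | (eax & 0xFFFFFFFF)
```

Returning `0x1122334455667788` puts `0x55667788` in eax and `0x11223344` in edx.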

Furthermore, link-time code generation is permitted to manufacture ad hoc calling conventions which may not even follow the register preservation rules. It’s crazy free-for-all time.

The architectural names for data sizes are as follows:

byte: 8-bit value
word: 16-bit value
dword (doubleword): 32-bit value
qword (quadword): 64-bit value
tword (ten-byte word): 80-bit value

Instruction encoding is highly irregular. Instructions are variable-length, and instructions can begin at any byte boundary.

The general pattern for multi-operand opcodes is

    opcode  destination, source

Note that the destination is on the left. Note also that three-operand instructions are rare. This will become interesting when we get to arithmetic.

Here’s the notation I will use when introducing instructions:

Notation	Meaning
r`n`	`n`-bit register
m`n`	`n`-bit memory
i`n`	`n`-bit immediate
r/m`n`	`n`-bit register or `n`-bit memory
r/m/i`n`	`n`-bit register, `n`-bit memory, `n`-bit immediate, or 8-bit immediate sign-extended to `n` bits

If n is omitted, then 8, 16, and 32 are permitted. For example, “r/m” means “r/m8, r/m16, or r/m32”.
Immediates are sign-extended as necessary.
The first operand is called “d” (destination).
The second operand (if any) is called “s” (source).
The third operand (if any) is called “t” (second source).
At most one of the operands can be a memory operand.
All operands must have the same size.

Exceptions to the above rules will be called out as necessary.

For example:

    ADD     r/m, r/m/i          ; d += s,      set flags

The ADD instruction takes two operands. The first is a register or memory, and the second is a register or memory or immediate or single-byte immediate. They cannot both be memory operands. They must be the same size.

Many instructions have a more compact encoding if the destination register is al, ax, or eax.

The assembly language overloads multiple variations of instructions into a single opcode. This is different from most other processors, where each opcode maps to an instruction template, where all that’s left to fill in are the registers and immediates. For example, the MIPS R4000 has two different shift opcodes depending on whether the shift amount is specified by an immediate or a register. But the 80386 assembly language uses the same opcode for both, and it’s the assembler’s job to figure out which variant you intended.
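To illustrate the assembler's job, here is a toy Python sketch of how one mnemonic (SHL with a 32-bit destination) could be mapped to different primary opcode bytes depending on how the shift count is written. The function `select_shl_opcode` is hypothetical. The byte values are the 80386 encodings for shift-by-1 (D1), shift-by-CL (D3), and shift-by-imm8 (C1); a real assembler also has to emit the ModR/M byte and any immediate, and may choose differently among equivalent encodings.

```python
def select_shl_opcode(count):
    """Pick the primary opcode byte for SHL r/m32, count (toy model)."""
    if count == "cl":
        return 0xD3              # shift amount taken from the cl register
    if isinstance(count, int):
        if count == 1:
            return 0xD1          # dedicated shift-by-one form
        if 0 <= count <= 0xFF:
            return 0xC1          # shift by an 8-bit immediate
    raise ValueError(f"unsupported shift count: {count!r}")

for count in ("cl", 1, 4):
    print(count, hex(select_shl_opcode(count)))
```

The programmer writes SHL in every case; the choice among the three encodings happens inside the assembler.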

The 80386 does not perform speculation, does not have an on-chip cache, does not have a branch predictor, and does not reorder memory accesses. Life was simpler then.

Okay, that’s enough background. We’ll dig in next time by looking at memory addressing modes.

¹ This partial register behavior wasn’t a big deal at the time, but it ended up creating register dependencies that made it much harder to add out-of-order execution to later versions of the processor. It even created a register version of the store-to-load forwarding problem.

The x86-64 architecture took a different approach when it extended the 32-bit registers to 64-bit registers: If the destination register is encoded as a 32-bit subset of a 64-bit register, the upper 32 bits of the destination register are zeroed.
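The difference between the two approaches can be modeled with a little Python. This is a sketch, and the function names `write_ax_386` and `write_eax_x64` are made up for illustration: the first models a 16-bit write on the 80386, which leaves the upper half of the full register untouched, and the second models a 32-bit write on x86-64, which zeroes the upper half.

```python
def write_ax_386(eax, value16):
    """80386: writing ax preserves the upper 16 bits of eax."""
    return (eax & 0xFFFF0000) | (value16 & 0xFFFF)

def write_eax_x64(rax, value32):
    """x86-64: writing eax zeroes the upper 32 bits of rax."""
    # The old rax value is deliberately ignored: the result does not depend on it.
    return value32 & 0xFFFFFFFF

print(hex(write_ax_386(0x12345678, 0xABCD)))
print(hex(write_eax_x64(0x1122334455667788, 0xDEADBEEF)))
```

Because the x86-64 result does not depend on the old register contents, the write carries no dependency on the previous value, which is the problem the footnote describes for the partial-register case.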

Molecular Expressions: Science, Optics & You - Olympus MIC-D: Integrated Circuit Gallery - Intel 386 Microprocessor https://micro.magnet.fsu.edu/optics/olympusmicd/galleries/chips/intel386b.html

Integrated Circuit Image Gallery

Intel 386 Microprocessor

In 1985, with a 16-billion-dollar software library focused on the 8088 and 80286, the compatible 80386 ushered in the third generation of Intel microprocessors. After the correction of many of the 286's inherent problems, the 386 represented a giant leap in raw power with true 32-bit architecture (external data buses and internal registers).

The new 386 design, available at 12.5 MHz and 16 MHz clock speeds, allowed the chip to process information more than twice as fast as the 286, and spawned a new revolution in software design. Intel increased the clock speed to 33 MHz by 1989 and ultimately, clone manufacturer Advanced Micro Devices (AMD) raised it to 40 MHz. The 386 central processing unit (CPU) was originally fabricated at the same 1.5-micron linewidth process level as the 286, but the number of transistors was more than doubled to 275,000. With a transistor count more than a hundred times that of the original 4004 chip, the semiconductor manufacturers applied the new high-speed, low-power, high-density 1-micron complementary metal oxide semiconductor (CMOS) technology. The best-selling 386 was the first processor for the personal computer that was able to switch between real and enhanced modes without rebooting.

The i80386 featured several processor modes to preserve backward compatibility with the 80286, and an additional memory management unit (MMU) that allowed paged memory, security rings of privilege, and new operation codes similar to those available with the Zilog Z-80 and Z-280 processors. Capable of true multi-tasking, the innovative 386 could run multiple programs simultaneously by operating in "virtual 8086 mode". In this mode, the CPU is divided into partitions, each of which functions as a separate 8086-based computer. The 80386DX is the new name given to the original 386 CPU after the release of the 80386SX, a scaled down version of the best-selling silicon chip.

Introduction of Intel 386 - Event - Computing History http://www.computinghistory.org.uk/det/6192/Introduction-of-Intel-386/

October 1985


Introduction of Intel 386 (later qualified DX) processor with 275,000 transistors

The Intel 80386, also known as the i386, or just 386, was a 32-bit microprocessor introduced by Intel in 1985. The first versions had 275,000 transistors and were used as the central processing unit (CPU) of many personal computers and workstations. As the original implementation of the 32-bit extensions to the 8086 architecture, the 80386 instruction set, programming model, and binary encodings are still the common denominator for all 32-bit x86 processors. This is termed x86, IA-32, or the i386-architecture, depending on context.

The 80386 could correctly execute most code intended for earlier 16-bit x86 processors such as the 80286; following the same tradition, modern 64-bit x86 processors are able to run most programs written for older chips, all the way back to the original 16-bit 8086 of 1978. Over the years, successively newer implementations of the same architecture have become several hundreds of times faster than the original 80386 (and thousands of times faster than the 8086). A 33 MHz 80386 was reportedly measured to operate at about 11.4 MIPS.

The 80386 was launched in October 1985, and full-function chips were first delivered in 1986. Mainboards for 80386-based computer systems were at first expensive to buy, but prices were rationalized upon the 80386's mainstream adoption. The first personal computer to make use of the 80386 was designed and manufactured by Compaq.

In May 2006, Intel announced that production of the 80386 would cease at the end of September 2007. Although it has long been obsolete as a personal computer CPU, Intel and others had continued to manufacture the chip for embedded systems. Embedded systems that utilise an 80386 or one of its derivatives are widely used in aerospace technology.

posted @ 2016-12-07 12:27 papering
