Chapter 4 Processor Architecture 2017-11-7


The instructions supported by a particular processor
and their byte-level encodings are known as its
instruction set architecture (ISA).

4.1 The Y86-64 Instruction Set Architecture (ISA)

Defining an instruction set architecture includes
defining the different components of its state, the set
of instructions and their encodings, a set of programming
conventions, and the handling of exceptional events.

4.1.1 Programmer-Visible State

The state for Y86-64 is similar to that for x86-64.
There are 15 program registers (x86-64's %r15 is omitted).
Each of these registers stores a 64-bit word. %rsp is used
as a stack pointer by the push, pop, call, and return
instructions.

There are three single-bit condition codes, ZF, SF,
and OF, storing information about the effect of the most
recent arithmetic or logical instruction. The program
counter(PC) holds the address of the instruction
currently being executed.

The memory is conceptually a large array of bytes,
holding both program and data. Y86-64 programs reference
memory locations using virtual addresses.

A final part of the program state is a status code
Stat, indicating the overall state of program execution.
It will indicate either normal operation or that some
sort of exception has occurred.

4.1.2 Y86-64 Instructions

Details about the Y86-64 instructions:

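+ The x86-64 movq instruction is split into four
instructions: irmovq, rrmovq, mrmovq, and rmmovq, which
explicitly indicate the form of the source and
destination. The source is either immediate (i),
register (r), or memory (m), designated by the first
letter of the name. The destination is either register
(r) or memory (m), designated by the second letter.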

+ There are four integer operation instructions: addq,
subq, andq, and xorq. They operate only on register
data. These instructions set the three condition codes
ZF, SF, and OF (zero, sign, and overflow).
+ The seven jump instructions are jmp, jle, jl, je, jne,
jge, and jg. Branches are taken according to the type of
branch and the settings of the condition codes.
+ There are six conditional move instructions: cmovle,
cmovl, cmove, cmovne, cmovge, cmovg.
+ The call instruction pushes the return address on the
stack and jumps to the destination address. The ret
instruction returns from such a call.
+ The pushq and popq instructions implement push and
pop, just as they do in x86-64.
+ The halt instruction stops instruction execution.
x86-64 has a comparable instruction, called hlt. x86-64
application programs are not permitted to use this
instruction, since it causes the entire system to
suspend operation. For Y86-64, executing the halt
instruction causes the processor to stop, with the
status code set to HLT.

4.1.3 Instruction Encoding

Each instruction requires between 1 and 10 bytes,
depending on which fields are required.

command encoding list
--------------------------------
halt 00 | nop 10
--------------------------------
rrmovq 20 | irmovq 30
--------------------------------
rmmovq 40 | mrmovq 50
--------------------------------
OPq 6X | jXX 7X
--------------------------------
cmovXX 2X | call 80
--------------------------------
ret 90 | pushq A0
--------------------------------
popq B0 |
--------------------------------

Every instruction has an initial byte identifying the
instruction type. This byte is split into two 4-bit
parts: the high-order, or code, part, and the low-order,
or function, part. Code values range from 0 to 0xB. The
function values are significant only for the cases where
a group of related instructions share a common code.

Each of the 15 program registers has an associated
register identifier (ID) ranging from 0 to 0xE. ID value
0xF is used in the instruction encodings and within our
hardware designs when we need to indicate that no
register should be accessed.

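The register fields are called rA and rB. Depending
on the instruction, they specify the registers used as
data sources and destinations, as well as the base
register used in an address computation. Instructions
that require just one register operand (irmovq, pushq,
and popq) have the other register specifier set to 0xF.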

Some instructions require an additional 8-byte
constant word. This word can serve as the immediate data
for irmovq, the displacement for rmmovq and mrmovq
address specifiers, and the destination of branches and
calls. Note that branch and call destinations are given
as absolute addresses. As with x86-64, all integers have
a little-endian encoding. When the instruction is written
in disassembled form, these bytes appear in reverse
order.
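
As a worked illustration of the encoding rules above,
the following Python sketch assembles the bytes of
rmmovq %rsp,0x123456789abcd(%rdx). The helper name
encode_rmmovq and the REG table are hypothetical names
made up for this example, and the register numbering
(%rax = 0, %rcx = 1, %rdx = 2, %rbx = 3, %rsp = 4, ...)
is assumed rather than taken from these notes.

```python
import struct

# Assumed Y86-64 register IDs for the first eight registers.
REG = {"%rax": 0x0, "%rcx": 0x1, "%rdx": 0x2, "%rbx": 0x3,
       "%rsp": 0x4, "%rbp": 0x5, "%rsi": 0x6, "%rdi": 0x7}

def encode_rmmovq(ra, disp, rb):
    """Encode rmmovq rA, D(rB): code/function byte, register byte, 8-byte D."""
    first = (0x4 << 4) | 0x0             # code 4 in the high nibble, function 0
    regs = (REG[ra] << 4) | REG[rb]      # rA in the high nibble, rB in the low
    # "<q" packs a signed 64-bit integer in little-endian byte order.
    return bytes([first, regs]) + struct.pack("<q", disp)

encoded = encode_rmmovq("%rsp", 0x123456789ABCD, "%rdx")
print(encoded.hex())  # 4042cdab896745230100
```

The displacement bytes cd ab 89 67 45 23 01 00 are the
value 0x000123456789abcd written least significant byte
first, which is why they look reversed next to the
disassembled form.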

4.1.4 Y86-64 Exceptions

The status code Stat describes the overall state of
the executing program.

-------------------------------------------------
Value Name Meaning
-------------------------------------------------
1 AOK Normal operation
2 HLT halt instruction encountered
3 ADR Invalid address encountered
4 INS Invalid instruction encountered
-------------------------------------------------

For Y86-64, we will simply have the processor stop
executing instructions when it encounters any of the
exceptions listed. In a more complete design, the proce-
ssor would typically invoke an exception handler.

4.1.6 Some Y86-64 Instruction Details

The pushq instruction both decrements the stack
pointer by 8 and writes a register value to memory.

4.2 Logic Design and the Hardware Control Language HCL

Three major components are required to implement a
digital system: combinational logic to compute functions
on the bits, memory elements to store bits, and clock
signals to regulate the updating of the memory elements.

4.2.1 Logic Gates

Logic gates are the basic computing elements for
digital circuits. They generate an output equal to some
Boolean function of the bit values at their inputs, such
as AND, OR, or NOT.

The HCL expressions are && for AND, || for OR, and !
for NOT. We use these instead of the bit-level C
operators &, |, and ~, because logic gates operate on
single-bit quantities, not entire words.

4.2.2 Combinational Circuits and HCL Boolean Expressions

By assembling a number of logic gates into a network,
we can construct computational blocks known as
combinational circuits. Several restrictions are placed
on how the networks are constructed:

+ Every logic gate input must be connected to exactly
one of the following:
1. one of the system inputs (known as a primary input),
2. the output connection of some memory element, or
3. the output of some logic gate.

+ The outputs of two or more logic gates cannot be
connected together.
+ The network must be acyclic. That is, there cannot
be a path through a series of gates that forms a
loop in the network.

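A multiplexor selects a value from among a set of
different data signals, depending on the value of a
control input signal. A single-bit multiplexor that
outputs a when s is 1 and b when s is 0:

bool out = (s && a) || (!s && b);

The exclusive-or of two bits can be built from the same
three gate types:

bool xor = (!a && b) || (a && !b);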
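
The two Boolean expressions above can be checked
exhaustively, since each has only a handful of input
combinations. The Python sketch below is one way to do
that; the function names mux and xor are chosen for this
example, and Python's and/or/not stand in for the HCL
operators &&, ||, and !.

```python
from itertools import product

def mux(s, a, b):
    """Bit-level multiplexor: a when s is 1, otherwise b."""
    return (s and a) or ((not s) and b)

def xor(a, b):
    """Exclusive-or built only from AND, OR, and NOT."""
    return ((not a) and b) or (a and (not b))

# Compare each gate-level expression with its intended behavior
# over every combination of single-bit inputs.
for s, a, b in product([False, True], repeat=3):
    assert mux(s, a, b) == (a if s else b)
for a, b in product([False, True], repeat=2):
    assert xor(a, b) == (a != b)
```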

4.2.3 Word-level Combinational Circuits and HCL Integer
Expressions

In HCL, we will declare any word-level signal as an
int, without specifying the word size. This is done for
simplicity.

Multiplexing functions are described in HCL using
case expressions. The general form of a case expression
is:

[
select1 : expr1;
select2 : expr2;
.
.
selectK : exprK;
]

The expression contains a series of cases, where each
case i consists of a Boolean expression select_i,
indicating when this case should be selected, and an
integer expression expr_i, indicating the resulting
value. Logically, the selection expressions are evaluated
in sequence, and the case for the first one yielding 1 is
selected.

word Out = [
s: A;
1: B;
];

In this code, the second selection expression is
simply 1, indicating that this case should be selected if
no prior one has been. This is the way to specify a
default case in HCL. Nearly all case expressions end in
this manner.

The selection expressions can be arbitrary Boolean
expressions, and there can be an arbitrary number of
cases. This allows case expressions to describe blocks
where there are many choices of input signals with
complex selection criteria.

word Out4 = [
!s1 && !s0: A; # 00
!s1 : B; # 01
!s0 : C; # 10
1 : D; # 11
];
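
The first-match rule is what lets the Out4 selectors be
so short: the case marked 01 only needs to test !s1
because the 00 case has already been ruled out. The
Python sketch below models that rule. The helper case()
is a hypothetical stand-in for an HCL case expression,
not part of any HCL tool, and unlike hardware it
evaluates every arm's value eagerly.

```python
def case(*arms):
    """Return the value of the first (select, value) arm whose select is true."""
    for select, value in arms:
        if select:
            return value
    raise ValueError("no case selected; add a default arm with select 1")

def out4(s1, s0, a, b, c, d):
    """Four-way multiplexor written like the HCL Out4 expression."""
    return case(
        (not s1 and not s0, a),  # 00
        (not s1, b),             # 01 (00 already excluded)
        (not s0, c),             # 10
        (1, d),                  # 11, the default case
    )

print([out4(s1, s0, "A", "B", "C", "D") for s1 in (0, 1) for s0 in (0, 1)])
# ['A', 'B', 'C', 'D']
```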

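One important combinational circuit is the
arithmetic/logic unit (ALU). It has two data inputs and
a control input. Depending on the setting of the control
input, the circuit performs a different arithmetic or
logical operation on the data inputs.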

4.2.4 Set Membership

bool s1 = code == 2 || code == 3;
bool s0 = code == 1 || code == 3;

These two definitions can be written more concisely
using set membership:

bool s1 = code in { 2, 3 };
bool s0 = code in { 1, 3 };

s1 is 1 when code is in the set {2,3}, and s0 is 1 when
code is in the set {1,3}.

So, the general form of a set membership test is:

iexpr in {iexpr1,iexpr2,...,iexprk}

where the value being tested (iexpr) and the candidate
matches (iexpr1 through iexprk) are all integer
expressions.

4.2.5 Memory and Clocking

Our storage devices are all controlled by a single
clock, a periodic signal that determines when new values
are to be loaded into the devices.

We consider two classes of memory devices:

+ Clocked registers (or simply registers) store
individual bits or words.
+ Random access memories (or simply memories) store
multiple words, using an address to select which
word should be read or written, such as:

1. the virtual memory system of a processor,
where a combination of hardware and operating
system software makes it appear to a processor
that it can access any word within a large
address space.
2. the register file, where register identifiers
serve as the addresses. In a Y86-64 processor,
the register file holds the 15 program registers
(%rax through %r14).

4.3 Sequential Y86-64 Implementations

As a first step, we describe a processor called SEQ
(for "sequential" processor). On each clock cycle, SEQ
performs all the steps required to process a complete
instruction.

4.3.1 Organizing Processing into stages

In general, processing an instruction involves a num-
ber of operations. We organize them in a particular sequ-
ence of stages, attempting to make all instructions fol-
low a uniform sequence, even though the instructions
differ greatly in their actions.

The following is an informal description of the
stages and the operations performed within them:

+ Fetch
The fetch stage reads the bytes of an instruction
from memory, using the program counter (PC) as the
memory address. It also computes valP, the address
of the instruction following the current one in
sequential order. That is, valP equals the value of
the PC plus the length of the fetched instruction.
+ Decode
The decode stage reads up to two operands from the
register file. Typically, it reads the registers
designated by instruction fields rA and rB, but for
some instructions it reads register %rsp.
+ Execute
The arithmetic/logic unit (ALU) either performs the
operation specified by the instruction (according to
ifun), computes the effective address of a memory
reference, or increments or decrements the stack
pointer. This stage may also set the condition codes.
+ Memory
The memory stage may write data to memory, or it may
read data from memory.
+ Write back
The write-back stage writes up to two results to the
register file.
+ PC update
The PC is set to the address of the next instruction.
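** Summary: fetch the instruction, read operands from
** registers, execute, read/write memory, write results
** to registers, store the next instruction address in
** the PC.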

Note: pushq should decrement the stack pointer before
writing to memory, and popq should first read memory and
then increment the stack pointer.
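
To make the ordering of pushq and popq concrete, the
following Python sketch walks both instructions through
the decode, execute, memory, and write-back stages. It
is only an illustration under simplifying assumptions:
registers and memory are plain dictionaries, memory is
indexed by the address of each 8-byte word, and the
function names are invented for this example.

```python
RSP = "%rsp"

def pushq(regs, mem, ra):
    val_a = regs[ra]       # decode: value to push
    val_b = regs[RSP]      # decode: current stack pointer
    val_e = val_b - 8      # execute: decrement the stack pointer
    mem[val_e] = val_a     # memory: write at the NEW stack pointer
    regs[RSP] = val_e      # write back: update %rsp

def popq(regs, mem, ra):
    val_a = regs[RSP]      # decode: old stack pointer, used as the address
    val_b = regs[RSP]      # decode: old stack pointer, used as ALU input
    val_e = val_b + 8      # execute: increment the stack pointer
    val_m = mem[val_a]     # memory: read at the OLD stack pointer
    regs[RSP] = val_e      # write back (port E): update %rsp
    regs[ra] = val_m       # write back (port M): store the popped value

regs = {"%rax": 0x2A, "%rbx": 0, RSP: 0x100}
mem = {}
pushq(regs, mem, "%rax")   # %rsp becomes 0xF8 and mem[0xF8] holds 0x2A
popq(regs, mem, "%rbx")    # %rbx receives 0x2A and %rsp returns to 0x100
```

Both instructions use the ALU only to adjust %rsp. The
difference is which stack pointer value addresses memory:
the adjusted one (valE) for pushq, the original one
(valA) for popq.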

4.3.2 SEQ Hardware Structure

In the SEQ, all of the processing by the hardware
units occurs within a single clock cycle.

The hardware units are associated with the different
processing stages:

+ Fetch
+ Decode
+ Execute
+ Memory
+ Write back
The register file has two write ports. Port E is
used to write values computed by the ALU, while port M
is used to write values read from the data memory.
+ PC update
The new value of the PC is selected from one of three
addresses:
1. valP, the address of the next instruction
2. valC, the destination of a call or jump
3. valM, the return address read from memory
The program counter PC is the only clocked register
in SEQ.

4.3.3 SEQ Timing

Our implementation of SEQ consists of combinational
logic and two forms of memory devices: clocked registers
(the program counter and condition code register) and
random access memories (register file, the instruction
memory, and the data memory).
Combinational logic does not require any sequencing
or control--values propagate through a network of logic
gates whenever the inputs change. Also we assume that
reading from a random access memory operates much like
combinational logic, with the output word generated based
on the address input.

4.3.4 SEQ stage implementations

-----------------------------------------------------
Name      Value(hex)  Meaning
-----------------------------------------------------
IHALT     0           Code for halt instruction
INOP      1           Code for nop instruction
IRRMOVQ   2           Code for rrmovq instruction
IIRMOVQ   3           Code for irmovq instruction
IRMMOVQ   4           Code for rmmovq instruction
IMRMOVQ   5           Code for mrmovq instruction
IOPQ      6           Code for integer operation
                      instructions
IJXX      7           Code for jump instructions
ICALL     8           Code for call instruction
IRET      9           Code for ret instruction
IPUSHQ    A           Code for pushq instruction
IPOPQ     B           Code for popq instruction

FNONE     0           Default function code

RRSP      4           Register ID for %rsp
RNONE     F           Indicates no register file access

ALUADD    0           Function for addition operation

SAOK      1           Status code for normal operation
SADR      2           Status code for address exception
SINS      3           Status code for illegal instruction
                      exception
SHLT      4           Status code for halt
-----------------------------------------------------

+ Fetch stage

The fetch stage includes the instruction memory
hardware unit. This unit reads 10 bytes from memory
at a time, using the PC as the address of the first
byte (byte 0).

Based on the value of icode, we can compute three
1-bit signals: instr_valid (does this byte correspond to
a legal instruction?), need_regids (does the instruction
include a register specifier byte?), and need_valC (does
it include a constant word?).

bool need_regids =
icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ,
IIRMOVQ, IRMMOVQ, IMRMOVQ };

The remaining 9 bytes read from the instruction
memory encode some combination of the register speci-
fier byte and the constant word.

For PC value p, need_regids value r, and need_valC
value i, the incrementer generates the value:

p + 1 + r + 8i //valP value.
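
The incrementer formula can be tried out on each
instruction type with the short Python sketch below. The
need_regids set mirrors the HCL above. The need_valC set
is not given in these notes and is an assumption here,
derived from the instructions that carry an immediate, a
displacement, or a branch or call destination. The
function name val_p is likewise chosen for this example.

```python
IHALT, INOP, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ = 0x0, 0x1, 0x2, 0x3, 0x4, 0x5
IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ = 0x6, 0x7, 0x8, 0x9, 0xA, 0xB

def need_regids(icode):
    """True when the instruction has a register specifier byte."""
    return icode in {IRRMOVQ, IOPQ, IPUSHQ, IPOPQ, IIRMOVQ, IRMMOVQ, IMRMOVQ}

def need_valC(icode):
    """Assumed set: instructions that carry an 8-byte constant word."""
    return icode in {IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL}

def val_p(pc, icode):
    """Address of the next sequential instruction: p + 1 + r + 8i."""
    return pc + 1 + int(need_regids(icode)) + 8 * int(need_valC(icode))

# Instruction lengths fall out of the formula when the PC is 0.
lengths = {name: val_p(0, code) for name, code in [
    ("halt", IHALT), ("rrmovq", IRRMOVQ), ("irmovq", IIRMOVQ),
    ("jXX", IJXX), ("call", ICALL), ("ret", IRET), ("pushq", IPUSHQ)]}
print(lengths)
```

The results stay within the range of 1 to 10 bytes stated
in the instruction encoding section.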

+ Decode and Write-Back Stages

The register file has four ports: two read ports (A
and B) and two write ports (E and M). Each port has
both an address connection and a data connection, where
the address connection is a register ID, and the data
connection is a set of 64 wires serving as either an
output word (for a read port) or an input word (for a
write port) of the register file.

word srcA = [
icode in { IRRMOVQ, IRMMOVQ, IOPQ, IPUSHQ } : rA;
icode in { IPOPQ, IRET } : RRSP;
1 : RNONE; # Don't need register
];

word srcB = [
icode in { IOPQ, IRMMOVQ, IMRMOVQ } : rB;
icode in { IPOPQ, ICALL, IPUSHQ, IRET } : RRSP;
1 : RNONE;
];

Register ID dstE indicates the destination register
for write port E, where the computed value valE is
stored.

# Warning: conditional move not implemented correctly.
word dstE = [
icode in { IRRMOVQ } : rB;
icode in { IIRMOVQ, IOPQ } : rB;
icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
1 : RNONE;
];

Register ID dstM indicates the destination register
for write port M, where valM, the value read from
memory, is stored.

word dstM = [
icode in { IMRMOVQ, IPOPQ } : rA;
1 : RNONE;
];

+ Execute Stage

The execute stage includes the arithmetic/logic unit
(ALU). This unit performs the operation ADD, SUBTRACT,
AND, or EXCLUSIVE-OR on inputs aluA and aluB based
on the setting of the alufun signal.
# Operands are listed with aluB first, followed by
# aluA, to make sure that subq subtracts valA from valB.
word aluA = [
icode in { IRRMOVQ, IOPQ } : valA;
icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ } : valC;
icode in { ICALL, IPUSHQ } : -8;
icode in { IRET, IPOPQ } : 8;
];

word aluB = [
icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL,
IPUSHQ, IRET, IPOPQ } : valB;
icode in { IRRMOVQ,IIRMOVQ } : 0;
];

The control logic also generates a signal set_cc that
controls whether or not the condition code register
should be updated.

bool set_cc = icode in { IOPQ };
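
The following Python sketch shows one way the ALU result
and the three condition codes could be computed for
64-bit words, keeping the operand order aluB OP aluA
described above. Only ALUADD = 0 appears in these notes;
the other three function values (1, 2, 3) and all of the
helper names are assumptions made for this example.

```python
MASK = (1 << 64) - 1
ALUADD, ALUSUB, ALUAND, ALUXOR = 0, 1, 2, 3   # only ALUADD is from the notes

def to_signed(x):
    """Interpret a 64-bit pattern as a two's-complement integer."""
    return x - (1 << 64) if x & (1 << 63) else x

def alu(alufun, alu_a, alu_b):
    """Return (valE, ZF, SF, OF) for valE = aluB OP aluA."""
    a, b = alu_a & MASK, alu_b & MASK
    if alufun == ALUADD:
        exact = to_signed(b) + to_signed(a)
    elif alufun == ALUSUB:
        exact = to_signed(b) - to_signed(a)    # subq subtracts valA from valB
    elif alufun == ALUAND:
        exact = to_signed(b & a)
    else:
        exact = to_signed(b ^ a)
    val_e = exact & MASK                       # truncate to 64 bits
    zf = val_e == 0                            # zero flag
    sf = bool(val_e >> 63)                     # sign flag: top bit of the result
    of = to_signed(val_e) != exact             # overflow: exact result did not fit
    return val_e, zf, sf, of

print(alu(ALUSUB, 3, 5))            # (2, False, False, False)
print(alu(ALUADD, 1, 2**63 - 1))    # overflow: SF and OF are both set
```

In SEQ these flags would be written to the condition code
register only when set_cc is 1, that is, only for the
integer operation instructions.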

+ Memory Stage

The memory stage has the task of either reading or
writing program data. Two control blocks generate the
values for the memory address and the memory input data
(for write operations). Two other blocks generate the
control signals indicating whether to perform a read or
a write operation. When a read operation is performed,
the data memory generates the value valM.

The address for memory reads and writes is always
valE or valA.

word mem_addr = [
icode in { IRMMOVQ, IPUSHQ, ICALL, IMRMOVQ } : valE;
icode in { IPOPQ, IRET } : valA;
];

The data for memory writes is always either valA or
valP.

word mem_data = [
# value from register
icode in { IRMMOVQ, IPUSHQ } : valA;
# return pc
icode == ICALL : valP;
];

We want to set the control signal mem_read only for
instructions that read data from memory, and mem_write
only for instructions that write data to memory.

bool mem_read = icode in { IMRMOVQ, IPOPQ, IRET };

bool mem_write = icode in { IRMMOVQ, IPUSHQ, ICALL };

A final function for the memory stage is to compute
the status code Stat resulting from the instruction
execution according to the values of icode, imem_error,
and instr_valid generated in the fetch stage and the
signal dmem_error generated by the data memory.

word Stat = [
imem_error || dmem_error : SADR;
!instr_valid : SINS;
icode == IHALT : SHLT;
1 : SAOK;
];

+ PC Update Stage

The new PC will be valC, valM, or valP:

word new_pc = [
# Call. Use instruction constant
icode == ICALL : valC;
# Taken branch. Use instruction constant
icode == IJXX && Cnd : valC;
# Completion of RET instruction. Use value from stack
icode == IRET : valM;
# Default: use incremented PC
1 : valP;
];

Surveying SEQ

The only problem with SEQ is that it is too slow. The
clock must run slowly enough so that signals can
propagate through all of the stages within a single
cycle. This style of implementation also does not make
good use of the hardware units, since each unit is only
active for a fraction of the total clock cycle. We can
achieve much better performance by introducing
pipelining.

4.4 General Principles of Pipelining

4.4.1 Computational Pipelines
+ Avoiding data hazards by stalling
We handle these by injecting a bubble into the exe-
cute stage each time we hold an instruction back in
the decode stage.

Hazard possibilities for each form of program state:
- program registers
- program counter
- memory
- condition code register : No hazards can arise
- Status register

+ Avoiding data hazards by forwarding
This technique of passing a result value directly from
one pipeline stage to an earlier one is known as data
forwarding (or simply forwarding, or bypassing).

Data forwarding can also be used when there is a
pending write to a register in the memory stage.

In total there are five forwarding sources (e_valE,
m_valM, M_valE, W_valM, and W_valE) and two forwarding
destinations (valA and valB).
+ Load/Use Data hazards

One class of data hazards cannot be handled purely by
forwarding, because memory reads occur late in the
pipeline. We handle a load/use hazard by stalling the
instruction in the decode stage for one cycle, a
technique called a load interlock.

Load interlocks combined with forwarding suffice to
handle all possible forms of data hazards.
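
A load/use hazard can be recognized by comparing the
instruction in the execute stage with the source
registers of the instruction in decode. The Python
sketch below expresses one plausible detection
condition. The signal names E_icode, E_dstM, d_srcA, and
d_srcB follow the pipeline-register naming used later in
these notes, but the condition itself is an assumption of
this example and is not quoted from the notes.

```python
IMRMOVQ, IPOPQ = 0x5, 0xB
RAX, RDX, RBX = 0x0, 0x2, 0x3

def load_use_hazard(E_icode, E_dstM, d_srcA, d_srcB):
    """True when the instruction in execute loads a register from
    memory that the instruction in decode wants to read."""
    return E_icode in {IMRMOVQ, IPOPQ} and E_dstM in {d_srcA, d_srcB}

# mrmovq 0(%rdx),%rax in execute, addq %rbx,%rax in decode:
# the add needs %rax before the load has read it from memory.
print(load_use_hazard(IMRMOVQ, RAX, d_srcA=RBX, d_srcB=RAX))  # True
```

When the condition holds, the instruction in decode is
held back for one cycle and a bubble is injected into the
execute stage. After that, the loaded value can be
forwarded from the memory stage.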

+ Avoiding control hazards

Control hazards arise when the processor cannot
reliably determine the address of the next instruction
based on the current instruction in the fetch stage.

Control hazards can only occur in our pipelined
processor for ret and jump instructions. Moreover, the
latter case only causes difficulties when the direction
of a conditional jump is mispredicted.

Control hazards can be handled by stalling and by
injecting bubbles into the pipeline, techniques that
dynamically adjust the pipeline flow when special
conditions arise.

4.5.6 Exception handling

Exceptions can be generated either internally, by the
executing program, or externally, by some outside signal.

Our instruction set architecture includes three
different internally generated exceptions, caused by:
1. a halt instruction,
2. an instruction with an invalid combination of
instruction and function code, and
3. an attempt to access an invalid address, either for
instruction fetch or for data read or write.

Exception handling in a pipelined processor involves
several subtleties:

+ First, it is possible to have exceptions triggered by
multiple instructions simultaneously.

We must determine which of these exceptions the
processor should report to the operating system. The
basic rule is to put priority on the exception triggered
by the instruction that is furthest along the pipeline.

+ A second subtlety occurs when an instruction is first
fetched and begins execution, causes an exception, and
later is canceled due to a mispredicted branch.
+ A third subtlety arises because a pipelined processor
updates different parts of the system state in
different stages.

We can both correctly choose among the different excep-
tions and avoid raising exceptions for instructions that
are fetched due to mispredicted branches by merging the
exception-handling logic into the pipeline structure.

That is the motivation for including a status code
stat in each of our pipeline registers.

No instruction following one that causes an exception
can alter the programmer-visible state.

Let us consider how this method of handling exceptions
deals with the subtleties we have mentioned:

- When an exception occurs in one or more stages of a
pipeline, the information is simply stored in the
status fields of the pipeline registers. The event
has no effect on the flow of instructions in the
pipeline until an excepting instruction reaches the
final pipeline stage, except to disable any updates
of the programmer-visible state (the condition code
register and the memory) by later instructions in
the pipeline. We are then guaranteed that the first
instruction encountering an exception will arrive
first in the write-back stage, at which point
program execution can stop and the status code in
pipeline register W can be recorded as the program
status.

- If some instruction is fetched but later canceled,
any exception status information about the
instruction gets canceled as well. No instruction
following one that causes an exception can alter the
programmer-visible state.

The simple rule of carrying the exception status
together with all other information about an instruction
through the pipeline provides a simple and reliable
mechanism for handling exceptions.

4.5.7 PIPE Stage implementations

+ PC Selection and Fetch Stage

The fetch stage logic must also select a current value
for the program counter and predict the next PC value.

word f_pc = [
# Mispredicted branch. Fetch at incremented PC
M_icode == IJXX && !M_Cnd : M_valA;
# Completion of RET instruction
W_icode == IRET : W_valM;
# Default: use predicted value of PC
1 : F_predPC;
];

word f_predPC = [
f_icode in { IJXX, ICALL } : f_valC;
1 : f_valP;
];


word f_stat = [
imem_error : SADR;
!instr_valid : SINS;
f_icode == IHALT : SHLT;
1 : SAOK;
];

+ Decode and write-back stages

Decode:

word d_dstE = [
D_icode in { IRRMOVQ,IIRMOVQ, IOPQ } : D_rB;
D_icode in { IPUSHQ,IPOPQ,ICALL,IRET } : RRSP;
1 : RNONE;
];

The merging of signals valA and valP exploits the fact
that only the call and jump instructions need the value
of valP in later stages. When signal D_icode matches the
instruction code for call or jXX, the 'sel + fwd A'
block should select D_valP as its output.

There are five different forwarding sources:

Data word   Register ID   Source description
------------------------------------------------------
e_valE      e_dstE        ALU output
m_valM      M_dstM        Memory output
M_valE      M_dstE        Pending write to port E in
                          memory stage
W_valM      W_dstM        Pending write to port M in
                          write-back stage
W_valE      W_dstE        Pending write to port E in
                          write-back stage
------------------------------------------------------
If none of the forwarding conditions hold, the block
should select d_rvalA, the value read from register
port A, as its output.

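word d_valA = [
D_icode in { ICALL, IJXX } : D_valP; # use incremented PC
d_srcA == e_dstE : e_valE; # forward valE from execute
d_srcA == M_dstM : m_valM; # forward valM from memory
d_srcA == M_dstE : M_valE; # forward valE from memory
d_srcA == W_dstM : W_valM; # forward valM from write back
d_srcA == W_dstE : W_valE; # forward valE from write back
1 : d_rvalA; # use value read from register file
];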

# our pipelined implementation should always give
# priority to the forwarding source in the earliest
# pipeline stage, since it holds the latest instruction
# in the program sequence setting the register.

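# d_valB only needs to check the five forwarding sources:
word d_valB = [
d_srcB == e_dstE : e_valE; # forward valE from execute
d_srcB == M_dstM : m_valM; # forward valM from memory
d_srcB == M_dstE : M_valE; # forward valE from memory
d_srcB == W_dstM : W_valM; # forward valM from write back
d_srcB == W_dstE : W_valE; # forward valE from write back
1 : d_rvalB; # use value read from register file
];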
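
The priority order matters when several instructions in
flight write the same register. The Python sketch below
models the forwarding choice as a first-match scan over
(destination ID, value) pairs listed from the execute
stage down to write back. The helper name d_val and the
sample values are invented for this example. As in the
HCL, there is no special handling of RNONE.

```python
RAX, RBX, RNONE = 0x0, 0x3, 0xF

def d_val(src, forwards, rval):
    """Pick the first forwarding source whose destination ID matches src;
    fall back to the value read from the register file."""
    for dst, val in forwards:
        if src == dst:
            return val
    return rval

# Three instructions in flight all write %rax; the youngest is in execute.
forwards = [
    (RAX, 30),     # e_dstE, e_valE: most recent write in program order
    (RNONE, 0),    # M_dstM, m_valM
    (RAX, 20),     # M_dstE, M_valE
    (RNONE, 0),    # W_dstM, W_valM
    (RAX, 10),     # W_dstE, W_valE: oldest pending write
]
print(d_val(RAX, forwards, rval=0))    # 30
print(d_val(RBX, forwards, rval=99))   # 99, nothing to forward
```

Scanning from the execute stage first yields the value
written by the most recent instruction in program order,
which is the value a sequential execution would have
seen.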

+ Write-back stage:

word Stat = [
W_stat == SBUB : SAOK; # SBUB marks a bubble
1 : W_stat;
];

+ Execute Stage

4.5.8 Pipeline Control Logic

This logic must handle the following four cases:

- Load/use hazards.
The pipeline must stall for one cycle between an
instruction that reads a value from memory and an
instruction that uses this value.

- Processing ret.

The pipeline must stall until the ret reaches the
write-back stage.

- Mispredicted branches.

By the time a misprediction is detected, several
instructions at the branch target have already
entered the pipeline. These must be canceled, and
fetching should begin at the instruction following
the jump.

- Exceptions.

When an instruction causes an exception, we want to
disable the updating of the programmer-visible state
by later instructions and halt execution once the
excepting instruction reaches the write-back stage.


For an instruction that causes an exception, we:

1. disable the setting of condition codes by
instructions in the execute stage,
2. inject bubbles into the memory stage to disable
any writing to the data memory, and
3. stall the write-back stage when it has an
excepting instruction, thus bringing the pipeline to
a halt.
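
The three steps above can be sketched as control signals
that watch the status codes of the instructions in the
memory and write-back stages. This Python sketch is an
assumption-laden illustration: the signal names set_cc,
M_bubble, W_stall, m_stat, and W_stat follow the pipeline
naming used in these notes, but the exact conditions are
not quoted from them. The status values are the ones in
the constants table of section 4.3.4.

```python
SAOK, SADR, SINS, SHLT = 1, 2, 3, 4
IOPQ = 0x6
EXCEPTING = {SADR, SINS, SHLT}

def set_cc(E_icode, m_stat, W_stat):
    """Step 1: update condition codes only if no earlier instruction,
    now in the memory or write-back stage, has raised an exception."""
    return E_icode == IOPQ and m_stat not in EXCEPTING and W_stat not in EXCEPTING

def M_bubble(m_stat, W_stat):
    """Step 2: replace the instruction entering the memory stage with
    a bubble so that it cannot write to the data memory."""
    return m_stat in EXCEPTING or W_stat in EXCEPTING

def W_stall(W_stat):
    """Step 3: hold the excepting instruction in write back, halting the pipeline."""
    return W_stat in EXCEPTING

# An addq in execute directly behind an instruction that hit an
# invalid address in the memory stage must not change the flags.
print(set_cc(IOPQ, m_stat=SADR, W_stat=SAOK))   # False
print(M_bubble(m_stat=SADR, W_stat=SAOK))       # True
```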

posted @ 2018-03-28 23:22  孤灯下的守护者