Computer Organization and Design -- Homework Exercises (4)

Computer Organization and Design

 

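  ---------------------- Personal homework. If later students are assigned the same exercises, feel free to use this for reference and discussion, but please do not copy it directly.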

Problem 1. Multi-cycle datapath (10 points)

 

The multi-cycle datapath below has two bugs (marked with purple circles).

   Bug 1: One of the inputs to the MUX before the ALU is 0 instead of 1.

   Bug 2: The output from the register file to the memory is broken and grounded (it is always 0).

 

 

a) Consider the following small program. It’s run for 100 cycles.

i) How many times is each line executed? 

ii) Which lines of code are affected by bug 2?

(Fill in the table for your answer)

 

0                  lw 0 1 num      

1                  lw 0 2 one   

2                  lw 0 3 neg1   

3  loop:        add 2 2 2  

4                  add 1 3 1  

5                  beq 1 0 loop   

6                  sw 0 2 num      

7                  halt      

8  num:    .fill 6    

9  one:      .fill 1    

10  neg1: .fill -1     

 

Instr            0    1    2    3    4    5    6    7    8    9    10
Execution count  20   -    -    -    -    -    -    -    -    -    -
Bug 2            -    -    -    -    -    -    √    -    -    -    -
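
The count of 20 for instruction 0 can be reproduced with a short sketch. It assumes (this is not stated in the problem text) that the grounded MUX input is the constant used to compute PC+1, so the PC never advances, and that lw takes 5 cycles on this multi-cycle datapath. The function and variable names are illustrative only.

```python
# Sketch: count how often each line runs in a fixed cycle budget.
# Assumptions: bug 1 turns PC+1 into PC+0, and lw costs 5 cycles.
# Branches are not modelled; that is enough here because the PC never leaves 0.

def execution_counts(opcodes, total_cycles, pc_increment, cycles_per_op):
    counts = [0] * len(opcodes)
    pc, used = 0, 0
    # Keep executing while the next instruction still fits in the budget.
    while pc < len(opcodes) and used + cycles_per_op[opcodes[pc]] <= total_cycles:
        counts[pc] += 1
        used += cycles_per_op[opcodes[pc]]
        pc += pc_increment  # 0 with bug 1, so the same lw is fetched again
    return counts

opcodes = ["lw", "lw", "lw", "add", "add", "beq", "sw", "halt"]
counts = execution_counts(opcodes, total_cycles=100, pc_increment=0,
                          cycles_per_op={"lw": 5})
print(counts)
```

Under these assumptions only line 0 ever executes, 100 / 5 = 20 times, which is why every other entry in the table stays empty.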

 

b) Now assume that bug 1 is fixed. Fill in the final values of the registers and the memory locations after the completion of the program. If the value remains unchanged, fill the column with a '-'.

 

Reg0  Reg1  Reg2  Reg3  Reg4  Reg5  Reg6  Reg7
-     5     2     -1    -     -     -     -

 

0x0  0x1  0x2  0x3  0x4  0x5  0x6  0x7  0x8  0x9  0xA
-    -    -    -    -    -    -    -    0    -    -

(sw 0 2 num writes to address 0x8, where num is stored. Bug 2 is still present, so the store data is grounded and 0 is written instead of the value 2 held in Reg2.)
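
The final state can be checked by tracing the program. Below is a minimal sketch, assuming the usual LC2K semantics (add regA regB destReg, lw/sw regA regB offset, beq branching to the label when regA equals regB) and modelling bug 2 as the store data being forced to 0. The `run` function and its `bug2` flag are made up for this illustration.

```python
# Sketch of a tiny LC2K interpreter for the program in part a), with bug 1 fixed.
# Assumption: bug 2 only affects the data written by sw (it becomes 0).

def run(bug2=True):
    labels = {"loop": 3, "num": 8, "one": 9, "neg1": 10}
    mem = {8: 6, 9: 1, 10: -1}          # the three .fill words
    program = [
        ("lw", 0, 1, "num"),
        ("lw", 0, 2, "one"),
        ("lw", 0, 3, "neg1"),
        ("add", 2, 2, 2),
        ("add", 1, 3, 1),
        ("beq", 1, 0, "loop"),
        ("sw", 0, 2, "num"),
        ("halt",),
    ]
    reg = [0] * 8
    pc = 0
    while True:
        instr = program[pc]
        pc += 1
        op = instr[0]
        if op == "halt":
            break
        a, b, c = instr[1:]
        if op == "lw":
            reg[b] = mem[reg[a] + labels[c]]
        elif op == "sw":
            mem[reg[a] + labels[c]] = 0 if bug2 else reg[b]
        elif op == "add":
            reg[c] = reg[a] + reg[b]
        elif op == "beq" and reg[a] == reg[b]:
            pc = labels[c]
    return reg, mem

regs, memory = run(bug2=True)
print(regs, memory)
```

In this trace the beq is not taken (Reg1 is 5, not 0), so the loop body runs once and the sw stores to address 8.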

 

 

Problem 2: Pipelining (10 points)

a) Consider the following assembly program

 

0          lw 3 1 100

1          add 3 7 3

2          lw 3 2 100

3          add 1 2 1

4          add 1 2 2

5          add 1 2 1

 

What is the total number of cycles taken to complete the above program in the multi-cycle datapath discussed in class, assuming instruction 0 does not stall, 

 i) without pipelining

 

 Total cycles: 5+4+5+4+4+4 = 26 cycles (each lw takes 5 cycles, each add takes 4)

 

 ii) with pipelining, with detect-and-stall 

 

 Total cycles: 10+2+2+2+2 = 18 cycles (the base is 6+5-1 = 10; there are 4 hazards, between instructions 1-2, 2-3, 3-4 and 4-5, and each one stalls for 2 cycles)

 

 iii) with pipelining, with detect-and-forward 

 

 Total cycles: 10+1 = 11 cycles (the base is 6+5-1 = 10; the only hazard that still stalls is the lw-add dependence between instructions 2 and 3, which adds 1 cycle)
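
The three totals can be cross-checked with a small hazard counter. This is only a sketch under stated assumptions: a 5-stage in-order pipeline, a register file whose WB write is visible to an ID read in the same cycle (so detect-and-stall needs the consumer 3 slots behind its producer), and forwarding that leaves a single stall for a load followed directly by its consumer. `Instr`, `PROGRAM` and both functions are names invented for the example.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str       # "lw" or "add"
    srcs: tuple   # registers read
    dest: int     # register written

PROGRAM = [
    Instr("lw",  (3,),   1),   # 0: lw 3 1 100
    Instr("add", (3, 7), 3),   # 1: add 3 7 3
    Instr("lw",  (3,),   2),   # 2: lw 3 2 100
    Instr("add", (1, 2), 1),   # 3: add 1 2 1
    Instr("add", (1, 2), 2),   # 4: add 1 2 2
    Instr("add", (1, 2), 1),   # 5: add 1 2 1
]

def multicycle_cycles(program):
    # Non-pipelined multi-cycle datapath: lw = 5 cycles, add = 4 cycles.
    return sum(5 if ins.op == "lw" else 4 for ins in program)

def pipelined_cycles(program, forwarding, stages=5):
    last_writer = {}   # register -> index of the latest instruction writing it
    start = []         # cycle (0-based) in which each instruction enters IF
    for idx, ins in enumerate(program):
        earliest = start[-1] + 1 if start else 0
        for r in ins.srcs:
            if r in last_writer:
                p = last_writer[r]
                if forwarding:
                    gap = 2 if program[p].op == "lw" else 1
                else:
                    gap = 3
                earliest = max(earliest, start[p] + gap)
        start.append(earliest)
        last_writer[ins.dest] = idx
    return start[-1] + stages

print(multicycle_cycles(PROGRAM),
      pipelined_cycles(PROGRAM, forwarding=False),
      pipelined_cycles(PROGRAM, forwarding=True))
```

The stall is charged at fetch time rather than in ID; for an in-order pipeline this gives the same total cycle count.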

b) Consider the multi-cycle datapath discussed in class (for LC2K ISA) with detect-and-forward logic for data hazards. Suppose 100 instructions are executed on it, with no branches. (Circle the right answer).

 

i) What mix of instructions would take the longest?

 

a. Back-to-back dependent loads

b. Alternating loads and stores (each store is dependent on the immediately preceding load).

c. Alternating loads and adds (each add is dependent on the immediately preceding load).

 

 

ii) Similarly, what mix of instructions would take the least amount of time? 

 

a. Back-to-back dependent adds

b. Back-to-back independent nands

c. All of the above

d. None of the above

 

 

Problem 3: Pipelining (10 points)

Consider the following information of an LC2K architecture. The latencies for the five different pipeline stages are:

 

 Instruction Fetch (IF): 2.5 ns

 Instruction Decode (ID): 1 ns

 Execute (EX): 0.75 ns

 Memory access (MEM): 3.25 ns

 Write Back (WB): 1 ns

 

For the class of applications executed on this architecture, the mix of instructions is:

 - There are no branches

 - 30% of instructions are loads

 - 25% of instructions immediately following the load are dependent on it. 

 - 15% of the time, the second instruction after a load depends on it.

 

 

 

 

a) What is the clock period for this design?

 

 3.25 ns (the latency of the slowest stage, MEM)

b) How many cycles does an instruction take to complete on average in this design?

 

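 Assume 100 instructions; the base is 100+5-1 = 104 cycles.

With detect-and-forward, 30% * 100 * 25% = 7.5 loads are immediately followed by a dependent instruction, and each of them costs one stall cycle, so:

                  CPI = (104+7.5)/100 = 1.115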
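
The same calculation as a small function, in case other instruction counts or mixes need to be tried. This is a sketch of the method used in this answer (pipeline fill included, one stall per load-use hazard); the function name and parameters are illustrative.

```python
# Sketch: CPI for n_instr instructions on an n_stages-deep pipeline,
# counting the pipeline fill time and load-use stalls only.

def cpi(n_instr, n_stages, load_frac, dep_frac, stall_per_hazard=1):
    base_cycles = n_instr + n_stages - 1                      # fill + one per instruction
    stall_cycles = n_instr * load_frac * dep_frac * stall_per_hazard
    return (base_cycles + stall_cycles) / n_instr

cpi_5_stage = cpi(n_instr=100, n_stages=5, load_frac=0.30, dep_frac=0.25)
print(round(cpi_5_stage, 3))
```

As n_instr grows, the fill term vanishes and the value approaches 1 + 0.30 * 0.25 = 1.075.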

 

Suppose now that the MEM stage is divided into two stages, MEM0 and MEM1, with latencies 2.5 ns and 1.25 ns respectively. Memory accesses now take 3.75 ns, but the instruction after the memory access can proceed to the MEM0 stage while an in-progress memory access advances to MEM1. (Ignore the case of two back-to-back memory access instructions.)

 

c) What is the new clock period?

 

 

 New clock period: 2.5 ns (the slowest stages are now IF and MEM0)
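
Both clock periods come from the same rule: the clock has to be as long as the slowest stage. A minimal sketch using the latencies given in the problem (the dictionary and function names are made up for the example):

```python
# Sketch: the clock period of a pipeline is the latency of its slowest stage.

FIVE_STAGE_NS = {"IF": 2.5, "ID": 1.0, "EX": 0.75, "MEM": 3.25, "WB": 1.0}
SIX_STAGE_NS = {"IF": 2.5, "ID": 1.0, "EX": 0.75,
                "MEM0": 2.5, "MEM1": 1.25, "WB": 1.0}

def clock_period(stage_latencies_ns):
    return max(stage_latencies_ns.values())

print(clock_period(FIVE_STAGE_NS), clock_period(SIX_STAGE_NS))
```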

 

 

d) What is the new CPI?

 

 Same as b), but the base is 100+6-1 = 105, so:

New CPI = (105+7.5)/100 = 1.125

 

 

e) Considering instruction throughput, was changing the pipeline depth from 5 to 6 better based on your answers above? Justify your answer.

 

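 Yes.

6 stages: 1.125 * 2.5 ns = 2.8125 ns per instruction

5 stages: 1.115 * 3.25 ns = 3.62375 ns per instruction. With a pipeline depth of 6, each instruction takes less time on average, so throughput is higher.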
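
The comparison is just CPI times clock period for each design. A short sketch using the CPI and clock values from the answers above (the function name is illustrative):

```python
# Sketch: average time per instruction = CPI * clock period.
# Throughput is the reciprocal of that time.

def time_per_instruction_ns(cpi, clock_ns):
    return cpi * clock_ns

t5 = time_per_instruction_ns(cpi=1.115, clock_ns=3.25)   # 5-stage design
t6 = time_per_instruction_ns(cpi=1.125, clock_ns=2.5)    # 6-stage design

print(round(t5, 5), round(t6, 5))
print("6-stage throughput gain:", round(t5 / t6, 2))
```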

 

 

f) To get lower clock periods, one way is to increase the number of pipeline stages. However, there is added overhead due to the latches needed between pipeline stages. Suppose now, that the clock period is reduced to 0.62 ns, but the number of pipeline stages for each stage are:

 IF: 20

 ID: 8

 EX: 6

 Mem: 36

 WB: 8

Considering instruction throughput, is reducing the clock period by increasing the number of pipeline stages in this example, better? Justify your answer.

 

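No, not compared with the 6-stage design from e).

Total pipeline stages: 20+8+6+36+8 = 78.

Assume 100 instructions; the base is 100+78-1 = 177 cycles.

For an instruction that depends on the load right before it to run correctly, its EX has to line up with the load's WB, which takes

(8-1)+6+36-8 = 41 stall cycles (8-1 is the number of ID cycles the load still has left once the next instruction finishes IF, 6 is EX, 36 is Mem, and 8 is the number of ID cycles the next instruction itself spends). There are 7.5 such hazards, so 7.5*41 = 307.5 stall cycles are needed in total.

CPI = (177+307.5)/100 = 4.845

78 stages: 4.845 * 0.62 ns = 3.0039 ns per instruction. That beats the 3.62375 ns of the original 5-stage design, but it is slower than the 2.8125 ns of the 6-stage design in e), so the much deeper pipeline does not pay off here.

Throughput = 1/3.0039 ns ≈ 0.3329 instructions/ns.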
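
The arithmetic for the 78-stage case can be recomputed as follows. This sketch keeps the assumptions used in this answer (100 instructions, 7.5 load-use hazards, forwarding from the end of MEM to the start of EX). The stall length is written as EX + MEM - 1, which is the same 41 cycles as the expression (8-1)+6+36-8. All names are illustrative.

```python
# Sketch: CPI and time per instruction for the 78-stage pipeline.

STAGES = {"IF": 20, "ID": 8, "EX": 6, "MEM": 36, "WB": 8}
CLOCK_NS = 0.62
N_INSTR = 100
HAZARDS = N_INSTR * 0.30 * 0.25          # 7.5 loads with a dependent next instruction

depth = sum(STAGES.values())             # 78

# The consumer is one slot behind the load and needs the loaded value when it
# enters its first EX sub-stage, i.e. after the load leaves its last MEM sub-stage.
stall_per_hazard = STAGES["EX"] + STAGES["MEM"] - 1   # 41

total_cycles = (N_INSTR + depth - 1) + HAZARDS * stall_per_hazard   # 177 + 307.5
cpi = total_cycles / N_INSTR
time_ns = cpi * CLOCK_NS
throughput_per_ns = 1 / time_ns

print(depth, stall_per_hazard, round(cpi, 3), round(time_ns, 4),
      round(throughput_per_ns, 4))
```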

posted @ 2017-04-03 16:43  nanashi