Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor (1)

Time

2020.10.30

Summary

We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads.

Research Objective

Problem Statement

Method(s)

Evaluation

Conclusion

Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources.
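The 2.5-fold figure also fixes the baseline. Assuming "2.5-fold" means the ratio of the two throughputs, the unmodified superscalar sustains roughly:

```latex
\text{IPC}_{\text{superscalar}} \approx \frac{\text{IPC}_{\text{SMT}}}{2.5} = \frac{5.4}{2.5} \approx 2.2
```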

Notes

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle.

Tullsen et al. [27] showed the potential of an SMT processor to achieve significantly higher throughput than either a wide superscalar or a multithreaded processor. That paper also demonstrated the advantages of simultaneous multithreading over multiple processors on a single chip, due to SMT's ability to dynamically assign execution resources where needed each cycle.
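A toy model can make "dynamically assign execution resources where needed each cycle" concrete. The sketch below is only an illustration, not the paper's simulator: it assumes a hypothetical input `ready[c][t]`, the number of independent, ready instructions thread `t` could issue in cycle `c`, ignores dependences and latencies, and simply drops instructions that miss their cycle. It compares a single-thread superscalar, a chip multiprocessor that statically splits the issue width among threads, and an SMT core in which all threads share the full width every cycle.

```python
from typing import List

# ready[c][t]: hypothetical count of independent instructions thread t could
# issue in cycle c. Unissued instructions are dropped (toy assumption).
Ready = List[List[int]]


def superscalar_issue(ready: Ready, width: int) -> int:
    """Conventional superscalar: only thread 0 runs, up to `width` per cycle."""
    return sum(min(width, cycle[0]) for cycle in ready)


def multiprocessor_issue(ready: Ready, width: int) -> int:
    """Static partitioning: each thread owns a core with width // n issue slots."""
    total = 0
    for cycle in ready:
        per_core = width // len(cycle)
        total += sum(min(per_core, r) for r in cycle)
    return total


def smt_issue(ready: Ready, width: int) -> int:
    """SMT: every thread competes for the full issue width in every cycle."""
    return sum(min(width, sum(cycle)) for cycle in ready)


if __name__ == "__main__":
    ready = [[6, 1], [2, 5], [0, 7], [3, 3]]
    for name, fn in [("superscalar", superscalar_issue),
                     ("multiprocessor", multiprocessor_issue),
                     ("SMT", smt_issue)]:
        issued = fn(ready, width=8)
        print(f"{name:>14}: {issued} instructions, IPC {issued / len(ready):.2f}")
```

With width 8 and the `ready` trace above, the three functions return 11, 21, and 27 issued instructions over four cycles (the superscalar count covers thread 0 only). In this model SMT can never do worse than static partitioning, because each core's share is capped at width // n, so the partitioned total per cycle is at most both the width and the total ready count.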

The changes necessary to support simultaneous multithreading on the base superscalar architecture are:

multiple program counters and some mechanism by which the fetch unit selects one each cycle,

a separate return stack for each thread for predicting subroutine return destinations,

per-thread instruction retirement, instruction queue flush, and trap mechanisms,

a thread id with each branch target buffer entry to avoid predicting phantom branches (a hit on an entry installed by another thread whose branch happens to sit at the same address), and

a larger register file, to support logical registers for all threads plus additional registers for register renaming.

The size of the register file affects the pipeline (we add two extra stages) and the scheduling of load-dependent instructions, which we discuss later in this section.
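The hardware additions in the list above can be mimicked in software to see what each one buys. The following Python sketch is a hypothetical model, not the paper's design: the class and function names are invented, round-robin is just one possible policy for choosing which program counter the fetch unit uses, the BTB is a direct-mapped table that stores the full PC as its tag, and the register counts in the usage note are made-up example values.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class ThreadContext:
    """Per-thread front-end state: its own PC and its own return stack."""
    pc: int
    return_stack: List[int] = field(default_factory=list)

    def call(self, return_addr: int) -> None:
        # A subroutine call pushes onto this thread's stack only.
        self.return_stack.append(return_addr)

    def predict_return(self) -> Optional[int]:
        # A return pops this thread's stack, so other threads cannot corrupt it.
        return self.return_stack.pop() if self.return_stack else None


class RoundRobinFetch:
    """Chooses which thread's PC to fetch from this cycle (illustrative policy)."""

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads
        self.next_tid = 0

    def select(self) -> int:
        tid = self.next_tid
        self.next_tid = (self.next_tid + 1) % self.num_threads
        return tid


class DirectMappedBTB:
    """Branch target buffer whose entries optionally record the owning thread id."""

    def __init__(self, size: int, tag_with_thread: bool = True) -> None:
        self.size = size
        self.tag_with_thread = tag_with_thread
        # Each slot holds (pc, tid, target) or None.
        self.entries: List[Optional[Tuple[int, int, int]]] = [None] * size

    def update(self, tid: int, pc: int, target: int) -> None:
        self.entries[pc % self.size] = (pc, tid, target)

    def predict(self, tid: int, pc: int) -> Optional[int]:
        entry = self.entries[pc % self.size]
        if entry is None:
            return None
        entry_pc, entry_tid, target = entry
        if entry_pc != pc:
            return None
        # Without the thread check, another thread's branch at the same
        # address would yield a "phantom" branch prediction.
        if self.tag_with_thread and entry_tid != tid:
            return None
        return target


def physical_registers(threads: int, logical_per_thread: int, rename_regs: int) -> int:
    """Logical registers for every thread plus a shared pool for renaming."""
    return threads * logical_per_thread + rename_regs
```

After `btb.update(tid=0, pc=0x40, target=0x80)`, `predict` returns `0x80` for thread 0 and `None` for thread 1, while a table built with `tag_with_thread=False` would hand thread 1 the phantom target. With hypothetical values of 4 threads, 32 logical registers each, and 64 renaming registers, `physical_registers` gives 192.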

Words

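Simultaneous multithreading (SMT): letting several independent threads issue instructions in the same cycle
impediments: obstacles
detailed: thorough, in-depth
boost: to increase or improve
heuristics: rules of thumb used to make approximate decisions
undue: excessive, improper
instruction queue (IQ): buffer holding instructions that are waiting to issue
program counter (PC): register holding the address of the next instruction to fetch
round-robin: taking turns in a fixed cyclic order
scheme: approach, plan
register renaming: mapping logical (architectural) registers onto a larger set of physical registers to remove false dependences
ramifications: consequences
index register: register whose contents serve as an offset when computing an address (indexed addressing)
consecutive: successive, in an unbroken sequence
inter-instruction: between instructions
predetermined: fixed in advance
squash: (pipelines) to cancel in-flight instructions and discard their results
extending: lengthening, enlarging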

Sentences

This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures
Meaning: the speedup is increased further by a multithreading advantage that earlier architectures had not exploited.
issue multiple instructions each cycle
Meaning: send several instructions to the functional units in every cycle.
the latency-hiding ability of multithreaded architectures
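Meaning: the capacity of multithreaded architectures to hide latency by running other threads while one thread stalls.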
heavily leveraged off existing superscalar technology.
Meaning: builds heavily on existing superscalar technology.
That is the case for all instructions but loads
Meaning: this holds for every instruction except loads.

Timeline

This architecture allows us to address several concerns about

Posted on 2020-10-30 21:30 by 大大大大大圣归来