Notes - Processor Microarchitecture - rdtsc vs rdtscp

Difference between rdtsc and rdtscp

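RDTSC (Read Time-Stamp Counter) and RDTSCP (Read Time-Stamp Counter and Processor ID) are x86 instructions that read the processor's time-stamp counter (TSC), a 64-bit counter that is set to zero at processor reset and then increments continuously. On older processors it advanced once per core clock cycle; on processors with an invariant TSC (most modern Intel and AMD parts) it ticks at a constant rate regardless of frequency scaling or power states. Both instructions read the same counter, but they differ in their ordering guarantees and in what they return.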
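On a CPU with an invariant TSC the counter ticks at a fixed rate, so converting a tick difference into wall-clock time only needs that rate. The sketch below assumes the TSC frequency is already known (for example from a separate calibration step); `tsc_ticks_to_ns` and its `tsc_hz` parameter are hypothetical names for illustration, not part of any existing API.

```c
#include <stdint.h>

/* Hypothetical helper: convert a TSC tick delta to nanoseconds, assuming an
 * invariant TSC running at tsc_hz ticks per second (tsc_hz must be non-zero).
 * Whole seconds and the remainder are handled separately so that multiplying
 * by 1e9 cannot overflow 64 bits for TSC frequencies below about 18 GHz. */
static uint64_t tsc_ticks_to_ns(uint64_t delta_ticks, uint64_t tsc_hz)
{
    const uint64_t ns_per_s = 1000000000ULL;
    uint64_t whole_seconds = delta_ticks / tsc_hz;
    uint64_t rem_ticks = delta_ticks % tsc_hz;
    return whole_seconds * ns_per_s + (rem_ticks * ns_per_s) / tsc_hz;
}
```

For example, with an assumed 3 GHz TSC, a delta of 3000 ticks corresponds to 1000 ns.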

Key Differences

1. Serialization

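RDTSC: This instruction is not serializing and imposes no ordering of its own. Because of out-of-order execution, it may read the counter before earlier instructions have finished, and later instructions may start before the read happens. A measurement can therefore miss part of the timed work or include work that follows it, unless the reads are explicitly fenced[1][2][5].
RDTSCP: RDTSCP is not a serializing instruction in the architectural sense either, but it does provide partial ordering: it waits until all previous instructions have executed and all previous loads are globally visible before reading the counter. It does not wait for previous stores to become globally visible, and subsequent instructions may begin executing before the counter is read. This makes it more reliable than plain RDTSC for the end of a timed region; to also keep later instructions from starting early, it is commonly followed by LFENCE[1][2][4][7].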
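The ordering rules above can be sketched with the GCC/Clang intrinsics `__rdtsc`, `__rdtscp` and `_mm_lfence`. This is a minimal sketch assuming an x86-64 target and a compiler that provides `<x86intrin.h>` (MSVC uses `<intrin.h>` instead); the two wrapper names are hypothetical. It also assumes LFENCE behaves as documented by Intel (it does not execute until prior instructions complete locally, and later instructions do not start until it completes), which on AMD holds when LFENCE is dispatch-serializing.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence */
#include <x86intrin.h>  /* __rdtsc, __rdtscp (GCC/Clang, x86-64) */

/* Hypothetical wrapper: plain RDTSC may read the counter before earlier
 * instructions finish, so a preceding LFENCE makes it wait for them. */
static inline uint64_t rdtsc_after_prior_work(void)
{
    _mm_lfence();
    return __rdtsc();
}

/* Hypothetical wrapper: RDTSCP already waits for earlier instructions to
 * execute, but later instructions may still start before it reads the
 * counter. A trailing LFENCE holds them back until the read is done. */
static inline uint64_t rdtscp_before_later_work(unsigned int *aux)
{
    uint64_t t = __rdtscp(aux);
    _mm_lfence();
    return t;
}
```

The asymmetry is the point: RDTSCP only covers the "earlier instructions" side, so the fence goes after it, while RDTSC covers neither side.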

2. Output

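RDTSC: Returns the 64-bit time-stamp counter split across EDX:EAX, with the high 32 bits in EDX and the low 32 bits in EAX. In 64-bit mode, the upper 32 bits of RAX and RDX are cleared.
RDTSCP: Returns the same 64-bit counter in EDX:EAX and additionally loads ECX with the contents of the IA32_TSC_AUX MSR. The operating system decides what that register holds; typically it is a processor identifier (Linux, for example, encodes the CPU number and NUMA node), which lets software tell which core executed the instruction and detect whether two readings came from the same core[2][4].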
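The register layout described above can be made explicit with GCC/Clang inline assembly. This is a sketch assuming an x86-64 target; `combine_edx_eax` and `rdtscp_raw` are hypothetical names. The note about how Linux fills IA32_TSC_AUX (CPU number in bits 11:0, NUMA node above) reflects recent Linux kernels and is OS-specific, not architectural.

```c
#include <stdint.h>

/* Join the two 32-bit halves of the counter: high half from EDX,
 * low half from EAX. */
static inline uint64_t combine_edx_eax(uint32_t edx, uint32_t eax)
{
    return ((uint64_t)edx << 32) | eax;
}

/* Hypothetical wrapper around raw RDTSCP (GCC/Clang, x86-64 only).
 * EDX:EAX receives the counter; ECX receives IA32_TSC_AUX, whose meaning
 * is chosen by the OS (recent Linux kernels store the CPU number in bits
 * 11:0 and the NUMA node in the bits above). */
static inline uint64_t rdtscp_raw(uint32_t *aux)
{
    uint32_t lo, hi, c;
    __asm__ __volatile__("rdtscp" : "=a"(lo), "=d"(hi), "=c"(c));
    *aux = c;
    return combine_edx_eax(hi, lo);
}
```

With the compiler intrinsics, `__rdtscp(&aux)` performs the same read and does the EDX:EAX combination for you.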

3. Performance Impact

Latency: RDTSCP generally costs more than RDTSC because of its additional ordering. In one reported measurement, back-to-back RDTSC calls took around 28 cycles, while RDTSCP took 36 to 40 cycles; exact figures vary across microarchitectures[1][3].
Use Cases: RDTSC is often sufficient for general-purpose timing in performance-critical code, where low overhead matters more than exact ordering, while RDTSCP is preferred where precise timing is crucial, such as benchmarking or profiling short code sequences[1].
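The overhead difference can be estimated on a given machine with a simple back-to-back loop. This is a rough sketch assuming GCC/Clang on x86-64; the function names and the iteration count are hypothetical choices. The results are in TSC ticks, which on an invariant-TSC CPU are reference cycles rather than core clock cycles, so they are not directly comparable with cycle counts reported elsewhere.

```c
#include <stdint.h>
#include <x86intrin.h>  /* __rdtsc, __rdtscp (GCC/Clang, x86-64) */

/* Hypothetical iteration count; larger values average out noise. */
enum { TSC_OVERHEAD_ITERS = 1000000 };

/* Volatile sink so the compiler cannot discard the reads in the loops. */
static volatile uint64_t tsc_sink;

/* Average TSC ticks per back-to-back RDTSC. */
static double rdtsc_back_to_back_ticks(void)
{
    uint64_t start = __rdtsc();
    for (int i = 0; i < TSC_OVERHEAD_ITERS; i++)
        tsc_sink = __rdtsc();
    uint64_t end = __rdtsc();
    return (double)(end - start) / TSC_OVERHEAD_ITERS;
}

/* Average TSC ticks per back-to-back RDTSCP. */
static double rdtscp_back_to_back_ticks(void)
{
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    for (int i = 0; i < TSC_OVERHEAD_ITERS; i++)
        tsc_sink = __rdtscp(&aux);
    uint64_t end = __rdtscp(&aux);
    return (double)(end - start) / TSC_OVERHEAD_ITERS;
}
```

Running both a few times and comparing the averages gives a machine-specific view of the extra cost of RDTSCP's ordering; the exact numbers depend on the CPU and on whether the system virtualizes the TSC.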

4. Usage Considerations

RDTSC: Has no ordering guarantees of its own, which keeps it cheap, but when the measured region must be isolated from surrounding instructions it needs explicit fencing (for example, LFENCE before and after the read).
RDTSCP: Should be used when accurate timing is necessary, particularly at the end of a timed region, since it ensures that all prior instructions have executed before the time-stamp is captured[2][5].
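Putting these considerations together, a commonly used pattern is LFENCE, RDTSC, LFENCE at the start of a region and RDTSCP, LFENCE at the end, repeated several times with the minimum kept. The sketch below assumes GCC/Clang on x86-64; `time_region_min_ticks` and the sample workload `sum_loop` are hypothetical names, and taking the minimum is one reasonable way to filter out runs disturbed by interrupts or cold caches, not the only one.

```c
#include <stdint.h>
#include <emmintrin.h>  /* _mm_lfence */
#include <x86intrin.h>  /* __rdtsc, __rdtscp (GCC/Clang, x86-64) */

/* Hypothetical helper: smallest TSC tick count observed over `reps` runs
 * of fn(arg). Returns UINT64_MAX if reps <= 0. */
static uint64_t time_region_min_ticks(void (*fn)(void *), void *arg, int reps)
{
    uint64_t best = UINT64_MAX;
    unsigned int aux;
    for (int r = 0; r < reps; r++) {
        _mm_lfence();                  /* earlier work finishes first */
        uint64_t start = __rdtsc();
        _mm_lfence();                  /* region starts after the read */

        fn(arg);

        uint64_t end = __rdtscp(&aux); /* waits for the region to execute */
        _mm_lfence();                  /* later work starts after the read */

        if (end - start < best)
            best = end - start;
    }
    return best;
}

/* Hypothetical workload: sum 0..n-1 into a volatile so the loop is kept. */
static void sum_loop(void *arg)
{
    volatile uint64_t acc = 0;
    uint64_t n = *(uint64_t *)arg;
    for (uint64_t i = 0; i < n; i++)
        acc += i;
}
```

For instance, `time_region_min_ticks(sum_loop, &n, 100)` with `uint64_t n = 1000` reports the fastest of 100 runs in TSC ticks; the fixed cost of the fenced reads themselves is included and can be measured separately with an empty workload.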

In summary, both instructions read the same time-stamp counter. RDTSCP adds partial ordering (it waits for prior instructions to execute) and returns IA32_TSC_AUX in ECX, at the cost of somewhat higher latency, while RDTSC is cheaper but needs explicit fences when ordering matters.

Citations:
[1] https://community.intel.com/t5/Software-Tuning-Performance/High-impact-of-rdtsc/td-p/1092539
[2] http://blog.tinola.com/?e=54
[3] https://github.com/google/randen/blob/master/nanobenchmark.cc
[4] https://community.intel.com/t5/Intel-ISA-Extensions/Intrinsic-functions-rdtsc-and-rdtscp/m-p/1170426/highlight/true
[5] https://en.wikipedia.org/wiki/RDTSC
[6] https://cseweb.ucsd.edu/classes/wi16/cse221-a/timing.html
[7] https://www.felixcloutier.com/x86/rdtscp
[8] https://www.youtube.com/watch?v=dPeAK62Pyi4