What is an atomic instruction?
LINK: https://www.quora.com/What-is-an-atomic-instruction
The term applies to instructions that make two or more separate memory accesses in two or more separate bus cycles, yet must behave as one indivisible operation. Without that guarantee, another device can execute a bus cycle between the accesses and leave the referenced memory with erroneous content.
The x86 ISA includes several instructions of this type. All of them execute a read-modify-write operation. Example:
00 00 ADD [BX+SI],AL
This instruction reads the memory addressed by the BX and SI registers, adds the content of AL and writes the result back. An interrupt cannot intervene in this sequence but another device (another CPU or DMA) can take the bus between the read and the write and overwrite the content. As a result the final content will be wrong from the perspective of both processors.
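The lost update described above can be reproduced deterministically with a small simulation. This is only a sketch: the two bus cycles of the read-modify-write are modelled as separate function calls so the interleaving can be chosen by hand, and every name in it (rmw_state, rmw_read, rmw_write, interleaved_add, serialized_add) is made up for the illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One device's non-atomic read-modify-write, split into its two bus
   cycles so that the interleaving can be chosen explicitly.
   All names here are hypothetical, for illustration only. */
typedef struct {
    uint8_t latched; /* value captured by the read cycle */
} rmw_state;

static void rmw_read(rmw_state *s, const uint8_t *mem) {
    assert(s != NULL && mem != NULL);
    s->latched = *mem;
}

static void rmw_write(const rmw_state *s, uint8_t *mem, uint8_t addend) {
    assert(s != NULL && mem != NULL);
    *mem = (uint8_t)(s->latched + addend);
}

/* Both devices read before either one writes: one update is lost. */
uint8_t interleaved_add(uint8_t initial, uint8_t a, uint8_t b) {
    uint8_t mem = initial;
    rmw_state cpu0, cpu1;
    rmw_read(&cpu0, &mem);     /* CPU 0 read cycle */
    rmw_read(&cpu1, &mem);     /* CPU 1 takes the bus in between */
    rmw_write(&cpu0, &mem, a); /* CPU 0 write cycle */
    rmw_write(&cpu1, &mem, b); /* CPU 1 overwrites CPU 0's result */
    return mem;
}

/* Each read-modify-write completes before the next one starts. */
uint8_t serialized_add(uint8_t initial, uint8_t a, uint8_t b) {
    uint8_t mem = initial;
    rmw_state cpu0, cpu1;
    rmw_read(&cpu0, &mem);
    rmw_write(&cpu0, &mem, a);
    rmw_read(&cpu1, &mem);
    rmw_write(&cpu1, &mem, b);
    return mem;
}
```

Starting from 10 and adding 1 and 2, interleaved_add returns 12 because CPU 0's increment is overwritten, while serialized_add returns the expected 13.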
To ensure a correct result, the CPU must guarantee atomicity: no other device may overwrite the memory location between the start of the read and the end of the write. Another device may still overwrite the location, but only before the read or after the write, never between them, so the final content stays consistent.
Atomicity is especially critical when the variable to be modified is a mutex, a counting semaphore or a similar piece of data that controls access to shared resources. Failure to ensure atomicity may result in two processors accessing a shared resource at the same time, a permanent lock-up or similarly disastrous behaviour.
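To make the mutex case concrete, here is a minimal spinlock sketch built on the C11 atomic_flag type. It assumes a C11 compiler with <stdatomic.h>; the names spinlock, SPINLOCK_INIT, spinlock_try_acquire and spinlock_release are invented for this example. The point is that "test whether the lock is free" and "mark it taken" happen in one indivisible step, so two processors can never both see it as free.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal lock type, for illustration only. */
typedef struct {
    atomic_flag held;
} spinlock;

#define SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Returns true if the caller now owns the lock. The test-and-set is one
   atomic read-modify-write: it sets the flag and reports its old value. */
bool spinlock_try_acquire(spinlock *l) {
    assert(l != NULL);
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Clears the flag so that another caller can take the lock. */
void spinlock_release(spinlock *l) {
    assert(l != NULL);
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

A caller that must wait would typically loop on spinlock_try_acquire until it returns true. If the test and the set were two separate memory accesses, two callers could both read "free" before either one wrote "taken".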
So how do you make an instruction atomic? There are two approaches to the problem.
1) The first has been implemented by x86 since its very first incarnation, the Intel 8086 of 1978. Adding the LOCK prefix to a read-modify-write instruction makes it atomic:
F0 00 00 LOCK ADD [BX+SI],AL
Here LOCK means: lock the bus for exclusive use, so that no other device can access it until the instruction completes.
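In C you would not normally write the prefix yourself. A rough equivalent, assuming a C11 compiler with <stdatomic.h>, is an atomic fetch-and-add. On x86 targets compilers typically translate it into a LOCK-prefixed instruction such as LOCK ADD or LOCK XADD, although the exact choice is up to the compiler. The function name atomic_add_byte is made up for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Atomically adds `value` to the byte at `target` and returns the content
   the byte had before the addition. The read, the add and the write form
   one indivisible operation. (Hypothetical helper, for illustration.) */
uint8_t atomic_add_byte(_Atomic uint8_t *target, uint8_t value) {
    assert(target != NULL);
    return atomic_fetch_add(target, value);
}
```

Several threads may call atomic_add_byte on the same byte at once and no increment will be lost, which is the guarantee a plain `*target += value` does not give.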
2) The other approach is used by ISAs that have no read-modify-write instructions at all and need at least three instructions to achieve the same effect.
An example is ARM (and RISC in general). These ISAs have no atomic instructions per se, but they have special load and store instructions that detect atomicity violations between them.
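retry:
E85C 0F00 LDREX R0,[R12]     ; load the current value and mark [R12] for exclusive access
1842      ADDS R2,R0,R1      ; compute the new value
E84C 2300 STREX R3,R2,[R12]  ; try to store it: R3 = 0 on success, 1 if exclusivity was lost
B103      CBZ R3,return      ; done if the store succeeded
E7F8      B retry            ; otherwise repeat the whole sequence
return: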
(here the EX in LDREX and STREX stands for exclusive)
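The same load, compute, conditional store, retry pattern can be written portably in C11 as a compare-exchange loop. This is a sketch under the assumption of a C11 compiler with <stdatomic.h>; add_with_retry is a made-up name. On ARM cores that rely on LDREX/STREX, compilers commonly lower the weak compare-exchange to such an exclusive load/store pair, and the "weak" variant is allowed to fail spuriously for the same reason STREX can.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Adds `addend` to *target with a retry loop and returns the value that
   was finally stored. (Hypothetical helper, for illustration.) */
uint32_t add_with_retry(_Atomic uint32_t *target, uint32_t addend) {
    assert(target != NULL);
    /* Counterpart of LDREX: read the current content. */
    uint32_t observed = atomic_load_explicit(target, memory_order_relaxed);
    uint32_t desired;
    do {
        /* Counterpart of ADDS: compute the new value from what was read. */
        desired = observed + addend;
        /* Counterpart of STREX plus the branch: store only if the location
           still holds `observed`. On failure `observed` is refreshed with
           the current content and the loop runs again. */
    } while (!atomic_compare_exchange_weak(target, &observed, desired));
    return desired;
}
```

Unlike the fetch-and-add form, the body of the loop can hold any computation on the loaded value, which is the flexibility the three-instruction approach buys.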
Both approaches carry an implementation penalty, and both have their pros and cons. The ARM approach involves at least four instructions, two additional registers and a loop; interrupts may have to be disabled before the sequence. The x86 approach limits atomic operations to some of the available read-modify-write instructions: a single LOCK prefix cannot cover an operation that requires more than one instruction and, surprisingly, some read-modify-write instructions, shift and rotate in particular, do not accept LOCK. LOCK is not allowed with any other kind of instruction either, so a plain read or write cannot be locked. This may compromise consistency when reading from or writing to an unaligned address, which again requires two bus cycles.
And finally, there are antiquated ISAs that were never meant to be used in multiprocessor configurations: they cannot guarantee atomicity at all.