Historically, CPUs were strictly sequential machines, executing one instruction at a time in program order.
This is logical and easy to understand, but it can limit performance.
Modern CPUs use several techniques to address this; one of the more interesting is out-of-order execution.
When an instruction needs a value that isn't already in a register, the CPU must find that value in memory.
The CPU caches are checked first, as they are the fastest memory tiers.
If the value isn't there, system RAM is checked.
Exact latencies vary widely between processors, but as rough figures the L2 cache may take around 20 cycles to respond, L3 around 200 cycles, and system RAM around 400 cycles.
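The tiered lookup described above can be sketched as a toy model. This is only an illustration, not a description of real hardware: `TIERS` and `access_cost` are hypothetical names, the L2, L3, and RAM latencies are the rough figures quoted above, and the L1 latency of 4 cycles is an assumption, since no figure for it is given here.

```python
# Toy model of a tiered memory lookup. Latencies are in CPU cycles.
# L2, L3 and RAM use the rough figures from the text; the L1 value is an
# assumed placeholder, not a measured number.
TIERS = [
    ("L1", 4),     # assumption
    ("L2", 20),
    ("L3", 200),
    ("RAM", 400),
]


def access_cost(address, contents):
    """Return (tier, cycles) for the fastest tier that holds `address`.

    `contents` maps a tier name to the set of addresses it currently holds.
    RAM is treated as holding every address.
    """
    for name, cycles in TIERS:
        if name == "RAM" or address in contents.get(name, set()):
            return name, cycles
    raise LookupError(address)  # unreachable while RAM is in TIERS


# Hypothetical cache contents: each tier holds a superset of the one above it.
contents = {"L1": {0x10}, "L2": {0x10, 0x20}, "L3": {0x10, 0x20, 0x30}}
for addr in (0x10, 0x20, 0x30, 0x40):
    tier, cycles = access_cost(addr, contents)
    print(f"{addr:#x}: found in {tier} after about {cycles} cycles")
```

The spread between the fastest and slowest result is why a single cache miss can leave an in-order pipeline idle for hundreds of cycles.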
Through this reordering, the CPU can run instructions whose inputs are ready ahead of earlier instructions that are still waiting on data.
This prevents pipeline stalls as much as possible, minimizing idle cycles.
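To make the benefit concrete, here is a minimal sketch of a scheduler that issues at most one instruction per cycle, either strictly in program order or out of order. It is a toy model under stated assumptions: `Instr` and `simulate` are hypothetical names, the load is assumed to hit L2 at roughly 20 cycles, every other instruction takes one cycle, and each register is written only once so that only true data dependencies matter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    dest: str          # register written
    srcs: tuple        # registers read
    latency: int       # cycles until the result is available (illustrative)


def simulate(program, out_of_order):
    """Return (issue order, cycle at which the last result is ready)."""
    ready_at = {}              # register -> cycle its value becomes available
    pending = list(program)    # not yet issued, in program order
    order, cycle, finish = [], 0, 0
    while pending:
        # In-order: only the oldest instruction may issue.
        # Out-of-order: any pending instruction whose inputs are ready may issue.
        candidates = pending if out_of_order else pending[:1]
        for idx, instr in enumerate(candidates):
            # A source is not ready if an older, unissued instruction produces it.
            earlier_dests = {p.dest for p in pending[:idx]}
            if all(s not in earlier_dests and ready_at.get(s, 0) <= cycle
                   for s in instr.srcs):
                pending.pop(idx)
                ready_at[instr.dest] = cycle + instr.latency
                finish = max(finish, cycle + instr.latency)
                order.append(instr.name)
                break
        cycle += 1             # a cycle with nothing issued is a stall
    return order, finish


PROGRAM = [
    Instr("load r1", "r1", (), 20),        # slow: waits on memory
    Instr("add r2", "r2", ("r1",), 1),     # depends on the load
    Instr("mov r3", "r3", (), 1),          # independent
    Instr("add r4", "r4", ("r3",), 1),     # depends only on the mov
]

print(simulate(PROGRAM, out_of_order=False))
print(simulate(PROGRAM, out_of_order=True))
```

In this model the in-order run finishes at cycle 23, while the out-of-order run finishes at cycle 21 because the two independent instructions execute while the load is still outstanding.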
Out-of-order execution requires a feature called register renaming.
The CPU can access data held in registers within a single cycle.
Registers are used to store data being read and written.
To enable renaming, CPUs have many more physical registers than the CPU architecture defines.
Data written out of order is held in one of these spare registers; it isn't copied to another register once the order has sorted itself out.
Instead, the name of the holding register is changed to that of the register it should be in.
These extra physical registers cannot be addressed directly by software.
A program can only address the physical registers that currently carry the name of an architectural register.
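The renaming idea above can be sketched with a mapping table from architectural register names to the larger pool of hardware registers (called physical registers in this sketch). This is a simplified illustration, not how any particular CPU implements it: `rename` is a hypothetical function, the pool size of 8 is arbitrary, and physical registers are never recycled here, whereas real hardware frees them once they are no longer needed.

```python
def rename(program, arch_regs, num_physical=8):
    """Rewrite `program` to use physical register names.

    `program` is a list of (dest, srcs) pairs using architectural names.
    Returns the renamed program and the final name -> physical mapping.
    """
    if num_physical < len(arch_regs):
        raise ValueError("need at least one physical register per name")
    physical = [f"p{i}" for i in range(num_physical)]
    table = dict(zip(arch_regs, physical))   # current holder of each name
    free = physical[len(arch_regs):]         # spare, currently unnamed registers
    renamed = []
    for dest, srcs in program:
        # Reads use whichever physical register holds the name right now.
        phys_srcs = tuple(table[s] for s in srcs)
        if not free:
            raise RuntimeError("no free physical register; a real CPU would stall")
        # A write takes a fresh register; only the name moves, no data is copied.
        table[dest] = free.pop(0)
        renamed.append((table[dest], phys_srcs))
    return renamed, table


PROGRAM = [
    ("r1", ()),         # r1 = load ...
    ("r2", ("r1",)),    # r2 = r1 + 1
    ("r1", ()),         # r1 = 5      (reuses the name r1)
    ("r3", ("r1",)),    # r3 = r1 + 1 (must see the second r1)
]

renamed, table = rename(PROGRAM, ["r1", "r2", "r3"])
for before, after in zip(PROGRAM, renamed):
    print(before, "->", after)
print(table)
```

After renaming, the two writes to `r1` land in different physical registers (`p3` and `p5`), so the third instruction no longer has to wait for the second one to finish reading the old value.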
A memory barrier allows a programmer to enforce an ordering constraint on memory operations issued before and after the barrier.
This is done to ensure that important operations are completed in the correct order.
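The ordering constraint can be expressed as a simple rule: no operation may move across a barrier. The sketch below checks whether a given execution order obeys that rule. It is a toy model with hypothetical names (`BARRIER`, `respects_barriers`, and the operation labels), and it deliberately ignores data dependencies, which real CPUs also honor within a single thread.

```python
BARRIER = "BARRIER"


def segments(program):
    """Map each operation to the number of barriers that precede it."""
    seg, current = {}, 0
    for op in program:
        if op == BARRIER:
            current += 1
        else:
            seg[op] = current
    return seg


def respects_barriers(program, executed):
    """True if `executed` is a reordering of `program` that crosses no barrier."""
    seg = segments(program)
    if sorted(executed) != sorted(seg):
        raise ValueError("executed order must contain exactly the program's operations")
    numbers = [seg[op] for op in executed]
    # Operations may shuffle within a segment, but segments must stay in order.
    return numbers == sorted(numbers)


program = ["write data", BARRIER, "set ready flag", "update counter"]

# Reordering after the barrier is allowed.
print(respects_barriers(program, ["write data", "update counter", "set ready flag"]))
# Moving the flag ahead of the data write crosses the barrier.
print(respects_barriers(program, ["set ready flag", "write data", "update counter"]))
```

The second case is the one a barrier exists to rule out: another observer could otherwise see the ready flag set before the data it guards has been written.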
Generally, on modern computers, this shouldn't be necessary.
Out-of-order execution and register renaming are well-established and mature techniques.
Memory barriers can carry a performance cost.
This is because they actively prevent the CPU scheduler from optimizing specific parts of the instruction flow.
This increases the chance of a pipeline stall.
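The cost can be illustrated with a small out-of-order scheduler that treats a barrier as a point no instruction may cross. This is a rough sketch under assumptions: `total_cycles` and the instruction tuples are hypothetical, both loads are assumed to take 20 cycles, one instruction issues per cycle, and the barrier conservatively blocks every later instruction, whereas real barriers constrain memory operations and come in several strengths.

```python
BARRIER = "BARRIER"


def total_cycles(program):
    """Cycle at which the last result is ready under out-of-order issue.

    Instructions are (name, dest, srcs, latency) tuples. A BARRIER entry lets
    nothing after it issue until everything before it has completed.
    """
    ready_at = {}            # register -> cycle its value becomes available
    pending = list(program)
    cycle, finish = 0, 0
    while pending:
        if pending[0] == BARRIER:
            # Wait for all earlier results, then drop the barrier.
            cycle = max(cycle, finish)
            pending.pop(0)
            continue
        # Only instructions ahead of the next barrier are eligible to issue.
        limit = pending.index(BARRIER) if BARRIER in pending else len(pending)
        for idx in range(limit):
            name, dest, srcs, latency = pending[idx]
            earlier_dests = {p[1] for p in pending[:idx]}
            if all(s not in earlier_dests and ready_at.get(s, 0) <= cycle
                   for s in srcs):
                pending.pop(idx)
                ready_at[dest] = cycle + latency
                finish = max(finish, cycle + latency)
                break
        cycle += 1           # a cycle with nothing issued is a stall
    return finish


LOAD_A = ("load r1", "r1", (), 20)
USE_A = ("add r2", "r2", ("r1",), 1)
LOAD_B = ("load r3", "r3", (), 20)
USE_B = ("add r4", "r4", ("r3",), 1)

print(total_cycles([LOAD_A, USE_A, LOAD_B, USE_B]))           # loads overlap
print(total_cycles([LOAD_A, USE_A, BARRIER, LOAD_B, USE_B]))  # loads serialized
```

Without the barrier, the two loads overlap and the model finishes at cycle 22. With the barrier, the second load cannot start until the first half is complete, and the total rises to 42 cycles.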
Conclusion
A memory barrier is an instruction that ensures an ordering constraint on memory operations.
This is important because processors that execute out of order may reorder instructions.
The memory barrier forces the CPU scheduler to ensure that instructions before the barrier are completed before any instruction after the barrier is executed.
This prevents memory operations from being reordered.
It also prevents the CPU from optimizing the instruction flow, which can impact performance.