Computers are complex machines with no part more complex than the CPU.
At a basic overview level, it seems like the CPU should be relatively simple.
It takes a series of commands, processes them, and then outputs the data.
This bears little resemblance to the actual workings of modern CPUs though.
Sub-scalar to super-scalar
Early CPUs were exactly as you'd expect.
CPUs of this era were sub-scalar, able to complete less than one instruction per clock cycle.
CPU designers identified that there were many different stages of completing an instruction.
Each of these stages required different hardware.
In any sort of processor, idle hardware is useless hardware.
To utilise this idle hardware, CPU designs were updated to use a pipeline approach.
This made CPUs scalar.
To be able to do more, processors needed to be made super-scalar.
To achieve this, multiple parallel pipelines were implemented.
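The difference between the three designs can be sketched with some simple cycle counting. This is a rough model only: it assumes every stage takes exactly one clock cycle and that nothing ever stalls, and the function names and the five-stage, two-pipeline figures are made up for illustration.

```python
def cycles_unpipelined(instructions: int, stages: int) -> int:
    """Sub-scalar: each instruction passes through every stage before the next starts."""
    return instructions * stages


def cycles_pipelined(instructions: int, stages: int) -> int:
    """Scalar: once the pipeline is full, one instruction completes per cycle."""
    if instructions == 0:
        return 0
    # The first instruction needs `stages` cycles; each later one finishes a cycle after.
    return stages + instructions - 1


def cycles_superscalar(instructions: int, stages: int, width: int) -> int:
    """Super-scalar: `width` parallel pipelines, assuming fully independent instructions."""
    if instructions == 0:
        return 0
    groups = (instructions + width - 1) // width  # ceiling division
    return stages + groups - 1


if __name__ == "__main__":
    n, stages = 100, 5  # hypothetical workload and pipeline depth
    for label, cycles in [
        ("sub-scalar", cycles_unpipelined(n, stages)),
        ("scalar", cycles_pipelined(n, stages)),
        ("super-scalar (2-wide)", cycles_superscalar(n, stages, 2)),
    ]:
        print(f"{label}: {cycles} cycles, {n / cycles:.2f} instructions per cycle")
```

Under these assumptions the unpipelined design stays well below one instruction per cycle, the pipelined design approaches one, and only the design with parallel pipelines can exceed it.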
Keeping pipelines fed with data
The main performance issue with computers is typically memory latency.
Traditionally, the answer was just to stall and wait for the data to become available.
This leaves the whole pipeline empty, potentially for hundreds of CPU cycles.
While CPU cache memory can help address this issue, it still can't fix it.
A new paradigm was needed to solve it.
That paradigm shift was Out-of-Order Execution, or OOO.
The first stage of a pipeline is to decode the instruction.
In an OOO CPU, decoded instructions are added to a queue.
They are only removed from the queue and actually processed when the data they need is available.
Critically, it doesn't matter what order the instructions were added to the queue.
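The queue behaviour described above can be sketched as a toy simulation. It is a simplification under stated assumptions: the instructions are independent of one another, at most one instruction issues per cycle, and each instruction carries a made-up `data_ready_cycle` saying when its operands arrive. The instruction names are hypothetical.

```python
def simulate_in_order(queue):
    """Issue strictly in program order, stalling until each instruction's data arrives."""
    issued = []
    cycle = 0
    for name, data_ready_cycle in queue:
        cycle = max(cycle, data_ready_cycle)  # stall here if the data is late
        issued.append((cycle, name))
        cycle += 1
    return issued


def simulate_out_of_order(queue):
    """Each cycle, issue the oldest queued instruction whose data is available."""
    waiting = list(queue)
    issued = []
    cycle = 0
    while waiting:
        for i, (name, data_ready_cycle) in enumerate(waiting):
            if data_ready_cycle <= cycle:
                issued.append((cycle, name))
                del waiting[i]
                break  # at most one issue per cycle in this toy model
        cycle += 1
    return issued


# (name, cycle at which the instruction's data becomes available), in program order.
program = [("load_a", 3), ("add_b", 0), ("mul_c", 0), ("sub_d", 1)]

if __name__ == "__main__":
    print("in order:    ", simulate_in_order(program))
    print("out of order:", simulate_out_of_order(program))
```

In this example the in-order version waits on `load_a` before doing anything else, while the out-of-order version fills those cycles with the instructions that were queued behind it.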
Critical dependencies
This process assumes two things.
First, that it is possible to reliably identify and handle true dependencies.
Second, that it is possible to reliably identify and handle false dependencies.
What is the difference?
Well, a true dependency is a dependency that can't be mitigated at all in an OOO system.
The easiest example is the read-after-write.
The write and the read that follows it must be completed in the order in which they were presented, or you'll get nonsense data.
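A small sketch shows why a read-after-write can't be reordered. The `Instr` type, the register names, and the starting values are all invented for illustration; real hardware tracks this in the instruction queue rather than in software.

```python
import operator
from typing import Callable, NamedTuple


class Instr(NamedTuple):
    dest: str                        # register written
    op: Callable[[int, int], int]    # operation applied to the two sources
    src1: str                        # registers read
    src2: str


def has_read_after_write(earlier: Instr, later: Instr) -> bool:
    """True if `later` reads the register that `earlier` writes."""
    return earlier.dest in (later.src1, later.src2)


def execute(program, registers):
    """Run instructions one after another on a copy of the register file."""
    regs = dict(registers)
    for ins in program:
        regs[ins.dest] = ins.op(regs[ins.src1], regs[ins.src2])
    return regs


write = Instr("r1", operator.add, "r2", "r3")  # r1 = r2 + r3
read = Instr("r4", operator.mul, "r1", "r5")   # r4 = r1 * r5, reads the new r1
start = {"r1": 0, "r2": 2, "r3": 3, "r4": 0, "r5": 10}

in_order = execute([write, read], start)
swapped = execute([read, write], start)

if __name__ == "__main__":
    print("dependency detected:", has_read_after_write(write, read))
    print("in order r4 =", in_order["r4"])  # uses the freshly written r1
    print("swapped  r4 =", swapped["r4"])   # uses the stale r1
```

Running the read first picks up the stale value of `r1`, so `r4` ends up wrong even though both instructions executed.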
A false dependency is one that can be hidden with another clever trick.
Let's take the example of write-after-read.
At first glance, you might think that you can't overwrite data before you've read it.
Things aren't that simple though.
The trick is called register renaming, and it is critical to OOO processing.
Typically, an instruction set defines a set number of architectural registers that are used in the system.
You literally can't address any others.
But what if you do overprovision registers?
A real-world analogy would be hot-desking.
Execution is then scheduled on the basis of the earliest issued instructions that have data available.
OOO execution relies on register renaming to hide false dependencies.
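Register renaming can be sketched as a mapping from architectural registers to a larger pool of physical ones. This is a minimal illustration, not how any particular CPU implements it: it assumes an unlimited supply of physical registers (real hardware has a finite pool and recycles them), and the `Instr` type and the `p0`, `p1`, ... names are hypothetical.

```python
from typing import NamedTuple, Tuple


class Instr(NamedTuple):
    dest: str               # register written
    srcs: Tuple[str, ...]   # registers read


def has_write_after_read(earlier: Instr, later: Instr) -> bool:
    """True if `later` overwrites a register that `earlier` still needs to read."""
    return later.dest in earlier.srcs


def rename(program, architectural_registers):
    """Rewrite a program so that every write targets a fresh physical register."""
    # Each architectural register starts out backed by its own physical register.
    mapping = {reg: f"p{i}" for i, reg in enumerate(architectural_registers)}
    next_free = len(architectural_registers)
    renamed = []
    for ins in program:
        # Reads use whichever physical register currently holds the value.
        srcs = tuple(mapping[s] for s in ins.srcs)
        # Writes never reuse a register that an older instruction may still read.
        mapping[ins.dest] = f"p{next_free}"
        next_free += 1
        renamed.append(Instr(mapping[ins.dest], srcs))
    return renamed


program = [
    Instr("r1", ("r2", "r3")),  # reads r2
    Instr("r2", ("r4", "r5")),  # overwrites r2: write-after-read on r2
]
renamed = rename(program, ["r1", "r2", "r3", "r4", "r5"])

if __name__ == "__main__":
    print("before:", has_write_after_read(program[0], program[1]))
    print("after: ", has_write_after_read(renamed[0], renamed[1]))
    print(renamed)
```

After renaming, the second instruction writes to a different physical register than the one the first instruction reads, so the two can execute in either order.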