In almost any computer program, there are parts of the code that branch into separate paths.
For example, an if-then-else statement has two possible outcomes.
These statements don't present any issue to sequential processors, as the CPU processes every instruction in order.
Branches present a big problem for pipelined processors, as multiple instructions are being executed at once.
The processing required to complete each option is different; otherwise, the program wouldn't branch.
There are also branching statements that form a loop. While these might not be as numerically common in a program's code as single-use statements, each one is typically executed many times over.
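To make that concrete, here is a tiny C sketch (purely illustrative, not from any particular program): both the if-else and the loop condition compile down to conditional branch instructions, and the loop's branch is evaluated on every iteration.

```c
#include <stdio.h>

int main(void) {
    int total = 0;
    /* The loop condition is a branch evaluated on every iteration:
     * taken ten times, then not taken once when the loop exits. */
    for (int i = 0; i < 10; i++) {
        /* The if-else is a branch with two possible paths. */
        if (i % 2 == 0)
            total += i;
        else
            total -= 1;
    }
    printf("total = %d\n", total);
    return 0;
}
```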
Why Is This a Problem?
However this problem is phrased, in a fully sequential processor it simply isn't a problem.
Which branch will be taken is known before the first part of the following instruction is processed.
In a pipelined processor, however, the following instructions are queued up.
They are already being processed before the processor knows which branch will be taken.
So how should the processor handle this scenario?
There are a few options.
One option is to stop feeding the pipeline and wait for the branching instruction to complete, so its outcome is known before it's needed. That works, but it sacrifices most of the performance gain that pipelining provides in branch-heavy code. Alternatively, you could try running both options in the pipeline and discard the wrong choice once the branch resolves. This option could be helpful, except it isn't scalable: suppose you have a relatively high density of branching statements; with every unresolved branch, the number of paths being executed in parallel doubles, quickly exceeding what any hardware can track.
How Is This Issue Really Addressed?
In reality, the processor includes a branch predictor.
The branch predictor attempts to guess the outcome of a branching instruction before it is resolved. The processor then assumes that the prediction is correct and schedules the following instructions accordingly.
If the branch predictor is accurate, there is no performance penalty.
If the branch predictor makes a mistake, the pipeline must be flushed and processing restarted along the correct path. This results in a slightly worse performance penalty than having done nothing and simply waited for the actual result.
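One informal way to watch this penalty on your own machine is the classic sorted-versus-unsorted experiment. The sketch below is a rough demonstration rather than a proper benchmark; the numbers depend heavily on your CPU and compiler settings.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N    (1 << 20)
#define REPS 50

/* Sum only the large values; on random data the comparison branch is
 * taken about half the time in no predictable order. */
static long sum_big(const int *a, int n) {
    long s = 0;
    for (int i = 0; i < n; i++)
        if (a[i] >= 128)
            s += a[i];
    return s;
}

static int cmp(const void *x, const void *y) {
    return *(const int *)x - *(const int *)y;
}

int main(void) {
    int *a = malloc(N * sizeof *a);
    for (int i = 0; i < N; i++)
        a[i] = rand() % 256;

    clock_t t0 = clock();
    long s1 = 0;
    for (int r = 0; r < REPS; r++)
        s1 += sum_big(a, N);
    clock_t t1 = clock();

    /* Same data, same work, but now the branch is predictable. */
    qsort(a, N, sizeof *a, cmp);

    clock_t t2 = clock();
    long s2 = 0;
    for (int r = 0; r < REPS; r++)
        s2 += sum_big(a, N);
    clock_t t3 = clock();

    printf("unsorted: %ld in %.3fs\n", s1, (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("sorted:   %ld in %.3fs\n", s2, (double)(t3 - t2) / CLOCKS_PER_SEC);
    free(a);
    return 0;
}
```

Built with a low optimization level (for example, gcc -O1), the sorted pass is typically several times faster; at -O3 the compiler may turn the branch into a conditional move or vectorize the loop, and the gap can vanish.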
To get the best performance, it's important to ensure that the branch predictor is as accurate as possible.
There is a range of different approaches to this.
Some early implementations of branch predictors simply assumed that all branches would always or would never be taken.
Getting good performance out of these static schemes required a lot of tuning and adjusted software development habits, arranging code so that the common case matched the fixed guess.
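That static era survives in a small way today: GCC and Clang expose a __builtin_expect builtin that lets the programmer state which outcome is the common case, so the compiler can lay out code to favor it. A brief sketch, with hypothetical likely/unlikely wrapper names:

```c
#include <stdio.h>

/* Wrappers around the GCC/Clang builtin; on other compilers you would
 * define these as plain pass-throughs. */
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)

int process(int value) {
    if (unlikely(value < 0)) {   /* hint: the error path is rare */
        fprintf(stderr, "bad value\n");
        return -1;
    }
    return value * 2;            /* hot path kept on the fall-through route */
}

int main(void) {
    printf("%d\n", process(21));
    return 0;
}
```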
A saturating counter forms the basis of most current branch predictors.
The first guess is decided statically or randomly.
Once a branch has been taken or not taken, that result is stored in a portion of memory.
The next time that same branch comes around, the branch predictor predicts the same result as before.
This naturally results in good prediction rates for loops.
There are two versions of this. Early versions simply used a single bit of data to indicate whether the branch was taken or not. Later versions use two bits to indicate a weakly or strongly taken or not taken choice. The second bit adds useful inertia: a 2-bit saturating branch predictor can still predict that a branch is taken even if it previously wasn't, which allows it to predict that a loop will be retaken even after it has been left once.
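A minimal sketch of the 2-bit scheme in C (the state names and starting state are illustrative choices, not taken from any particular CPU):

```c
#include <stdbool.h>
#include <stdio.h>

/* The four states of a 2-bit saturating counter. */
enum { STRONG_NOT_TAKEN, WEAK_NOT_TAKEN, WEAK_TAKEN, STRONG_TAKEN };

static bool predict(int state) {
    return state >= WEAK_TAKEN;   /* upper half of the range predicts taken */
}

static int update(int state, bool taken) {
    if (taken)
        return state < STRONG_TAKEN ? state + 1 : state;   /* saturate high */
    return state > STRONG_NOT_TAKEN ? state - 1 : state;   /* saturate low */
}

int main(void) {
    int state = WEAK_NOT_TAKEN;   /* first guess chosen statically */
    /* A loop branch: taken three times, exits once, then re-entered. */
    bool outcomes[] = {true, true, true, false, true, true};
    for (int i = 0; i < 6; i++) {
        printf("predicted %-10s actual %s\n",
               predict(state) ? "taken," : "not taken,",
               outcomes[i] ? "taken" : "not taken");
        state = update(state, outcomes[i]);
    }
    return 0;
}
```

Note the behavior around the loop exit: the single not-taken outcome only drops the counter from strongly to weakly taken, so the very next prediction is still taken.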
Some branches follow a regular pattern instead, such as alternating between taken and not taken. In that case, a two-level adaptive predictor can identify this pattern and predict it happening again. This increases success rates even further over a simple saturating counter.
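Here is a hedged sketch of the two-level idea for a single branch: a small shift register records the branch's recent history, and each observed history pattern indexes its own 2-bit counter (the 4-bit history length is an arbitrary choice for the demo):

```c
#include <stdbool.h>
#include <stdio.h>

#define HISTORY_BITS 4
#define PATTERNS (1 << HISTORY_BITS)

static int counters[PATTERNS];   /* one 2-bit counter (0..3) per history pattern */
static unsigned history;         /* the last HISTORY_BITS outcomes of this branch */

static bool predict(void) {
    return counters[history] >= 2;
}

static void update(bool taken) {
    int *c = &counters[history];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    history = ((history << 1) | taken) & (PATTERNS - 1);   /* shift in the outcome */
}

int main(void) {
    int correct = 0;
    /* An alternating branch: taken, not taken, taken, not taken, ... */
    for (int i = 0; i < 100; i++) {
        bool actual = (i % 2 == 0);
        if (predict() == actual)
            correct++;
        update(actual);
    }
    /* A plain saturating counter hovers near 50% on this pattern; after a
     * short warm-up, the pattern-indexed counters get it right every time. */
    printf("%d/100 correct\n", correct);
    return 0;
}
```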
Modern branch predictors build on this further, with some using simple neural networks, known as perceptron predictors, to spot and predict more complex patterns.
Some branch predictors use multiple algorithms and then compare the results to decide which prediction to use.
Some systems keep track of each branching instruction separately, in what's called local branch prediction.
Others use a global branch prediction system to track all recent branching instructions.
Both have their uses and downsides.
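As a concrete example of the global flavor, here is a sketch of gshare, a well-known scheme from the branch prediction literature (the article doesn't name it; the XOR indexing and the demo addresses are illustrative): the branch's address is folded together with a register holding the outcomes of all recent branches.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define TABLE_BITS 12
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t counters[TABLE_SIZE];   /* shared 2-bit counters (0..3) */
static uint32_t global_history;        /* recent outcomes of all branches */

static uint32_t index_for(uint32_t pc) {
    /* gshare: fold the branch address and the global history together */
    return (pc ^ global_history) & (TABLE_SIZE - 1);
}

static bool predict(uint32_t pc) {
    return counters[index_for(pc)] >= 2;
}

static void update(uint32_t pc, bool taken) {
    uint8_t *c = &counters[index_for(pc)];
    if (taken && *c < 3) (*c)++;
    if (!taken && *c > 0) (*c)--;
    global_history = (global_history << 1) | taken;   /* shared by every branch */
}

int main(void) {
    /* Two branches at made-up addresses whose outcomes are correlated:
     * the second always mirrors the first, something a purely local
     * predictor cannot see. */
    int correct = 0;
    for (int i = 0; i < 1000; i++) {
        bool first = (i % 3 != 0);
        if (predict(0x400100) == first) correct++;
        update(0x400100, first);

        bool second = first;   /* correlated with the branch just before it */
        if (predict(0x400200) == second) correct++;
        update(0x400200, second);
    }
    printf("%d/2000 correct\n", correct);
    return 0;
}
```

A local predictor instead keeps a separate history per branch, which captures self-contained patterns but misses this kind of cross-branch correlation; some designs run both and keep a third table recording which predictor has been more accurate for each branch, which is the "multiple algorithms" approach mentioned above.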
A branch predictor that uses a pattern history table can identify repeating patterns in when certain branches are taken, such as predicting that a loop is only ever taken three times before leaving it.
Conclusion
A branch predictor is a special part of a pipelined CPU.
It attempts to predict the outcome of a branching instruction before the actual instruction is processed.
It imposes no reduction in performance when it is correct.
Modern algorithms can be as much as 97% accurate in relatively predictable workloads.
There is no perfect method of prediction, so implementations vary.
The performance impact of making a wrong prediction depends on the length of the pipeline, specifically, on how many stages complete before it can be determined that the prediction was wrong, and on how many instructions are in flight in each pipeline stage.