The purpose of a CPU is to perform instructions.

Historically, early CPUs would identify the next instruction in the queue to be completed.

The CPU would then run through all the processing needed to complete that instruction.

Only once the instruction had been fully processed could the next one be acquired from the queue.

This design led to several inefficiencies that have been addressed in modern CPUs.

One of the biggest inefficiencies was addressed by pipelining.

Classic RISC Pipeline

Executing an instruction involves several distinct steps, whatever the CPU's design.

In the classic RISC design, those steps are divided into five pipeline stages.

Instruction Fetch is the first stage.

It retrieves the instruction to be executed.

Instruction Decode is the second stage; it decodes the retrieved instruction to identify what needs to be done.

Execute is the third stage; it's where the computation defined by the instruction is performed.

Memory Access is the fourth stage, where memory can be accessed.

It also acts as a buffer to ensure that one- and two-cycle instructions stay aligned in the pipeline.

The final stage is Write Back, where the computation results are written to the destination register.
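
As a minimal illustration, the five stages can be sketched as plain Python functions applied in order to a single toy instruction. The register names, instruction encoding, and ADD operation here are invented for illustration; real hardware does all of this in silicon, not software.

```python
# Hypothetical sketch: the five classic RISC stages as functions,
# applied in order to one toy "ADD r1, r2, r3" instruction (r1 = r2 + r3).

regs = {"r1": 0, "r2": 7, "r3": 5}
program = [("ADD", "r1", "r2", "r3")]

def fetch(pc):                        # IF: retrieve the instruction
    return program[pc]

def decode(instr):                    # ID: identify opcode and operands
    op, dest, src1, src2 = instr
    return op, dest, regs[src1], regs[src2]

def execute(op, a, b):                # EX: perform the computation
    assert op == "ADD"
    return a + b

def memory_access(value):             # MEM: no memory traffic for ADD
    return value

def write_back(dest, value):          # WB: write result to destination register
    regs[dest] = value

op, dest, a, b = decode(fetch(0))
write_back(dest, memory_access(execute(op, a, b)))
# regs["r1"] is now 12
```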

A standard sequential CPU may not separate these functions as cleanly as the classic RISC pipeline does.

However, it still needs to perform the same sort of tasks in the same order.

Crucially, the silicon that performs each of these functions is physically separate.

That separation means each piece of hardware can work on a different instruction at the same time: while one instruction is being decoded, the next can already be fetched.

Overlapping instructions in this way is called pipelining.
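
The overlap that pipelining enables can be sketched with a small Python model of an ideal, stall-free five-stage pipeline. The stage abbreviations (IF, ID, EX, MEM, WB) follow the classic RISC naming above; the scheduling rule (instruction i enters stage s on cycle i + s) is the idealized case only.

```python
# Sketch: which instruction occupies each stage on each clock cycle,
# assuming an ideal pipeline with no stalls. Labels are illustrative.

STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def pipeline_schedule(num_instructions):
    """Return {cycle: {stage: instruction}} for a stall-free pipeline."""
    schedule = {}
    total_cycles = num_instructions + len(STAGES) - 1
    for cycle in range(total_cycles):
        occupancy = {}
        for s, stage in enumerate(STAGES):
            instr = cycle - s  # instruction i enters stage s on cycle i + s
            if 0 <= instr < num_instructions:
                occupancy[stage] = f"i{instr}"
        schedule[cycle] = occupancy
    return schedule

sched = pipeline_schedule(5)
# On cycle 4, all five stages are busy with a different instruction.
```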

Benefits of Pipelining

The single biggest benefit of pipelining is a massive throughput gain.

Assume that each instruction takes one clock cycle to pass through each stage.

In a sequential CPU, one instruction then completes every five cycles.

In a pipelined CPU, each instruction still takes five cycles to be completed.

Five instructions are in different stages of being processed simultaneously, though.

One instruction is completed every cycle (in a best-case scenario).

In this way, pipelining offers a significant performance increase.
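
The throughput arithmetic above can be checked with a few lines of Python, assuming a five-stage pipeline, one cycle per stage, and no stalls:

```python
# Sketch of the best-case throughput comparison. All numbers assume
# one cycle per stage and a stall-free pipeline.

STAGES = 5

def sequential_cycles(n):
    # A sequential CPU finishes one instruction before fetching the next.
    return n * STAGES

def pipelined_cycles(n):
    # Fill the pipeline once (STAGES cycles), then retire one per cycle.
    return STAGES + (n - 1)

# For 1000 instructions: sequential takes 5000 cycles,
# pipelined takes 1004 -- roughly a 5x speedup.
```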

Pipelining is very similar in concept to a production line in a factory.

Pipelining does require extra latches and control logic between stages, and this eats into the silicon budget available for the data-processing parts of the CPU.

Sequential CPUs are always subscalar, completing fewer than one instruction per cycle.

Pipelined CPUs can be scalar, though pipeline stalls and incorrect branch prediction can reduce their performance to subscalar.

It's possible to increase performance to superscalar, completing more than one instruction per cycle.

To reach superscalar performance, a CPU needs to double up on hardware.

In this case, there are essentially two or more pipelines side-by-side.
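
As a rough sketch of the ideal case, a hypothetical dual-issue CPU with two parallel five-stage pipelines could retire two instructions per cycle in steady state, assuming instructions always pair perfectly and never stall:

```python
# Sketch: best-case cycle count for a hypothetical dual-issue CPU
# (two parallel five-stage pipelines). Perfect pairing and no stalls
# are assumed -- real CPUs rarely sustain this.

import math

STAGES = 5

def dual_issue_cycles(n):
    # Two instructions enter the pipelines per cycle.
    return STAGES + (math.ceil(n / 2) - 1)

# dual_issue_cycles(1000) -> 504 cycles, approaching 2 instructions per cycle.
```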

For example: consider a three-stage pipeline.

The first two stages take one cycle each to complete, but the last stage takes two cycles.

This limits the overall throughput to one instruction every two cycles.

Duplicating the hardware for that final, two-cycle stage restores an overall throughput of one instruction per cycle.

The concept is identical to adding a parallel workstation on a production line.
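
The example above can be expressed as a small throughput calculation. The formula here is a simplification: in steady state, throughput is bounded by the slowest stage, and duplicating that stage's hardware divides the bottleneck accordingly.

```python
# Sketch of the three-stage example: stages taking 1, 1, and 2 cycles.
# Steady-state throughput (instructions per cycle) is limited by the
# slowest stage; parallel copies of that stage raise the limit.

def steady_state_throughput(stage_cycles, copies_of_slowest=1):
    bottleneck = max(stage_cycles)
    return copies_of_slowest / bottleneck

# One copy of the 2-cycle stage: 1 instruction every 2 cycles.
# Two copies of the 2-cycle stage: 1 instruction per cycle.
```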

Conclusion

A CPU pipeline refers to the separate hardware required to complete instructions in several stages.

Critically, each of these stages is then used simultaneously by multiple instructions.

The concept is analogous to a production line in a factory with various workstations for different functions.

There are some extra hardware and complexity requirements but significant performance benefits.

Performance can be further increased by having parallel pipelines, though the hardware requirements for this are even higher.

All modern CPUs utilize pipelines.