Early computers were entirely sequential.

Each of these stages should take one cycle to complete.

This is a lot slower, adding dozens or hundreds of clock cycles of latency.

Article image

In the meantime, everything else needs to wait as no other data or instructions can be processed.

This bang out of processor design is called subscalar as it runs less than one instruction per clock cycle.

In a subscalar processor with no pipeline, each part of each instruction is executed in order.

Article image

This design can only ever achieve less than one instruction completed per cycle.

Contents

Pipelining to scalar

A scalar processor can be achieved by applying a system pipeline.

In a scalar pipelined processor, each stage of an instructions execution can be performed once per clock cycle.

Article image

This allows a maximum throughput of one completed instruction per cycle.

In reality, programs are complex and reduce the throughput.

It actually runs the program in the wrong order to fix this.

Article image

This works, because many instructions dont necessarily rely on the result of a previous one.

This way two instructions can be in each stage of the pipeline in every cycle.

This allows for a maximum instruction throughput of more than one completed instruction per cycle.

The performance increase from increasing pipelines only scales so far efficiently though.

Thermal and size constraints place some limits.

There are also significant scheduling complications.

Some instructions can have two potential outcomes, each one leading to different following instructions.

A simple example would be an if statement.

If this is true do that, otherwise do this other thing.

A branch predictor attempts to predict the outcome of a branching operation.

It then pre-emptively schedules and executes the instructions following what it believes to be the likely outcome.

Thus high-prediction success rates can increase performance noticeably.

Early computers were entirely sequential, running only one instruction at a time.

This meant that each instruction took more than one cycle to do complete and so these processors were subscalar.

It should be noted that no individual instruction is fully processed in a single clock cycle.

It still takes at least five cycles.

Multiple instructions, however, can be in the pipeline at once.

This allows a throughput of one or more completed instructions per cycle.

Superscalar should not be confused with hyperscaler which refers to companies that can offer hyperscale computing resources.

This is typically found in large data centres and cloud computing environments.