Data access is a critical part of CPU design.
The vast majority of that data lives on storage media such as SSDs and hard drives. Storage devices, however, are orders of magnitude slower than a CPU, so active data is first copied into system RAM. To reduce the latency further still, most modern CPUs include tiers of cache memory.
Typically, these are referred to as the L1, L2, and L3 caches.
L1 is extremely fast, typically taking on the order of 4-5 clock cycles to access. L2 is a little slower, on the order of 10-20 cycles. L3 is slower still, usually a few tens of cycles, while a trip out to system RAM can cost 200 cycles or more.
While L1 is incredibly fast, it's also tiny. Much of its speed comes from the fact that smaller caches take less time to search.
L2 is bigger than L1 but smaller than L3, which is in turn far smaller than system RAM.
Balancing the size of these caches well is critical to getting a high-performance CPU.
Scratchpad memory
Note that scratchpad memory doesn't fit into the traditional memory hierarchy. That's because it isn't used in most consumer CPUs.
Scratchpad memory is designed to be used like a scratchpad would be in real life.
You jot down temporary information that you need close at hand but don't need to file away permanently.
Much of the time a CPU processes data and then needs that result again straight away.
Scratchpad memory essentially fills the same gap as the L1 cache.
It's accessible as fast as possible, often in single-digit cycle counts.
To achieve this speed, it is also kept relatively small.
There are two key differences between L1 and scratchpad memory, though.
Firstly, scratchpad memory is directly addressable.
Secondly, it is shared between all cores and processors.
Making the scratchpad directly addressable means that code can specify exactly which data should be held there.
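That explicit placement can be sketched in C. On real hardware the scratchpad appears at a fixed address; the base address shown in the comment is made up, and a plain array stands in for the scratchpad so the sketch runs on an ordinary host machine.

```c
/* Sketch: using a directly addressable scratchpad. On a real chip the
 * scratchpad sits at a fixed address, e.g.
 *   volatile uint32_t *spm = (uint32_t *)0x20000000;  // made-up base
 * Here a static array stands in so the sketch runs on a host. */
#include <stdint.h>
#include <string.h>

#define SPM_WORDS 1024                 /* assumed 4 KiB scratchpad */
static uint32_t scratchpad[SPM_WORDS];

/* Stage a hot block of data in the scratchpad, work on it there,
 * then hand the result back. The programmer, not the hardware,
 * decides what lives in the fast memory. */
uint32_t sum_via_scratchpad(const uint32_t *src, size_t n) {
    if (n > SPM_WORDS) n = SPM_WORDS;
    memcpy(scratchpad, src, n * sizeof *src); /* explicit placement */
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += scratchpad[i];               /* every access hits SPM */
    return total;
}
```

Contrast this with a cache, where the hardware decides what is kept close and the software has no say.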
L1 cache is always locked to an individual processing core.
No other processing core can access it.
L3 tends to be shared by all cores.
Sharing cache between cores allows two or more cores to access the same data without duplicating it.
This allows very fast access to specific data being acted on in a multithreaded workload.
Scratchpad memory can even be shared between distinct CPUs on multi-socket motherboards.
One disadvantage of scratchpad memory is that software may come to rely on it too heavily. Because it can be addressed directly, a program may assume a certain amount of it is present. Such a program would then be unable to run on CPUs with less scratchpad memory.
Cache tiers, being transparent to software, simply don't suffer from this problem and so are better suited to general-purpose use. Even so, the scratchpad's combination of speed and shared access makes it useful for highly parallel workloads.
Scratchpad memory also sees use in much smaller processors, particularly embedded processors and MPSoCs (multiprocessor systems-on-chip).
An embedded processor is often relatively low power and specialised for a specific task.
This specialisation is often represented in hardware optimisations.
These sorts of CPUs are often very fixed in design, which makes an explicitly managed scratchpad a good fit. Unlike a cache, the scratchpad is directly addressable, allowing data to be assigned deliberately to this particularly high-speed memory. It's also shared between all processor cores, and even between separate processors, making it particularly useful in heavily multithreaded workloads.