Data access is a critical part of CPU design.
The vast majority of that data is stored on storage media. Storage devices, however, are orders of magnitude slower than a CPU.
To bridge that gap, CPUs keep recently used data in small pools of fast on-chip memory called caches. Typically, these are arranged in levels referred to as the L1, L2, and L3 caches.
L1 is extremely fast, typically taking on the order of 5 clock cycles to access.
L2 is a bit slower, on the order of 20 cycles.
L3 is slower still, often several tens of cycles, while a trip all the way out to main memory can cost around 200 cycles or more.
While L1 is incredibly fast, it's also tiny.
Much of its speed comes from the fact that smaller caches take less time to search.
L2 is bigger than L1 but smaller than L3, which in turn is smaller than system RAM.
Balancing the size of these caches well is critical to getting a high-performance CPU.
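To make that size/speed trade-off concrete, one rough way to observe the hierarchy is to sweep over buffers of increasing size and time the accesses: the average cost per element jumps as the working set outgrows each cache level. Below is a minimal C sketch of that idea; the sizes, repetition counts, and 64 MiB upper bound are illustrative assumptions, and hardware prefetching will blur the steps (a random pointer-chase makes them much sharper).

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Time repeated passes over a buffer, returning nanoseconds per element. */
static double ns_per_element(const int *buf, size_t n, int reps) {
    struct timespec t0, t1;
    long long sink = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (int r = 0; r < reps; r++)
        for (size_t i = 0; i < n; i++)
            sink += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    if (sink == 42) puts(""); /* keep the compiler from deleting the loop */
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / ((double)n * reps);
}

int main(void) {
    /* Working sets from 16 KiB to 64 MiB span typical L1, L2, L3, and RAM. */
    for (size_t kib = 16; kib <= 64 * 1024; kib *= 4) {
        size_t n = kib * 1024 / sizeof(int);
        int *buf = malloc(n * sizeof(int));
        for (size_t i = 0; i < n; i++) buf[i] = (int)i;
        int reps = (int)((256u * 1024 * 1024) / (kib * 1024)); /* equal work per size */
        printf("%6zu KiB: %.2f ns/element\n", kib, ns_per_element(buf, n, reps));
        free(buf);
    }
    return 0;
}
```

On a typical desktop, the per-element cost stays roughly flat while the buffer fits in a cache level and steps upward each time it spills into the next one.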
There are three approaches to deciding which cores may use a given cache.
The first is to limit a cache to a single core.
The second is to let every core access it.
The final option is a middle ground: letting a selection of cores share the cache.
Sharing is slow
A cache that is only accessible by a single core is called local memory.
Limiting access to a single core means the cache doesn't need to be positioned where multiple cores can reach it, so it can sit as close to its core as possible.
That proximity, combined with the speed advantage of a small capacity, makes local memory ideal for the L1 cache.
Each core gets its own small, close cache.
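One way to see those per-core caches in action is "false sharing". If two cores repeatedly write to variables that happen to sit on the same cache line, their local caches keep invalidating each other, and the cores effectively fight over the line. The sketch below is a hypothetical demonstration assuming 64-byte cache lines (build with `cc -O2 -pthread`): with the padding in place each counter lives on its own line, and deleting the padding typically makes the program run several times slower. Comparing the two variants with `time` on a multi-core machine shows the gap.

```c
#include <pthread.h>
#include <stdio.h>

/* Two counters, one per thread. With the padding, each counter gets its
 * own 64-byte cache line, so each core can keep "its" counter in its own
 * local cache. Remove the padding and both counters share one line. */
struct counters {
    volatile long a;
    char pad[56];            /* assumes 64-byte cache lines */
    volatile long b;
};

static struct counters c;

static void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < 100000000L; i++) c.a++;
    return NULL;
}

static void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < 100000000L; i++) c.b++;
    return NULL;
}

int main(void) {
    pthread_t t1, t2;
    pthread_create(&t1, NULL, bump_a, NULL);
    pthread_create(&t2, NULL, bump_b, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("a=%ld b=%ld\n", c.a, c.b);
    return 0;
}
```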
Shared memory, by contrast, is a cache accessible by multiple cores.
Just as it makes sense for a local cache to be small, it makes sense for a shared cache to be large.
That makes sharing a better fit for the L2 and especially the L3 cache.
Local cache memory doesn't need to be restricted to CPUs; the concept can also apply to other types of processors.
The most well-known secondary processor, however, is the GPU, which essentially doesn't have any local memory.
It has so many processing cores that everything gets grouped up, and even the smallest group of cores shares its lowest levels of cache.
At the RAM level
Some computers, such as the multi-socket machines found in compute clusters, can have multiple physical CPUs.
Typically, each CPU will have its own local pool of RAM.
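On Linux, software can explicitly request memory from a particular CPU's local pool. Here is a minimal sketch using libnuma (link with -lnuma); the choice of node 0 and the 64 MiB size are arbitrary assumptions for illustration.

```c
#include <numa.h>   /* Linux libnuma; link with -lnuma */
#include <stdio.h>

int main(void) {
    if (numa_available() < 0) {
        puts("No NUMA support on this machine.");
        return 1;
    }
    /* Run on node 0 and allocate from node 0's RAM, so the memory this
     * thread touches comes from its own CPU's local pool rather than a
     * slower pool attached to another CPU. */
    numa_run_on_node(0);
    size_t size = 64 * 1024 * 1024;
    char *buf = numa_alloc_onnode(size, 0);
    if (buf == NULL) return 1;
    for (size_t i = 0; i < size; i += 4096) buf[i] = 1; /* touch the pages */
    printf("Allocated %zu MiB on node 0 of %d node(s)\n",
           size >> 20, numa_max_node() + 1);
    numa_free(buf, size);
    return 0;
}
```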
At the software level
Software running on the computer is allocated its own memory space.
In some cases, one program may run multiple processes that share a memory space, and some programs even deliberately share memory with another program.
Typically, though, a memory space is limited to just one process.
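As a concrete example of deliberate sharing, here is a minimal sketch using POSIX shared memory, in which a parent and child process read and write the same mapped pages (on older glibc, link with -lrt). The object name `/demo_shm` and the one-page size are arbitrary, and most error handling is trimmed for brevity.

```c
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/wait.h>

int main(void) {
    /* Create a named shared-memory object and map it into this process. */
    const char *name = "/demo_shm";
    int fd = shm_open(name, O_CREAT | O_RDWR, 0600);
    if (fd < 0) return 1;
    ftruncate(fd, 4096);
    char *mem = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) return 1;

    if (fork() == 0) {
        /* Child process: writes into the pages both processes share. */
        strcpy(mem, "hello from the child process");
        _exit(0);
    }
    wait(NULL);
    printf("parent read: %s\n", mem); /* sees the child's write */

    munmap(mem, 4096);
    shm_unlink(name); /* delete the shared object */
    return 0;
}
```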
Again, this is an example of local memory: memory that belongs to one thing, whether that thing is a processing core, a processor, or a process.
The overall concept is always the same, even if the specifics vary.
Local memory tends to be more secure, since only its single owner can access it.
It also tends to be smaller in capacity.
Access times are generally faster for local memory than for shared memory.
Outside of caching, though, that speed advantage only clearly holds when you measure shared memory at its worst case, with many users contending for it.
Local memory is typically very useful on its own.
Caches are the exception: there, it is always better to combine local and shared memory, just as the L1/L2/L3 hierarchy does.