You could also get a powerful GPU.

You've also got the option of GPUs with more capable interconnects and more VRAM.

So, what do you do if you need more than 96 cores in one computer?

Add more CPUs, obviously.

You need specific hardware.

AMD supports placing two of its EPYC server CPUs on the same motherboard.

That offers up a total of 192 cores or 384 threads.

Intel's latest server CPUs max out at 40 cores, though the previous generation featured a 56-core model.

Intel, however, supports up to 8 CPUs on a single motherboard.

That's 320 or 448 cores and 640 or 896 threads, respectively.

While this is overkill for checking Instagram, some workloads can use all of that horsepower.
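If you're curious what your own machine looks like, Linux exposes the CPU topology through sysfs, and a short program can tally the hardware threads and sockets. The sketch below is a minimal, Linux-only example (it relies on the standard /sys/devices/system/cpu paths); error handling is deliberately thin.

```c
// count_topology.c -- count hardware threads and sockets on Linux via sysfs.
// Minimal sketch; build with: cc -O2 count_topology.c -o count_topology
#include <stdio.h>

// Read a small integer from a sysfs file; return -1 if the file is missing.
static int read_int(const char *path) {
    FILE *f = fopen(path, "r");
    if (!f) return -1;
    int v = -1;
    if (fscanf(f, "%d", &v) != 1) v = -1;
    fclose(f);
    return v;
}

int main(void) {
    int threads = 0;
    int max_pkg = -1;
    char path[128];

    // Walk cpu0, cpu1, ... until the kernel stops reporting a CPU.
    for (int cpu = 0; ; cpu++) {
        snprintf(path, sizeof path,
                 "/sys/devices/system/cpu/cpu%d/topology/physical_package_id", cpu);
        int pkg = read_int(path);
        if (pkg < 0) break;              // no such CPU: we've seen them all
        if (pkg > max_pkg) max_pkg = pkg;
        threads++;
    }

    printf("hardware threads: %d\n", threads);
    printf("sockets (physical packages): %d\n", max_pkg + 1);
    return 0;
}
```

The lscpu command presents the same information in a friendlier form.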

The problem comes from memory.

A few things generally limit CPU performance.

The first is a lack of things to do; sometimes, the CPU simply isn't loaded.

Another major one is memory access.

A CPU typically needs a lot of data to perform a lot of processing.

All of that is stored in RAM.

Unfortunately, RAM is pretty slow compared to a CPU.

This can leave the CPU idle for ages while it waits for the data it needs to operate on.

This is why RAM is always located directly next to the CPU socket on a motherboard.
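You can feel this gap with a simple pointer-chasing loop: when the working set fits in the CPU's cache, each access takes a nanosecond or two, but once it spills out into RAM, every access stalls for far longer. The sketch below is purely illustrative; the exact numbers depend entirely on the machine it runs on.

```c
// chase.c -- rough demo of cache vs. RAM latency via dependent pointer chasing.
// Illustrative sketch only; build with: cc -O2 chase.c -o chase
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

// Chase a random cycle through 'n' slots for 'steps' loads; return ns per access.
static double chase_ns(size_t n, size_t steps) {
    size_t *next = malloc(n * sizeof *next);
    if (!next) return 0.0;
    for (size_t i = 0; i < n; i++) next[i] = i;

    // Sattolo's algorithm: a random permutation forming one big cycle,
    // so the chase visits unpredictable locations the prefetcher can't guess.
    for (size_t i = n - 1; i > 0; i--) {
        size_t j = (size_t)rand() % i;
        size_t t = next[i]; next[i] = next[j]; next[j] = t;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    size_t idx = 0;
    for (size_t s = 0; s < steps; s++) idx = next[idx];  // each load waits on the previous one
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9 + (double)(t1.tv_nsec - t0.tv_nsec);
    if (idx == (size_t)-1) puts("unreachable");  // keep the loop from being optimised away
    free(next);
    return ns / (double)steps;
}

int main(void) {
    size_t steps = 10u * 1000 * 1000;
    printf("small array (fits in cache):  %.1f ns/access\n", chase_ns(4096, steps));
    printf("large array (spills to RAM):  %.1f ns/access\n", chase_ns(32u * 1024 * 1024, steps));
    return 0;
}
```

On a typical machine the two numbers differ by an order of magnitude or more.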

But what happens if you have multiple CPUs on a single motherboard?

Each CPU gets its own bank of RAM right next to it, but it can also reach the RAM attached to the other CPUs, just more slowly.

Oh no, you might say, some memory is slightly slower.

But this is an actual issue that can have a surprisingly profound effect on performance.

This concept is called Non-Uniform Memory Access, or NUMA.

Ideally, the data needed by a task running on CPU1 is stored in the RAM directly next to CPU1. Similarly, data necessary for a task running on CPU2 is stored in the RAM directly next to CPU2.

This isn't always possible, but the operating system makes a best effort to keep data local, and that effort has a significant impact on performance.

Memory access over a single channel is also sequential.
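On Linux, this topology is exposed to programs through libnuma, so NUMA-aware code can pin itself to a node and allocate memory from that node's local RAM. A minimal sketch, assuming a Linux box with the libnuma development package installed (link with -lnuma):

```c
// local_alloc.c -- pin a thread to a NUMA node and allocate local memory.
// Minimal sketch, assuming Linux with libnuma. Build: cc local_alloc.c -lnuma
#include <stdio.h>
#include <string.h>
#include <numa.h>

int main(void) {
    // numa_available() returns -1 if the kernel has no NUMA support.
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int node = 0;                         // target the first NUMA node
    size_t size = 64UL * 1024 * 1024;     // a 64 MiB buffer

    // Keep this thread running on the chosen node so its accesses stay local.
    numa_run_on_node(node);

    // Ask for memory physically backed by that node's RAM.
    char *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "allocation failed\n");
        return 1;
    }

    memset(buf, 0, size);  // touch the pages so they are actually placed on the node
    printf("allocated %zu bytes on node %d (system reports %d node(s))\n",
           size, node, numa_max_node() + 1);

    numa_free(buf, size);
    return 0;
}
```

The numactl command-line tool can apply the same node and memory policies to an unmodified program.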

Conclusion

NUMA stands for Non-Uniform Memory Access.

It's a term used in computer systems with multiple physical CPUs, each with its own bank of local RAM.

Accessing RAM attached to a different CPU takes longer, and that extra latency decreases system performance in multiple ways.

NUMA is a way to inform the operating system that this is the case.

This lets the operating system optimize memory usage and data locality based on which CPU needs the data.

When the local RAM doesn't have enough capacity, data can spill over into the RAM attached to other CPUs.

Again, where possible, the number of NUMA hops is minimized to reduce latency.
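That hop cost isn't hidden from software, either: the kernel reports a node-distance table (typically taken from the firmware's ACPI SLIT), where a node's distance to itself is 10 by convention and remote nodes get proportionally larger values. Here is a minimal sketch that prints it, again assuming Linux with libnuma:

```c
// distances.c -- print the NUMA node distance matrix the kernel reports.
// Minimal sketch, assuming Linux with libnuma. Build: cc distances.c -lnuma
#include <stdio.h>
#include <numa.h>

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not available on this system\n");
        return 1;
    }

    int nodes = numa_max_node() + 1;
    printf("node ");
    for (int j = 0; j < nodes; j++) printf("%4d", j);
    printf("\n");

    // numa_distance() returns the SLIT-style distance: 10 for a node to
    // itself, larger numbers for memory that is more hops away.
    for (int i = 0; i < nodes; i++) {
        printf("%4d ", i);
        for (int j = 0; j < nodes; j++)
            printf("%4d", numa_distance(i, j));
        printf("\n");
    }
    return 0;
}
```

Running numactl --hardware prints the same matrix without writing any code.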