Why are my linear interpolations 10x faster in MATLAB?

It’s a CPU-cache issue. Data moves between main memory and the CPU in chunks of about 64 consecutive bytes (a cache line), and each such transfer is a “cache fill”. Where possible, you should use all the bytes in a cache line before moving on to the next one. That is what you do with x2, but with x1 only half of the bytes in each line are used, so you need twice as many cache fills. Each fill can cost a significant number of CPU clock cycles (something like 50, 100 or 300, depending on your CPU architecture). In that situation hardware “prefetching” becomes important, but it may not hide the cost entirely.
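To make the “twice as many cache fills” arithmetic concrete, here is a back-of-the-envelope model. It assumes 8-byte (Float64) elements, 64-byte cache lines, a line-aligned array, and that x1 is read with a stride of two elements (for example, walking one row of a two-row column-major array) while x2 is read contiguously; those layouts are an assumption, not taken from the original question. The helper `cache_lines_touched` is hypothetical and only counts distinct lines, it does not measure time or model prefetching.

```python
CACHE_LINE_BYTES = 64  # assumed typical cache-line size; the real value depends on the CPU


def cache_lines_touched(n_elements, stride, elem_bytes=8, line_bytes=CACHE_LINE_BYTES):
    """Count distinct cache lines touched when reading n_elements values that are
    `stride` elements apart, starting at a line-aligned address (hypothetical model)."""
    lines = set()
    for i in range(n_elements):
        byte_offset = i * stride * elem_bytes  # byte address of the i-th value read
        lines.add(byte_offset // line_bytes)   # which cache line that byte falls in
    return len(lines)


n = 1024
contiguous = cache_lines_touched(n, stride=1)  # x2-style: every byte of each line is used
strided = cache_lines_touched(n, stride=2)     # x1-style (assumed): half of each line is wasted
print(contiguous, strided, strided / contiguous)  # prints "128 256 2.0"
```

Reading the same 1024 values costs 128 line fills when contiguous and 256 with a stride of two, which is the factor of two described above; the actual slowdown on real hardware can be smaller or larger depending on how well prefetching keeps up.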
