Further data for the same problem. This machine:
4 socket(s), 64 cores, AMD Opteron™ Processor 6380,
RAM = 252 GB, Cache: L3 = 6 MB L2 = 2048kiB L1 = 16 KB
Again, the thread-based loop.
Edit: Extended to more threads.
Further data for the same problem. This machine:
4 socket(s), 64 cores, AMD Opteron™ Processor 6380,
RAM = 252 GB, Cache: L3 = 6 MB L2 = 2048kiB L1 = 16 KB
Again, the thread-based loop.
Edit: Extended to more threads.