How to create a random binary array with b bits?

I’ll try running a series of benchmarks in the next few months.
My machine runs non-AVX/AVX/AVX512 code at 4.6/4.3/4.1 GHz. The difference between 4.3 and 4.1 GHz (about 5%) isn’t very significant.

However, the blog post notes:

Similarly, if you are using 256-bit heavy instructions in a sustained manner, you will move to L1. The processor does not immediately move to a higher license when encountering heavy instructions: it will first execute these instructions with reduced performance (say 4x slower) and only when there are many of them will the processor change its frequency. Otherwise, any other 512-bit instructions will move the core to L1: the processor stops and changes its frequency as soon as an instruction is encountered.

The 4x slower execution was also observed in this post:

Ctrl + F for “with an IPC of ~0.25” to find it. It also includes graphs showing this behavior.

The slowdown lasted about 9 microseconds.
In other words, if a function takes about that long to run, and the CPU is not already in that state from executing a lot of other AVX512 code, the function will run at roughly 1/4 speed if it uses 512-bit instructions. It will almost certainly be faster if it avoids them and runs code similar to what the rest of the program is doing, so that the CPU can stay in the same “license”.
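To make that arithmetic concrete, here is a toy model in Python. It is only a sketch: it assumes throughput is exactly 1/4 of normal for the first 9 microseconds and normal afterwards, and it ignores the frequency drop and the pause while the frequency changes. The names wall_time_us, SLOWDOWN_WINDOW_US and SLOWDOWN_FACTOR are made up for this illustration.

```python
# Toy model of the license-transition penalty described above.
# Assumptions (not measurements): 512-bit code runs at 1/4 throughput for the
# first ~9 microseconds on a core that was not already in the AVX512 license,
# then at full throughput. Frequency differences are ignored.

SLOWDOWN_WINDOW_US = 9.0  # length of the slow phase, taken from the post above
SLOWDOWN_FACTOR = 4.0     # "say 4x slower"


def wall_time_us(work_us: float) -> float:
    """Wall time for a function that needs `work_us` microseconds of
    full-speed work, starting outside the AVX512 license."""
    # Amount of full-speed work that fits inside the slow window.
    work_done_in_window = SLOWDOWN_WINDOW_US / SLOWDOWN_FACTOR
    if work_us <= work_done_in_window:
        # The whole function runs during the slow phase.
        return work_us * SLOWDOWN_FACTOR
    # Slow phase, then the rest at full speed.
    return SLOWDOWN_WINDOW_US + (work_us - work_done_in_window)


if __name__ == "__main__":
    for work in (1.0, 2.0, 10.0, 100.0):
        print(f"{work:6.1f} us of work -> {wall_time_us(work):7.2f} us wall time")
```

Under these assumptions, a function with 2 µs of full-speed work takes 8 µs, while one with 100 µs of work takes 106.75 µs, so the penalty matters mainly for short, isolated bursts of 512-bit code.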

When running numerical work, my CPU stays in the AVX512 license (which I can monitor with watch -n0.5 "cat /proc/cpuinfo | grep MHz"), so I’d like to be able to keep it there.
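If you would rather log the clock speeds from a script than watch them in a terminal, something like the following Python sketch should work on x86 Linux. It reads the same "cpu MHz" lines that the grep above picks out; parse_cpu_mhz is a name invented for this example, and other architectures may not report that field at all.

```python
import re
from pathlib import Path


def parse_cpu_mhz(cpuinfo_text: str) -> list[float]:
    """Return the per-core 'cpu MHz' values found in /proc/cpuinfo-style text."""
    matches = re.findall(r"^cpu MHz\s*:\s*([\d.]+)", cpuinfo_text, flags=re.MULTILINE)
    return [float(m) for m in matches]


if __name__ == "__main__":
    path = Path("/proc/cpuinfo")  # Linux only
    if path.exists():
        mhz = parse_cpu_mhz(path.read_text())
        if mhz:  # empty if the kernel does not report "cpu MHz"
            print(f"min {min(mhz):.0f} MHz, max {max(mhz):.0f} MHz over {len(mhz)} cores")
```

Calling it in a loop while a benchmark runs gives a rough picture of which license the cores settle into.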

Another concern of mine is that LLVM handles loop remainders poorly: with full-width AVX512, the remainder it produces is twice as long. You can see this in the graphs in my opening post from [ANN] LoopVectorization, where Clang and Julia consistently show this pattern of behavior.
They would benefit a lot just by having a shorter remainder.
This wasn’t a problem for LoopVectorization, the GNU compilers, or Intel compilers. Just LLVM.
(Both GNU and Intel also avoid AVX512 unless it is requested with a compilation flag; I did provide the flags for the above example. So they must have decided that avoiding it was worth it, despite not having the remainder issue.)
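The remainder effect can be illustrated with a few lines of Python. This is a sketch under assumptions, not a description of what any particular LLVM version emits: it assumes the vectorized main loop is unrolled by a factor of 4 and that every leftover iteration runs in a scalar cleanup loop. scalar_remainder and the unroll default are hypothetical names and values chosen for the example.

```python
def scalar_remainder(n: int, vector_width: int, unroll: int = 4) -> int:
    """Iterations left for a scalar cleanup loop after a vectorized main loop.

    Assumes the main loop processes vector_width * unroll elements per trip
    and that everything left over is handled one element at a time.
    """
    return n % (vector_width * unroll)


if __name__ == "__main__":
    n = 127  # loop length, in Float64 elements
    # 256-bit vectors hold 4 Float64 values, 512-bit vectors hold 8.
    for name, width in (("256-bit", 4), ("512-bit", 8)):
        print(f"{name}: {scalar_remainder(n, width)} scalar iterations left over")
```

With these assumptions, a loop of 127 Float64 elements leaves 15 scalar iterations at 256 bits and 31 at 512 bits, so the worst case roughly doubles when the vector width doubles.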
