50x speed difference in gemv for different values in vector

Just to elaborate on that, subnormals (aka denormals) are floating-point values with large negative exponents – so large that they no longer use all the bits of the value and have less than full precision. This allows something known as “gradual underflow” where you lose precision gradually, instead of immediately getting a zero value. Doing arithmetic with subnormal values does not go through the normal CPU pathways (on Intel hardware) and thus takes considerably longer – i.e. floating-point ops do not take a fixed number of clock cycles, which is what you’re seeing here.

5 Likes