The reason for the 12x speed difference has been explained multiple times and is very simple: if you don't use the sin compiler intrinsic, none of the compilers used is able to infer that sin is pure, so none of them hoists the call out of the loop. Btw, Wolf also ran into that when he tried to use xsimd::sin. Julia's sin is implemented in pure Julia, which gives it that problem by default, but it is also easily solved by calling into the LLVM sin intrinsic.
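For anyone who wants to try this, here is a minimal sketch of what calling into the LLVM intrinsic can look like. The names llvm_sin and sum_scaled are made up for illustration and are not from the benchmark in this thread; the loop just stands in for any kernel where the argument to sin does not change between iterations. Whether the call actually gets hoisted depends on your Julia/LLVM version and optimization settings, so time it yourself before relying on it.

```julia
# Hypothetical helper (name is an assumption): calls the LLVM intrinsic
# `llvm.sin.f64` through ccall's `llvmcall` calling convention instead of
# Julia's pure-Julia `sin`.
llvm_sin(x::Float64) = ccall("llvm.sin.f64", llvmcall, Float64, (Float64,), x)

# Hypothetical stand-in kernel: `f(a)` does not depend on `i`, so an optimizer
# that knows `f` is pure may compute it once outside the loop.
function sum_scaled(f, a::Float64, n::Int)
    s = 0.0
    for i in 1:n
        s += f(a) * i
    end
    return s
end

# Same kernel, once with Base's sin and once with the intrinsic.
r_base = sum_scaled(sin, 0.5, 1000)
r_llvm = sum_scaled(llvm_sin, 0.5, 1000)
```

Both calls should agree to within floating-point rounding; the only thing that changes is how much the optimizer knows about the function being called. Comparing the two with @code_llvm or BenchmarkTools is a quick way to check whether the hoisting happens on your setup.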