@btime of cis Function slower than writing it again

It looks like it has to do with argument reduction.

The `sincos` function, like the other trig functions, is fastest for arguments in [-π/4, π/4]; outside that interval it first has to reduce the argument modulo π/2 into that range. Your arguments are distributed uniformly in [0, 2π), so computing π/2 - φ raises the probability that the argument lands in [-π/4, π/4] from 1/8 to 1/4, and hence speeds it up on average.
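To make the probability argument concrete, here is a minimal sketch that counts how many samples fall in [-π/4, π/4] before and after the shift. The sample size, the seed, and the helper name `in_fast_range` are my own choices for illustration. For φ uniform in [0, 2π) the expected fractions are 1/8 for φ itself and 1/4 for π/2 - φ.

```julia
using Random

Random.seed!(0)                                   # fixed seed so the run is repeatable
phi = 2f0 * Float32(pi) .* rand(Float32, 10^6)    # uniform in [0, 2π)

# Hypothetical helper: true when no argument reduction should be needed.
in_fast_range(x) = abs(x) <= Float32(pi) / 4

# Fraction of arguments already in [-π/4, π/4], with and without the shift.
frac_direct  = count(in_fast_range, phi) / length(phi)
frac_shifted = count(in_fast_range, Float32(pi) / 2 .- phi) / length(phi)

println("phi:        ", frac_direct)
println("pi/2 - phi: ", frac_shifted)
```

This only counts the arguments that skip reduction entirely. How much each reduced argument costs depends on the implementation, so treat the counts as a rough indicator, not a timing prediction.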

If you instead generate the inputs with `a = Float32.((pi/2) .* rand(sz...))`, so that they lie in [0, π/2), then π/2 - φ does not change the distribution of magnitudes, and the two functions become about equally fast on my machine.
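A rough way to try both input ranges side by side is sketched below. `cis_shifted` is a hypothetical stand-in for the hand-written version, which is not shown here, and `sz` is an assumed array size. It uses `@elapsed` from Base to stay dependency-free; `@btime` from BenchmarkTools should give steadier numbers.

```julia
# Hypothetical stand-in for the hand-written version:
# sincos(π/2 - φ) returns (cos φ, sin φ), so this equals cis(φ) up to rounding.
cis_shifted(phi) = complex(sincos(oftype(phi, pi) / 2 - phi)...)

sz = (1000, 1000)                               # assumed array size
a_full    = Float32.((2pi) .* rand(sz...))      # uniform in [0, 2π)
a_quarter = Float32.((pi/2) .* rand(sz...))     # uniform in [0, π/2)

for (name, a) in (("[0, 2π)", a_full), ("[0, π/2)", a_quarter))
    cis.(a); cis_shifted.(a)                    # warm up so compilation is not timed
    t_cis     = @elapsed cis.(a)
    t_shifted = @elapsed cis_shifted.(a)
    println(name, ": cis ", t_cis, " s, shifted ", t_shifted, " s")
end
```

Single `@elapsed` measurements are noisy, so repeat the run a few times before reading anything into small differences.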
