I found a quite strange effect when evaluating the performance of `cis` via `@btime`:

```julia
using BenchmarkTools

sz = (2000, 2000)
a = Float32.(2pi .* rand(sz...));

function cis_fast(phi::T) where {T<:Real}
    complex(sincos(T(pi/2) - phi)...)
end

r = zeros(ComplexF32, sz);
@btime $r .= cis_fast.($a); # 42 ms
@btime $r .= cis.($a);      # 46 ms
```

For the function as written, I consistently get faster results than with the built-in version. I find this odd, since the built-in version looks like it should require fewer calculations. Tested on Julia 1.7.1 and 1.8.0.

It looks like it has to do with argument reduction.

The `sincos` function, along with the other trig functions, is fastest for arguments in [-π/4, π/4], and otherwise has to reduce the argument modulo π/2 into that range. For your arguments distributed uniformly in [0, 2π), computing π/2 - φ increases the probability of the argument landing in [-π/4, π/4], and hence speeds it up on average.

If you do `a = Float32.((pi/2) .* rand(sz...))`, then π/2 - φ does not change the distribution of magnitudes, and hence the two functions become about equally fast on my machine.
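As a quick sanity check (my own sketch, not from the thread): since `sincos(π/2 - φ)` returns `(sin(π/2 - φ), cos(π/2 - φ)) = (cos φ, sin φ)`, the shortcut reproduces `cis(φ) = cos(φ) + i*sin(φ)` up to floating-point rounding:

```julia
# Verify that the co-function identity used by cis_fast agrees with cis,
# up to the rounding introduced by computing π/2 - φ in Float32.
cis_fast(phi::T) where {T<:Real} = complex(sincos(T(pi/2) - phi)...)

phi = Float32.(2pi .* rand(1000))
err = maximum(abs.(cis_fast.(phi) .- cis.(phi)))  # a few × eps(Float32)
```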


Fantastic. I did not think of this, but it explains it. Good to know that that range is optimal.