It looks like it has to do with argument reduction.
The `sincos` function, like the other trig functions, is fastest for arguments in $[-\pi/4, \pi/4]$; for anything outside that range it first has to reduce the argument modulo $\pi/2$. For your arguments, distributed uniformly in $[0, 2\pi)$, computing $\pi/2 - \phi$ raises the probability that the argument lands in $[-\pi/4, \pi/4]$ from 1/8 to 1/4, which speeds it up on average.
If you instead do `a = Float32.((pi/2) .* rand(sz...))`, then $\pi/2 - \phi$ does not change the distribution of magnitudes, and the two functions become about equally fast on my machine.
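To check the probability argument numerically, here is a rough sketch that counts how many arguments fall in the fast range before and after the shift. The helper `fast_fraction`, the variable names, and the sample size are my own choices for illustration, not anything from your code.

```julia
using Random

# Hypothetical helper: fraction of values with magnitude in the fast range |x| <= pi/4.
fast_fraction(x) = count(v -> abs(v) <= pi/4, x) / length(x)

Random.seed!(1)                        # fixed seed so the run is repeatable
phi = Float32.(2pi .* rand(10^6))      # uniform in [0, 2pi)
shifted = Float32(pi/2) .- phi         # the arguments that are passed after the shift

println("phi:        ", fast_fraction(phi))      # should be close to 1/8
println("pi/2 - phi: ", fast_fraction(shifted))  # should be close to 1/4
```

The two fractions follow from interval lengths: only $\phi \in [0, \pi/4]$ is in the fast range before the shift, while $\phi \in [\pi/4, 3\pi/4]$ is mapped into it afterwards.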
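A rough way to reproduce this is sketched below. The functions `sum_sincos` and `sum_sincos_shifted` are hypothetical stand-ins for your two functions, and the array size is arbitrary; `@elapsed` only gives a coarse timing, so treat the numbers as indicative.

```julia
using Random

# Hypothetical helper: fraction of values with magnitude in the fast range |x| <= pi/4.
fast_fraction(x) = count(v -> abs(v) <= pi/4, x) / length(x)

# Hypothetical stand-ins for the two functions being compared.
sum_sincos(x) = sum(v -> sum(sincos(v)), x)
sum_sincos_shifted(x) = sum(v -> sum(sincos(Float32(pi/2) - v)), x)

Random.seed!(1)
a = Float32.((pi/2) .* rand(10^6))     # uniform in [0, pi/2)
b = Float32(pi/2) .- a                 # shifted arguments, still within [0, pi/2]

# Both fractions should be close to 1/2, so the shift does not help here.
println("a:        ", fast_fraction(a))
println("pi/2 - a: ", fast_fraction(b))

# Warm up first so that compilation time is not measured.
sum_sincos(a); sum_sincos_shifted(a)
t_plain = @elapsed sum_sincos(a)
t_shift = @elapsed sum_sincos_shifted(a)
println("plain: ", t_plain, " s, shifted: ", t_shift, " s")
```

For more reliable timings, a benchmarking package such as BenchmarkTools would be a better choice than a single `@elapsed` call.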