The alternative is to just make the deviation smaller, something like
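(a minimal sketch, not necessarily the exact code: the 50-bit redraw width here is an assumption, chosen so that the smallest nonzero output is ldexp(1.0f0, -74), matching the numbers below)

function rand32fallback()
    f = rand(Float32)                  # uniform multiples of 2^-24 in [0, 1)
    if f == 0.0f0                      # rare: probability 2^-24
        # Redraw 50 fresh bits and place them below the old cutoff, so the
        # smallest nonzero output is ldexp(1.0f0, -74) and an exact 0.0f0
        # requires all 74 random bits to come up zero. (Float32 keeps only
        # 24 of the 50 bits after rounding, which is fine at this scale.)
        f = ldexp(Float32(rand(UInt64) >> 14), -74)
    end
    return f
end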
This cuts the size of those deviations to essentially zero by comparison, and it is trivially easy to verify for correctness, compared to dealing with specialized instructions for different processors, etc.
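For example, a quick property check (hypothetical, written against the sketch above; the second loop exercises the rare branch directly, since hitting it by chance takes 2^24 draws on average):

using Test

# Common path: same range contract as rand(Float32).
for _ in 1:10^6
    @test 0.0f0 <= rand32fallback() < 1.0f0
end

# Rare path in isolation: must land in [0, 2^-24].
for _ in 1:10^6
    g = ldexp(Float32(rand(UInt64) >> 14), -74)
    @test 0.0f0 <= g <= ldexp(1.0f0, -24)   # rounding can reach the upper endpoint
end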
The probability of generating 0.0f0 also becomes much, much lower: since Float32 can represent ldexp(1.0f0, -74) ≈ 5.29e-23, the probability of drawing exactly 0.0 is now about that much, and it would only be observable by a million cores operating for 218 days.
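A back-of-the-envelope check on that figure (the rate of one draw per nanosecond per core is an assumption):

draws = 1e6 * 1e9 * 86400 * 218    # a million cores at 1e9 draws/s for 218 days: ~1.9e22 draws
draws * ldexp(1.0, -74)            # expected number of exact zeros: ~1.0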
The bias essentially goes away in terms of observability, because it becomes so much smaller. And the cost of doing it is only about 1.35 times that of rand(Float32):
julia> using BenchmarkTools

julia> @btime(rand32fallback()); @btime(rand(Float32))
  3.837 ns (0 allocations: 0 bytes)
  2.845 ns (0 allocations: 0 bytes)
0.0863567f0
julia> 3.837/2.845
1.3486818980667838