I’m trying to generate Float32’s from a uniformly from a range, and found Normal from distributions doesn’t offer great performance compared to just stretching and shifting what rand(Float32) already produces. Am I doing something wrong?
mutable struct RangeFloats
rangebegin::Float32
rangelength::Float32
end
mutable struct RangeDistribution
d::Uniform{Float32}
end
@inline function generate_uniform(rf::RangeFloats)
return rand(Float32)*rf.rangelength + rf.rangebegin
end
function sampleuniform(rf::RangeFloats)
cum = 0f0
for _ in 1:10000
cum += generate_uniform(rf)
end
return cum
end
function sampleuniform(rd::RangeDistribution)
cum = 0f0
for _ in 1:10000
cum += Float32(rand(rd.d))
end
return cum
end
Here I find sampleuniform(::RangeFloats) has better performance.
It always returns a Float64 even if the endpoints are Float32, which causes the random-number generation and the arithmetic to be more expensive than necessary. This explains about 2/3 of the peformance difference on my machine (as can be verified by manually implementing a generate_uniform(u::Uniform{T}) where {T} = rand(float(real(T)))*(u.b - u.a) + u.a method and calling that instead of rand). It is a known issue that was reported recently: Inconsistent Type Behavior for Uniform Distribution · Issue #1783 · JuliaStats/Distributions.jl · GitHub
Uniform stores the endpoints [a,b), so it has to compute b-a on each call. This explains the remaining 1/3 of the cost difference on my machine.
Yes I looked around and saw indeed that rand(::Uniform{Float32}) returning a Float64 has been a longstanding problem. Wouldn’t this easily be solved in the current implementation by just adding a method rand(rng::AbstractRNG, d::Uniform{T}) where T = d.a + (d.b - d.a) * rand(rng, T). For the second it might be useful to either just store the length instead of the endpoint, or add an extra field that is computed on construction?
These seem like very obvious fixes to me, so I’m assuming these have been considered already. Thus I also assume the situation is a bit more complicated than I think. I’d be interested to know why this has not been ‘fixed’ yet.
The reason is almost invariably lack of time. If you feel like making a PR, I think both the additional method as well as the cache would make sense, as it could also be used in other methods. Otherwise I can give it a shot later.