Fill SVector with uniformly sampled Float32 values

There are quite a few topics on this, e.g. "Performance of generating uniform Float32 numbers".

Since the default behavior hasn’t been fixed yet, I tried a few things.

As a baseline, these all return Float64:

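Edit: these benchmarks are flawed, because most of them use length 2 instead of 16. See the corrected length-16 benchmarks further down.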
julia> @btime rand(Uniform(-.003, .003), 16);
  132.099 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float64}(-.003, .003), 2);
  51.484 ns (1 allocation: 80 bytes)

julia> @btime rand(Uniform{Float32}(-.003, .003), 2);
  51.708 ns (1 allocation: 80 bytes)

julia> @btime @SVector rand(Uniform(-.003, .003), 2);
  6.056 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float64}(-.003, .003), 2);
  6.050 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float32}(-.003, .003), 2);
  6.471 ns (0 allocations: 0 bytes)

The SVector approaches are much faster. I then modified the code from the topic linked above as follows:

Struct/Func Definitions
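using StaticArrays, Accessors  # provide @SVector and @reset

struct RangeFloats
    rangebegin::Float32
    rangelength::Float32
end

@inline function generate_uniform(rf::RangeFloats)
    vec = @SVector rand(Float32, 16)
    # scale [0, 1) samples to [rangebegin, rangebegin + rangelength)
    @reset vec = vec .* rf.rangelength .+ rf.rangebegin
    return vec
end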


@inline function generate_uniform2()
    vec = @SVector rand(Float32, 16)
    # hard-coded equivalent of RangeFloats(-0.003, 0.006): shift down to [-0.003, 0.003)
    @reset vec = vec .* 0.006f0 .- 0.003f0
    return vec
end

rf = RangeFloats(-0.003, 0.006);

These return Float32:

julia> @btime SVector{16, Float32}(Float32.(rand(Uniform{Float32}(-.003, .003), 16)));
  169.113 ns (2 allocations: 320 bytes)

julia> @btime generate_uniform($rf);
  26.129 ns (0 allocations: 0 bytes)

julia> @btime generate_uniform2();
  26.244 ns (0 allocations: 0 bytes)

1. Is there any way faster than generate_uniform($rf)? Presumably ~4x is possible given the Float64 benchmarks.

2. I was surprised that hard-coding the uniform distribution parameters was (slightly but consistently) slower than passing a struct, i.e. generate_uniform vs generate_uniform2. Is there a general reason for that?
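
One way to check whether a gap that small (about 0.1 ns out of 26 ns here) is more than measurement noise is BenchmarkTools' `judge`, which compares two estimates against a tolerance. The following is only a sketch: `f_param` and `f_const` are hypothetical stand-ins for the two variants, not the functions benchmarked above, and the verdict will depend on the machine.

```julia
using BenchmarkTools, StaticArrays

# Hypothetical stand-ins for the parameterised and the hard-coded variant.
f_param(lo::Float32, len::Float32) = (@SVector rand(Float32, 16)) .* len .+ lo
f_const() = (@SVector rand(Float32, 16)) .* 0.006f0 .- 0.003f0

lo, len = -0.003f0, 0.006f0
t_param = @benchmark f_param($lo, $len)
t_const = @benchmark f_const()

# judge labels the time difference :invariant, :regression or :improvement
# relative to a tolerance (5% by default).
verdict = judge(minimum(t_const), minimum(t_param))
println(verdict)
```

If the verdict comes back as invariant, the difference is probably not worth explaining.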

Thanks.

Edit: corrected Float64 benchmarks (all of length 16):

julia> @btime rand(Uniform(-.003, .003), 16);
  131.719 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float64}(-.003, .003), 16);
  132.685 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float32}(-.003, .003), 16);
  131.877 ns (1 allocation: 192 bytes)

julia> @btime @SVector rand(Uniform(-.003, .003), 16);
  36.240 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float64}(-.003, .003), 16);
  36.247 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float32}(-.003, .003), 16);
  36.229 ns (0 allocations: 0 bytes)

I’m not an expert in numerics, but what about

julia> @b SVector{2,Float32}(rand(Uniform{Float32}(-.003, .003)) for _ in 1:2)
5.541 ns

That vector is length 2 instead of 16 though.

Right, it’s not faster than what you have. What makes you think that one can make it 4x faster? Currently it’s as fast as with Float64.

Ah, I messed up the benchmarks. One minute.

Edit: Yep, thanks. The modified functions are the fastest, then.

@reset doesn’t do anything useful here. At best it’s a no-op; at worst it does extra work that slows you down. You can simply remove it.
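
A minimal sketch of what that could look like, assuming the `RangeFloats` struct from the original post and only StaticArrays loaded. Broadcasting over an `SVector` already returns a new `SVector`, so the result can be returned directly:

```julia
using StaticArrays

struct RangeFloats
    rangebegin::Float32
    rangelength::Float32
end

# Same computation as in the original post, with the @reset reassignment dropped.
@inline function generate_uniform(rf::RangeFloats)
    vec = @SVector rand(Float32, 16)
    return vec .* rf.rangelength .+ rf.rangebegin
end

rf = RangeFloats(-0.003, 0.006)  # arguments are converted to Float32
v = generate_uniform(rf)         # SVector{16, Float32}
```

Whether this changes the timing at all would need to be benchmarked; it mainly removes a dependency and a line of code.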