Fill SVector with uniformly sampled Float32 values

There are quite a few topics on this, e.g. Performance of generating uniform Float32 numbers

Since the current default behavior still hasn’t been fixed, I tried a few things.

As a baseline, these all return Float64:

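Edit: These benchmarks are flawed: all but the first use length 2 instead of 16. The corrected length-16 numbers are in the edit at the end of this post.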
julia> @btime rand(Uniform(-.003, .003), 16);
  132.099 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float64}(-.003, .003), 2);
  51.484 ns (1 allocation: 80 bytes)

julia> @btime rand(Uniform{Float32}(-.003, .003), 2);
  51.708 ns (1 allocation: 80 bytes)

julia> @btime @SVector rand(Uniform(-.003, .003), 2);
  6.056 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float64}(-.003, .003), 2);
  6.050 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float32}(-.003, .003), 2);
  6.471 ns (0 allocations: 0 bytes)

The SVector approaches are much faster. I then modified the code from the topic linked above as follows:

Struct/Func Definitions
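using StaticArrays, Accessors  # @SVector comes from StaticArrays, @reset from Accessors

struct RangeFloats
    rangebegin::Float32
    rangelength::Float32
end

# Scale 16 uniform Float32 samples from [0, 1) to start at rangebegin with width rangelength
@inline function generate_uniform(rf::RangeFloats)
    vec = @SVector rand(Float32, 16)
    @reset vec = vec .* rf.rangelength .+ rf.rangebegin
    return vec
end

# Same as above, but with the range hard-coded (begin -0.003, width 0.006)
@inline function generate_uniform2()
    vec = @SVector rand(Float32, 16)
    @reset vec = vec .* 0.006f0 .- 0.003f0  # subtract, so the range starts at -0.003 like rf
    return vec
end

rf = RangeFloats(-0.003, 0.006);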

These return Float32:

julia> @btime SVector{16, Float32}(Float32.(rand(Uniform{Float32}(-.003, .003), 16)));
  169.113 ns (2 allocations: 320 bytes)

julia> @btime generate_uniform($rf);
  26.129 ns (0 allocations: 0 bytes)

julia> @btime generate_uniform2();
  26.244 ns (0 allocations: 0 bytes)

  1. Is there any way faster than generate_uniform($rf)?
    – Presumably ~4x is possible given the Float64 benchmarks

  2. I was surprised that hard-coding the uniform distribution parameters was (slightly but consistently) slower than passing a struct, i.e. generate_uniform2 vs generate_uniform. Is there a general reason for that?

Thanks.

Edit
Correct Float64 benchmarks (all of length 16):

julia> @btime rand(Uniform(-.003, .003), 16);
  131.719 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float64}(-.003, .003), 16);
  132.685 ns (1 allocation: 192 bytes)

julia> @btime rand(Uniform{Float32}(-.003, .003), 16);
  131.877 ns (1 allocation: 192 bytes)

julia> @btime @SVector rand(Uniform(-.003, .003), 16);
  36.240 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float64}(-.003, .003), 16);
  36.247 ns (0 allocations: 0 bytes)

julia> @btime @SVector rand(Uniform{Float32}(-.003, .003), 16);
  36.229 ns (0 allocations: 0 bytes)

I’m not an expert in numerics, but what about

julia> @b SVector{2,Float32}(rand(Uniform{Float32}(-.003, .003)) for _ in 1:2)
5.541 ns

That vector is length 2 instead of 16 though.

Right, it’s not faster than what you have. What makes you think that one can make it 4x faster? Currently it’s as fast as with Float64.


Ah, I messed up the benchmarks. One minute.

Edit:
Yep, thanks. The modified functions are fastest then.

@reset doesn’t do anything useful here. At best, it’s a no-op, at worst, it does extra work that slows you down. You can simply remove it.
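
For illustration, here is a minimal sketch of the same function with @reset removed. The name generate_uniform_plain is made up for this example, and RangeFloats is repeated from the first post only so the snippet runs on its own. Since an SVector is immutable, the broadcast already returns a new SVector, so the result can be returned directly. I have not benchmarked this, so treat any speed difference as something to measure rather than assume.

```julia
using StaticArrays

# Repeated from the original post so this snippet is self-contained
struct RangeFloats
    rangebegin::Float32
    rangelength::Float32
end

# Hypothetical name: same computation as generate_uniform, without @reset.
# The broadcast produces a fresh SVector{16, Float32}, which is returned as-is.
@inline function generate_uniform_plain(rf::RangeFloats)
    vec = @SVector rand(Float32, 16)
    return vec .* rf.rangelength .+ rf.rangebegin
end

rf = RangeFloats(-0.003, 0.006)  # Float64 literals are converted to Float32 fields
v = generate_uniform_plain(rf)
```

It can be timed the same way as the originals, e.g. with @btime generate_uniform_plain($rf).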
