Rand-based Function Optimization

Given a non-zero UInt8, I would like to randomly and uniformly select the index of a bit that is one. For example, if I get UInt8(14), the bitstring would be 00001110, and thus it would return UInt8(2), UInt8(3), and UInt8(4) at a uniform rate. My solution thus far has been

  1. Select a number from one to the number of ones in the inputted UInt8
  2. Find the index of the bit corresponding to that many ones (from the right)

This second step is similar to the bit twiddling hack shown here: Bit Twiddling Hacks but this is built very much with 64-bit integers in mind so I spun my own. Here’s my solution so far:

const rng = Xoshiro()

# step 2
@inline function get_decisions(d1::UInt8, d2::UInt8, i::UInt8, j::UInt8)
    to_return_1, to_return_2 = 0x08, 0x08
    for n = 0x00:0x06
        to_return_1 -= isodd(d1 >> n) &&
            Base.ctpop_int(d1 >> n) == i ? 0x07 - n : 0x00
        to_return_2 -= isodd(d2 >> n) &&
            Base.ctpop_int(d2 >> n) == j ? 0x07 - n : 0x00
    end
    return to_return_1, to_return_2
end

# step 1
function select_random_decision(d1::UInt8, d2::UInt8)
    a, b = Base.ctpop_int(d1), Base.ctpop_int(d2)
    return get_decisions(d1, d2, 
        a == 0x01 ? 0x01 : 
            a == 0x02 ? rand(rng, (0x01, 0x02)) : 
            a == 0x03 ? rand(rng, (0x01, 0x02, 0x03)) : 
            a == 0x04 ? rand(rng, (0x01, 0x02, 0x03, 0x04)) : rand(rng, 0x01:a),
        b == 0x01 ? 0x01 : 
            b == 0x02 ? rand(rng, (0x01, 0x02)) : 
            b == 0x03 ? rand(rng, (0x01, 0x02, 0x03)) : 
            b == 0x04 ? rand(rng, (0x01, 0x02, 0x03, 0x04)) : rand(rng, 0x01:b)
    )
end

This can be checked for correctness visually with this plot:

using Plots
a = rand(0x01:0xff)
b = rand(0x01:0xff)
N = 1_000_000
results = map(x -> select_random_decision(a, b), 1:N)
histogram(map(x -> results[x][1], 1:N) .- 1/8, xlims = (0, 9), label = "",
    xticks = (1:8, reverse(split(bitstring(a), "")) .* reverse(split(bitstring(b), ""))), 
    bar_width = 1/4)
histogram!(map(x -> results[x][2], 1:N) .+ 1/8, label = "", bar_width = 1/4, 
    dpi = 300, format = :png, grid = false, yaxis = false, yticks = [], 
    bgcolor = :transparent)

and benchmarked with this:

using BenchmarkTools
@benchmark select_random_decision(a, b) setup = (a = rand(0x01:0xff); b = rand(0x01:0xff))
BenchmarkTools.Trial: 
  memory estimate:  0 bytes
  allocs estimate:  0
  --------------
  minimum time:     10.060 ns (0.00% GC)
  median time:      14.625 ns (0.00% GC)
  mean time:        14.621 ns (0.00% GC)
  maximum time:     22.273 ns (0.00% GC)
  --------------
  samples:          10000
  evals/sample:     999

This blog post (What you can learn from accessing an element of an array in Julia) mentions that tuples >= 4 in length are highly optimized for random selection in Julia. This lines up with my benchmarking, as both going to 3 and 5 are slower. However, I don’t understand why this is, since the base Julia seems to have optimizations for 1, 2, 3, and N-sized tuples (lines 500-553 of https://github.com/JuliaLang/julia/blob/0ff49d0457300013f9e802850a49e64569426a0b/stdlib/Random/src/generation.jl).

Why do these tuple optimizations work up to 4, can I write my own implementations for 5-8 (8 seems not that far away from 4, perhaps naively), and are there other ways to optimize this function?

The optimization is specifically for tuples of lenght 1, 2, 3, and powers of 2 (including 4), which were easy enough to implement. For other lengths, you can definitely try to write optimizations and make a PR to Random. But since Julia 1.5, a more efficient algorithm is used to sample from an array (and from tuples of lengths not listed above), so it’s not obvious that specializations for tuples of length 5, 6, 7 can be faster than the generic algo (even the optimization for tuples of length 3 became close to irrelevant since Julia 1.5).

1 Like