Given a non-zero UInt8, I would like to randomly and uniformly select the index of a bit that is one. For example, if I get UInt8(14), the bitstring would be 00001110, and thus it would return UInt8(2), UInt8(3), and UInt8(4) at a uniform rate. My solution thus far has been

- Select a number from one to the number of ones in the inputted UInt8
- Find the index of the bit corresponding to that many ones (from the right)

This second step is similar to the bit twiddling hack shown here: Bit Twiddling Hacks but this is built very much with 64-bit integers in mind so I spun my own. Here’s my solution so far:

```
const rng = Xoshiro()
# step 2
@inline function get_decisions(d1::UInt8, d2::UInt8, i::UInt8, j::UInt8)
to_return_1, to_return_2 = 0x08, 0x08
for n = 0x00:0x06
to_return_1 -= isodd(d1 >> n) &&
Base.ctpop_int(d1 >> n) == i ? 0x07 - n : 0x00
to_return_2 -= isodd(d2 >> n) &&
Base.ctpop_int(d2 >> n) == j ? 0x07 - n : 0x00
end
return to_return_1, to_return_2
end
# step 1
function select_random_decision(d1::UInt8, d2::UInt8)
a, b = Base.ctpop_int(d1), Base.ctpop_int(d2)
return get_decisions(d1, d2,
a == 0x01 ? 0x01 :
a == 0x02 ? rand(rng, (0x01, 0x02)) :
a == 0x03 ? rand(rng, (0x01, 0x02, 0x03)) :
a == 0x04 ? rand(rng, (0x01, 0x02, 0x03, 0x04)) : rand(rng, 0x01:a),
b == 0x01 ? 0x01 :
b == 0x02 ? rand(rng, (0x01, 0x02)) :
b == 0x03 ? rand(rng, (0x01, 0x02, 0x03)) :
b == 0x04 ? rand(rng, (0x01, 0x02, 0x03, 0x04)) : rand(rng, 0x01:b)
)
end
```

This can be checked for correctness visually with this plot:

```
using Plots
a = rand(0x01:0xff)
b = rand(0x01:0xff)
N = 1_000_000
results = map(x -> select_random_decision(a, b), 1:N)
histogram(map(x -> results[x][1], 1:N) .- 1/8, xlims = (0, 9), label = "",
xticks = (1:8, reverse(split(bitstring(a), "")) .* reverse(split(bitstring(b), ""))),
bar_width = 1/4)
histogram!(map(x -> results[x][2], 1:N) .+ 1/8, label = "", bar_width = 1/4,
dpi = 300, format = :png, grid = false, yaxis = false, yticks = [],
bgcolor = :transparent)
```

and benchmarked with this:

```
using BenchmarkTools
@benchmark select_random_decision(a, b) setup = (a = rand(0x01:0xff); b = rand(0x01:0xff))
```

```
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 10.060 ns (0.00% GC)
median time: 14.625 ns (0.00% GC)
mean time: 14.621 ns (0.00% GC)
maximum time: 22.273 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 999
```

This blog post (What you can learn from accessing an element of an array in Julia) mentions that tuples >= 4 in length are highly optimized for random selection in Julia. This lines up with my benchmarking, as both going to 3 and 5 are slower. However, I don’t understand why this is, since the base Julia seems to have optimizations for 1, 2, 3, and N-sized tuples (lines 500-553 of julia/generation.jl at 0ff49d0457300013f9e802850a49e64569426a0b · JuliaLang/julia · GitHub).

Why do these tuple optimizations work up to 4, can I write my own implementations for 5-8 (8 seems not that far away from 4, perhaps naively), and are there other ways to optimize this function?