# Computing the preimage of count_ones

`count_ones(x)` counts how many 1s are in the binary representation of the integer `x`.

I am looking for a function that computes the preimage of that.
Or, informally, the inverse (though it’s not actually invertible).
That is to say, for any `n` I want a function that returns an iterator of values `y` such that `count_ones(y) == n` for all those values (and for no other integers), up to some maximum `z`.
The naive implementation is:

``````inv_count_ones(n, z) = filter(y->count_ones(y)==n, 1:z)
``````

And it does indeed have

``````inv_count_ones(0, 8) == []
inv_count_ones(1, 8) == [1, 2, 4, 8]
inv_count_ones(2, 8) == [3, 5, 6]
inv_count_ones(3, 8) == [7]
inv_count_ones(4, 8) == []
``````

But is there a better way?


Here’s one way:

``````using Combinatorics
# reconstructed: the `powerset` call was garbled in the original post;
# powerset(v, k, k) enumerates only the size-k subsets of v
inv_count_ones(k, n) =
    filter(<=(n), [reduce(|, 1 .<< s)
                   for s in powerset(0:ndigits(n, base=2)-1, k, k)
                   if length(s) > 0])
``````

The problem is pretty standard in combinatorics and bit-manipulation. The key feature is to leverage the extra parameters of the `powerset` function.

``````julia> [inv_count_ones(i, 8) for i in 0:4] ==
       [[], [1,2,4,8], [3,5,6], [7], []]
true
``````

BTW this can be transformed into an Iterator for these sets, which can be more efficient in memory (and perhaps time).

There’s definitely a pattern in the positive `Int`s, if you can split the implementation by types and ranges.

``````julia> for i in 1:32
           ci = count_ones(i)
           if ci == 1 println("\n", ci) else print(" ", ci) end
       end

1

1
2
1
2 2 3
1
2 2 3 2 3 3 4
1
2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
1
``````

You can tell `n=1` occurs at powers of 2, and for a given `n` you can start the search at the minimum integer with `n` 1s, i.e. at `(2^n-1):z`.
There is a pattern between the powers of 2, but I’m not sure how to exploit that for `n>1`.
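As a sketch of that trimmed search (the name `inv_count_ones_from` is mine, not from the thread):

```julia
# Naive filter, but starting at 2^n - 1, the smallest integer with n
# one-bits; max(1, ...) keeps the n == 0 behaviour of the original,
# which searched 1:z and so returned [] for n == 0.
inv_count_ones_from(n, z) = filter(y -> count_ones(y) == n, max(1, 2^n - 1):z)

# inv_count_ones_from(2, 8) == [3, 5, 6]
```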

This is a fun exercise in combinatorics. I vaguely remember that this is pretty well-known and standard, but I don’t recall the name, so I’ll instead re-derive the construction here.

edit: @Syx_Pek provided the name down-thread; it’s called the combinatorial number system.

Originally I described a re-derivation with code, but read wikipedia instead, it’s clearer than what I wrote.

First of all, suppose you had your array `inv_co(k) = filter(y->count_ones(y)==k, 1:(-1 >>> 1))`, and you’re given a specific `N` such that `count_ones(N) == k`. What is the index of `N` in this list `inv_co(k)`?

Well, it is one plus the number of smaller entries. This is easy to compute:

``````struct BitsIter
    u::Int64
end
Base.iterate(b::BitsIter) = iterate(b, b.u)
Base.iterate(b::BitsIter, u) = (u == 0 ? nothing : (trailing_zeros(u), Base._blsr(u)))

function index(N)
    res = 1
    for (nbits, bitpos) in enumerate(BitsIter(N))
        # there are nbits many bits we need to distribute over bitpos many places
        res += binomial(bitpos, nbits)
    end
    res
end
``````

We can reality-check that:

``````julia> inv_count_ones(n, z) = filter(y->count_ones(y)==n, 1:z)
inv_count_ones (generic function with 1 method)

julia> length(inv_count_ones(5, 1000))
252

julia> map(index, inv_count_ones(5, 1000)) == 1:length(inv_count_ones(5, 1000))
true
``````

OK, now we need to invert the `index` function (you want random access to your iterator / lazy collection of preimages, right?). That is: We are given idx and K, and want `z` such that `count_ones(z) == K` and `index(z) == idx`.

I think the construction is best explained by giving a (slowish recursive) implementation:

``````function inv_index0(k, idx)
    k == 1 && return 1 << idx
    pos = findfirst(p -> binomial(p, k) > idx, 1:64)
    rem = idx - binomial(pos - 1, k)
    return (1 << (pos - 1)) | inv_index0(k - 1, rem)
end
inv_index(k, idx) = inv_index0(k, idx - 1)
``````

We can reality check that:

``````julia> for i = 1:1000
           z = rand(1:(1 << 35))
           k = count_ones(z)
           if z != inv_index(k, index(z))
               @show z, k, i
               @assert false
           end
       end
``````

Remaining parts of the exercise: Make a loop out of the tail-recursion, deal with integer overflows, make it fast (lookup table), optionally make a fast iterator, optionally make a fast container.
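For the first of those items, here is one sketch of the tail-recursion unrolled into a loop (the name `inv_index_loop` is mine, not from the thread):

```julia
# Iterative version of inv_index: instead of recursing, peel off the
# highest bit position at each step and accumulate it into z.
function inv_index_loop(k, idx)
    idx -= 1                  # 0-based rank, as in inv_index0
    z = 0
    while k > 1
        pos = findfirst(p -> binomial(p, k) > idx, 1:64)
        idx -= binomial(pos - 1, k)
        z |= 1 << (pos - 1)
        k -= 1
    end
    return z | (1 << idx)     # k == 1: the remaining rank is the bit position
end

# inv_index_loop(2, 3) == 6   (the third number with two set bits: 3, 5, 6)
```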

This function takes a `BitInteger` and gives the next value of the same type that has the same number of bits:

``````function next_val(k::T) where {T<:Base.BitInteger}
    𝟙 = one(T)
    z = trailing_zeros(k)
    k += 𝟙 << z
    d = trailing_zeros(k) - z - 1
    k += 𝟙 << d - 𝟙
end
``````

Basically, the logic is that you shift the last bit left by adding a bit at that position; if there’s a cascade of carrying because more than one bit is on in a row, then you’ll end up with fewer bits in the resulting value, and you need to replace those with a number of least significant bits. Here’s a function that collects them into a vector and returns it:

``````function list_vals(::Type{T}, b::Integer) where {T<:Base.BitInteger}
    vals = T[]
    𝟙 = one(T)
    k = 𝟙 << b - 𝟙
    while count_ones(k) == b
        push!(vals, k)
        z = trailing_zeros(k)
        k += 𝟙 << z
        d = trailing_zeros(k) - z - 1
        k += 𝟙 << d - 𝟙
    end
    return vals
end
list_vals(b::T) where {T<:Base.BitInteger} = list_vals(T, b)
``````

With this definition we get:

``````julia> map(bitstring, list_vals(UInt8(5)))
56-element Vector{String}:
"00011111"
"00101111"
"00110111"
"00111011"
⋮
"11110001"
"11110010"
"11110100"
"11111000"
``````

Ah, I just figured out a simpler way to do `next_val` that lets you cycle around back to the beginning once you overflow and also lets this work for arbitrary integer types:

``````function next_val(k::T) where {T<:Integer}
    𝟙 = one(T)
    b = count_ones(k)
    z = trailing_zeros(k)
    k += 𝟙 << z
    x = b - count_ones(k)
    k += 𝟙 << x - 𝟙
end
``````

I think this also makes it more obvious that what you’re doing is repopulating the bits that you lost. There’s an extra `count_ones(k)` call at the top, but that’s going to be a loop constant in the loop version; however, it means that the loop condition can’t be based on counting ones anymore, so it’s a bit of a wash. However, you can also precompute the correct array size using the `binomial` function, which is probably a win as well, giving this:

``````function list_vals(
    ::Type{T},
    b::Integer,
    n::Integer = binomial(8*sizeof(T), b),
) where {T<:Integer}
    𝟙 = one(T)
    k = 𝟙 << b - 𝟙
    v = Array{T}(undef, n)
    v[1] = k
    for i = 2:n
        z = trailing_zeros(k)
        k += 𝟙 << z
        x = b - count_ones(k)
        k += 𝟙 << x - 𝟙
        v[i] = k
    end
    return v
end
``````

This is the Iterator version of my previous answer:

``````using Combinatorics
# reconstructed: the trailing `powerset` call was lost in the original post
inv_count_ones(k, n) =
    Iterators.filter(<=(n),
        Iterators.map(s -> reduce(|, 1 .<< s),
            Iterators.filter(s -> length(s) > 0,
                powerset(0:ndigits(n, base=2)-1, k, k))))
``````

Like the last version, it correctly reproduces the results in the OP.
(Perhaps the last answer was ignored because it feels inefficient, but the `powerset` function means it steps through just the right values, and fewer than half of the candidates get filtered out by `<=(n)`.)

An algorithm for this is known as Gosper’s Hack (see HAKMEM, Hacker’s Delight, or Knuth Vol. 4A):

``````julia> function GosperHack(N)
           S = N & -N
           T = N + S
           T + (((T ⊻ N) ÷ S) >> 2)
       end
GosperHack (generic function with 1 method)

julia> function Generate(N, z::T) where T
           Arr = T[]
           i = (one(T) << N) - 1
           while i ≤ z
               push!(Arr, i)
               i = GosperHack(i)
           end
           Arr
       end
Generate (generic function with 1 method)
``````

There are two things one can do to it:

1. Make it more convenient; it is already almost an iterator.
2. Apply the bitmask speedup (Generating Combinations). The integer division is slower than a popcount on certain architectures.
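I believe the division-free variant alluded to is the classic “lexicographically next bit permutation” formulation from the bit-twiddling literature; a sketch (the name `next_comb_nodiv` is mine):

```julia
# Same step as GosperHack, but the ÷ is replaced by a shift derived
# from trailing_zeros (a popcount-class instruction on most targets).
function next_comb_nodiv(N)
    t = N | (N - 1)   # fill the trailing zeros below the lowest set bit
    (t + 1) | (((~t & -~t) - 1) >> (trailing_zeros(N) + 1))
end

# next_comb_nodiv(0b0011) == 0b0101; next_comb_nodiv(0b0110) == 0b1001
```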

I timed these, expecting the div operation in `GosperHack` to really tank the performance (integer division is very slow), but it was the same as my `next_val`:

``````julia> @benchmark next_val(0x07)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max):  0.791 ns … 8.208 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     0.875 ns             ┊ GC (median):    0.00%
Time  (mean ± σ):   0.872 ns ± 0.075 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

█
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
0.791 ns       Histogram: frequency by time      0.916 ns <

Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark GosperHack(0x07)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max):  0.791 ns … 25.250 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     0.875 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   0.877 ns ±  0.247 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▅           █          ▁                        ▁
▅▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▇ █
0.791 ns     Histogram: log(frequency) by time        1 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
``````

Exactly the same timings: Gosper’s hack had a slightly higher mean and variance, but the min and median are identical! If you try it on more distinct values, a bigger difference emerges:

``````julia> @benchmark map(next_val, 0x1:0xff)
BenchmarkTools.Trial: 10000 samples with 839 evaluations.
Range (min … max):  146.106 ns …  19.987 μs  ┊ GC (min … max): 0.00% … 99.13%
Time  (median):     159.863 ns               ┊ GC (median):    0.00%
Time  (mean ± σ):   172.080 ns ± 326.125 ns  ┊ GC (mean ± σ):  6.95% ±  3.97%

▁▄        █▁█▁
▇██▇▂▁▂▁▁▁▆████▆▅▅▆▄▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
146 ns           Histogram: frequency by time          214 ns <

Memory estimate: 304 bytes, allocs estimate: 1.

julia> @benchmark map(GosperHack, 0x1:0xff)
BenchmarkTools.Trial: 10000 samples with 460 evaluations.
Range (min … max):  228.804 ns …  27.354 μs  ┊ GC (min … max): 0.00% … 99.03%
Time  (median):     241.487 ns               ┊ GC (median):    0.00%
Time  (mean ± σ):   251.756 ns ± 374.715 ns  ┊ GC (mean ± σ):  4.13% ±  3.30%

▇█▇▁  ▁▃▆█▇▇▄▃▄▃▂ ▁▁                                          ▂
████▇█████████████████▇▇▇▇▆▇▇▆▆▆▇▆▆▇▇▆▆▆▆▅▆▆▅▆▆▄▅▅▄▄▅▅▅▄▄▅▂▄▅ █
229 ns        Histogram: log(frequency) by time        313 ns <

Memory estimate: 304 bytes, allocs estimate: 1.
``````

I’m guessing LLVM has seen 50 different ways of writing this code and replaces them all with whichever version it thinks is best. LLVM is often scarily good at integer bit math in a loop (perhaps unsurprisingly).

No, I considered that, since the timings were so suspiciously similar, but the machine code is very different. The code for `next_val` is more, simpler instructions, while the code for `GosperHack` is very small but includes that `udiv` and also has an exception branch for the case where the divisor is zero, which can happen when the input is zero.
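A minimal check of that zero-divisor case, restating `GosperHack` from earlier in the thread so the snippet is self-contained:

```julia
function GosperHack(N)
    S = N & -N   # lowest set bit; this is zero when N == 0
    T = N + S
    T + (((T ⊻ N) ÷ S) >> 2)
end

# N == 0 makes the divisor S zero, so the ÷ throws:
threw = try
    GosperHack(0)
    false
catch e
    e isa DivideError
end
# threw == true
```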

I get different timings if I fix the benchmark to use the `Ref` interpolation, to prevent the compiler from constant-folding the specific argument you are passing:

``````julia> @btime next_val(0x07);
0.875 ns (0 allocations: 0 bytes)

julia> @btime GosperHack(0x07);
0.875 ns (0 allocations: 0 bytes)

julia> @btime next_val(\$(Ref(0x07))[]);
1.792 ns (0 allocations: 0 bytes)

julia> @btime GosperHack(\$(Ref(0x07))[]);
1.458 ns (0 allocations: 0 bytes)
``````

Oh right. Is it time to make a breaking release of `BenchmarkTools` that does this automatically yet? I basically always want it to work like `@code_x` and time how long the function call takes, not how long it takes to evaluate a specific expression with constant propagation and whatnot. It used to not matter so much, but now we’re really quite good at constant prop, so it’s usually pretty misleading.


Wouldn’t help in your example, of course, because you weren’t using `\$` at all.

Doing that could also give us a way to solve the longstanding problem that interpolating types into benchmark expressions de-optimizes them.

``````julia> f(::Type{T}, x) where {T} = T(x) + T(x);

julia> @btime f(Int, 1)
1.082 ns (0 allocations: 0 bytes)
2

julia> @btime f(\$Int, 1)
54.993 ns (0 allocations: 0 bytes)
2
``````

The reason for this is that there’s no way to create an instance of `Tuple{Type{Int}}`; you can only construct instances of `Tuple{DataType}`. Replacing `\$x` with `\$(Ref{Core.Typeof(x)}(x))[]` is able to properly communicate the required info, though:

``````julia> @btime f(\$(Ref(Int))[], 1)
70.626 ns (0 allocations: 0 bytes)
2

julia> @btime f(\$(Ref{Type{Int}}(Int))[], 1)
1.943 ns (0 allocations: 0 bytes)
2
``````

I mean to go a step further and change `@benchmark f(x)` to mean what `@benchmark f(\$(Ref(x))[])` means. What it does now is like if `@code_native f(x)` showed you the code for `() -> f(x)` instead of the code of `f` for the type of `x`.


The OEIS is perfect for finding out more about a pattern like this. Searching for the first numbers gives the right match as the first result:
https://oeis.org/search?q=1%2C1%2C2%2C1%2C2%2C2%2C3%2C1%2C2%2C2%2C3%2C2%2C3%2C3%2C4

I experimented with a strength reduction opportunity:

``````z = trailing_zeros(k)
k += 𝟙 << z
``````

can also be implemented as

``````k += k & -k
``````

LLVM generates fewer instructions for it, which looks promising. Unfortunately, `@benchmark` doesn’t seem to care for it. I also played with a slightly different approach; also slower according to `@benchmark`.
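A quick sanity check that the two forms compute the same update:

```julia
# k & -k isolates the lowest set bit (two's complement identity), so
# adding it is equivalent to adding 1 << trailing_zeros(k).
for k in 1:1_000
    @assert k + (k & -k) == k + (1 << trailing_zeros(k))
end
```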
