# Computing the preimage of count_ones

`count_ones(x)` counts how many 1s are in the binary representation of the integer `x`.

I am looking for a function that computes the preimage of that.
Or, informally, the inverse (though it’s not actually invertible).
That is to say, for any `n` I want a function that returns an iterator of values `y` such that `count_ones(y) == n` for all those values (and for no other integers), up to some maximum `z`.
The naive implementation is:

``````inv_count_ones(n, z) = filter(y->count_ones(y)==n, 1:z)
``````

And it does indeed have

``````inv_count_ones(0, 8) == []
inv_count_ones(1, 8) == [1, 2, 4, 8]
inv_count_ones(2, 8) == [3, 5, 6]
inv_count_ones(3, 8) == [7]
inv_count_ones(4, 8) == []
``````

But is there a better way?


Here’s one way:

``````using Combinatorics
# reconstructed: the `powerset` call was garbled in the original post;
# powerset(v, k, k) enumerates only the size-k subsets of v
inv_count_ones(k, n) =
    filter(<=(n), [reduce(|, 1 .<< s)
                   for s in powerset(0:ndigits(n, base=2)-1, k, k)
                   if length(s) > 0])
``````

The problem is pretty standard in combinatorics and bit-manipulation. The key feature is to leverage the extra parameters of the `powerset` function.

``````julia> [inv_count_ones(i, 8) for i in 0:4] ==
       [[], [1,2,4,8], [3,5,6], [7], []]
true
``````

BTW this can be transformed into an Iterator for these sets, which can be more efficient in memory (and perhaps time).

There’s definitely a pattern in the positive `Int`s, if you can split the implementation by types and ranges.

``````julia> for i in 1:32
           ci = count_ones(i)
           if ci == 1 println("\n", ci) else print(" ", ci) end
       end

1

1
2
1
2 2 3
1
2 2 3 2 3 3 4
1
2 2 3 2 3 3 4 2 3 3 4 3 4 4 5
1
``````

You can tell `n=1` occurs at powers of 2, and for a given `n` you can start the search at the minimum integer with `n` 1s, i.e. at `(2^n-1):z`.
There is a pattern between the powers of 2, but I’m not sure how to exploit that for `n>1`.
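As a sketch of that trimmed search (the name `inv_count_ones_from` is mine, not from the thread):

```julia
# Naive filter, but starting at 2^n - 1, the smallest integer with n
# one-bits; max(1, ...) keeps the n == 0 behaviour of the original,
# which searched 1:z and so returned [] for n == 0.
inv_count_ones_from(n, z) = filter(y -> count_ones(y) == n, max(1, 2^n - 1):z)

# inv_count_ones_from(2, 8) == [3, 5, 6]
```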

This is a fun exercise in combinatorics. I vaguely remember that this is pretty well-known and standard, but I don’t recall the name, so I’ll instead re-derive the construction here.

edit: @Syx_Pek provided the name down-thread; it’s called the combinatorial number system.

Originally I described a re-derivation with code, but read wikipedia instead, it’s clearer than what I wrote.

First of all, suppose you had your array `inv_co(k) = filter(y->count_ones(y)==k, 1:(-1 >>> 1))`, and you’re given a specific `N` such that `count_ones(N) == k`. What is the index of `N` in this list `inv_co(k)`?

Well, it is one plus the number of smaller entries. This is easy to compute:

``````struct BitsIter
    u::Int64
end
Base.iterate(b::BitsIter) = iterate(b, b.u)
Base.iterate(b::BitsIter, u) = (u == 0 ? nothing : (trailing_zeros(u), Base._blsr(u)))

function index(N)
    res = 1
    for (nbits, bitpos) in enumerate(BitsIter(N))
        # there are nbits many bits we need to distribute over bitpos many places
        res += binomial(bitpos, nbits)
    end
    res
end
``````

We can reality-check that:

``````julia> inv_count_ones(n, z) = filter(y->count_ones(y)==n, 1:z)
inv_count_ones (generic function with 1 method)

julia> length(inv_count_ones(5, 1000))
252

julia> map(index, inv_count_ones(5, 1000)) == 1:length(inv_count_ones(5, 1000))
true
``````

OK, now we need to invert the `index` function (you want random access to your iterator / lazy collection of preimages, right?). That is: We are given idx and K, and want `z` such that `count_ones(z) == K` and `index(z) == idx`.

I think the construction is best explained by giving a (slowish recursive) implementation:

``````function inv_index0(k, idx)
    k == 1 && return 1 << idx
    pos = findfirst(p -> binomial(p, k) > idx, 1:64)
    rem = idx - binomial(pos - 1, k)
    return (1 << (pos - 1)) | inv_index0(k - 1, rem)
end
inv_index(k, idx) = inv_index0(k, idx - 1)
``````

We can reality check that:

``````julia> for i = 1:1000
           z = rand(1:(1 << 35))
           k = count_ones(z)
           if z != inv_index(k, index(z))
               @show z, k, i
               @assert false
           end
       end
``````

Remaining parts of the exercise: Make a loop out of the tail-recursion, deal with integer overflows, make it fast (lookup table), optionally make a fast iterator, optionally make a fast container.
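For the first of those items, here is one sketch of the tail-recursion unrolled into a loop (the name `inv_index_loop` is mine, not from the thread):

```julia
# Iterative version of inv_index: instead of recursing, peel off the
# highest bit position at each step and accumulate it into z.
function inv_index_loop(k, idx)
    idx -= 1                  # 0-based rank, as in inv_index0
    z = 0
    while k > 1
        pos = findfirst(p -> binomial(p, k) > idx, 1:64)
        idx -= binomial(pos - 1, k)
        z |= 1 << (pos - 1)
        k -= 1
    end
    return z | (1 << idx)     # k == 1: the remaining rank is the bit position
end

# inv_index_loop(2, 3) == 6   (the third number with two set bits: 3, 5, 6)
```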

This function takes a `BitInteger` and gives the next value of the same type that has the same number of bits:

``````function next_val(k::T) where {T<:Base.BitInteger}
    𝟙 = one(T)
    z = trailing_zeros(k)
    k += 𝟙 << z
    d = trailing_zeros(k) - z - 1
    k += 𝟙 << d - 𝟙
end
``````

Basically, the logic is that you shift the last bit left by adding a bit at that position; if there’s a cascade of carrying because more than one bit is on in a row, then you’ll end up with fewer bits in the resulting value, and you need to replace those with a number of least significant bits. Here’s a function that collects them into a vector and returns it:

``````function list_vals(::Type{T}, b::Integer) where {T<:Base.BitInteger}
    vals = T[]
    𝟙 = one(T)
    k = 𝟙 << b - 𝟙
    while count_ones(k) == b
        push!(vals, k)
        z = trailing_zeros(k)
        k += 𝟙 << z
        d = trailing_zeros(k) - z - 1
        k += 𝟙 << d - 𝟙
    end
    return vals
end
list_vals(b::T) where {T<:Base.BitInteger} = list_vals(T, b)
``````

With this definition we get:

``````julia> map(bitstring, list_vals(UInt8(5)))
56-element Vector{String}:
"00011111"
"00101111"
"00110111"
"00111011"
⋮
"11110001"
"11110010"
"11110100"
"11111000"
``````

Ah, I just figured out a simpler way to do `next_val` that lets you cycle around back to the beginning once you overflow and also lets this work for arbitrary integer types:

``````function next_val(k::T) where {T<:Integer}
    𝟙 = one(T)
    b = count_ones(k)
    z = trailing_zeros(k)
    k += 𝟙 << z
    x = b - count_ones(k)
    k += 𝟙 << x - 𝟙
end
``````

I think this also makes it more obvious that what you’re doing is repopulating the bits that you lost. There’s an extra `count_ones(k)` call at the top, but that’s going to be a loop constant in the loop version; however, it means that the loop condition can’t be based on counting ones anymore, so it’s a bit of a wash. However, you can also precompute the correct array size using the `binomial` function, which is probably a win as well, giving this:

``````function list_vals(
    ::Type{T},
    b::Integer,
    n::Integer = binomial(8*sizeof(T), b),
) where {T<:Integer}
    𝟙 = one(T)
    k = 𝟙 << b - 𝟙
    v = Array{T}(undef, n)
    v[1] = k
    for i = 2:n
        z = trailing_zeros(k)
        k += 𝟙 << z
        x = b - count_ones(k)
        k += 𝟙 << x - 𝟙
        v[i] = k
    end
    return v
end
``````

This is the Iterator version of my previous answer:

``````using Combinatorics
# reconstructed: the trailing `powerset` call was lost in the original post
inv_count_ones(k, n) =
    Iterators.filter(<=(n),
        Iterators.map(s -> reduce(|, 1 .<< s),
            Iterators.filter(s -> length(s) > 0,
                powerset(0:ndigits(n, base=2)-1, k, k))))
``````

Like the last version, it correctly reproduces the results in the OP.
(Perhaps the last answer was ignored because it feels inefficient, but the `powerset` function means it steps through just the right values, and fewer than half of the candidates get filtered out by `<=(n)`.)

An algorithm for this is known as Gosper’s Hack (see HAKMEM, Hacker’s Delight, or Knuth Vol. 4A):

``````julia> function GosperHack(N)
           S = N & -N
           T = N + S
           T + (((T ⊻ N) ÷ S) >> 2)
       end
GosperHack (generic function with 1 method)

julia> function Generate(N, z::T) where T
           Arr = T[]
           i = (one(T) << N) - 1
           while i ≤ z
               push!(Arr, i)
               i = GosperHack(i)
           end
           Arr
       end
Generate (generic function with 1 method)
``````

There are two things one can do to it:

1. Make it more convenient; it is already almost an iterator.
2. Apply the bitmask speedup (Generating Combinations). The integer division is slower than a popcount on certain architectures.
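I believe the division-free variant alluded to is the classic “lexicographically next bit permutation” formulation from the bit-twiddling literature; a sketch (the name `next_comb_nodiv` is mine):

```julia
# Same step as GosperHack, but the ÷ is replaced by a shift derived
# from trailing_zeros (a popcount-class instruction on most targets).
function next_comb_nodiv(N)
    t = N | (N - 1)   # fill the trailing zeros below the lowest set bit
    (t + 1) | (((~t & -~t) - 1) >> (trailing_zeros(N) + 1))
end

# next_comb_nodiv(0b0011) == 0b0101; next_comb_nodiv(0b0110) == 0b1001
```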

I timed these, expecting the div operation in `GosperHack` to really tank the performance (integer division is very slow), but it was the same as my `next_val`:

``````julia> @benchmark next_val(0x07)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max):  0.791 ns … 8.208 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     0.875 ns             ┊ GC (median):    0.00%
Time  (mean ± σ):   0.872 ns ± 0.075 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

█
▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▄▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
0.791 ns       Histogram: frequency by time      0.916 ns <

Memory estimate: 0 bytes, allocs estimate: 0.

julia> @benchmark GosperHack(0x07)
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
Range (min … max):  0.791 ns … 25.250 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     0.875 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   0.877 ns ±  0.247 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▅           █          ▁                        ▁
▅▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▇ █
0.791 ns     Histogram: log(frequency) by time        1 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
``````

Exactly the same timings: Gosper’s hack had a slightly higher mean and variance, but the min and median are identical! If you try it on more distinct values, a bigger difference emerges:

``````julia> @benchmark map(next_val, 0x1:0xff)
BenchmarkTools.Trial: 10000 samples with 839 evaluations.
Range (min … max):  146.106 ns …  19.987 μs  ┊ GC (min … max): 0.00% … 99.13%
Time  (median):     159.863 ns               ┊ GC (median):    0.00%
Time  (mean ± σ):   172.080 ns ± 326.125 ns  ┊ GC (mean ± σ):  6.95% ±  3.97%

▁▄        █▁█▁
▇██▇▂▁▂▁▁▁▆████▆▅▅▆▄▂▂▂▁▁▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
146 ns           Histogram: frequency by time          214 ns <

Memory estimate: 304 bytes, allocs estimate: 1.

julia> @benchmark map(GosperHack, 0x1:0xff)
BenchmarkTools.Trial: 10000 samples with 460 evaluations.
Range (min … max):  228.804 ns …  27.354 μs  ┊ GC (min … max): 0.00% … 99.03%
Time  (median):     241.487 ns               ┊ GC (median):    0.00%
Time  (mean ± σ):   251.756 ns ± 374.715 ns  ┊ GC (mean ± σ):  4.13% ±  3.30%

▇█▇▁  ▁▃▆█▇▇▄▃▄▃▂ ▁▁                                          ▂
████▇█████████████████▇▇▇▇▆▇▇▆▆▆▇▆▆▇▇▆▆▆▆▅▆▆▅▆▆▄▅▅▄▄▅▅▅▄▄▅▂▄▅ █
229 ns        Histogram: log(frequency) by time        313 ns <

Memory estimate: 304 bytes, allocs estimate: 1.
``````

I’m guessing LLVM has seen 50 different ways of writing this code and replaces them all with whichever version it thinks is best. LLVM is often scarily good at integer bit math in a loop (perhaps unsurprisingly).

No, I considered that, since the timings were so suspiciously similar, but the machine code is very different. The code for `next_val` is more, simpler instructions, while the code for `GosperHack` is very small but includes that `udiv` and also has an exception branch for the case where the divisor is zero, which can happen when the input is zero.
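A minimal check of that zero-divisor case, restating `GosperHack` from earlier in the thread so the snippet is self-contained:

```julia
function GosperHack(N)
    S = N & -N   # lowest set bit; this is zero when N == 0
    T = N + S
    T + (((T ⊻ N) ÷ S) >> 2)
end

# N == 0 makes the divisor S zero, so the ÷ throws:
threw = try
    GosperHack(0)
    false
catch e
    e isa DivideError
end
# threw == true
```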

I get different timings if I fix the benchmark to use the `Ref` interpolation, to prevent the compiler from constant-folding the specific argument you are passing:

``````julia> @btime next_val(0x07);
0.875 ns (0 allocations: 0 bytes)

julia> @btime GosperHack(0x07);
0.875 ns (0 allocations: 0 bytes)

julia> @btime next_val(\$(Ref(0x07))[]);
1.792 ns (0 allocations: 0 bytes)

julia> @btime GosperHack(\$(Ref(0x07))[]);
1.458 ns (0 allocations: 0 bytes)
``````

Oh right. Is it time to make a breaking release of `BenchmarkTools` that does this automatically yet? I basically always want it to work like `@code_x` and time how long the function call takes, not how long it takes to evaluate a specific expression with constant propagation and whatnot. It used to not matter so much, but now we’re really quite good at constant prop, so it’s usually pretty misleading.


Wouldn’t help in your example, of course, because you weren’t using `\$` at all.

Doing that could also give us a way to solve the longstanding problem that interpolating types into benchmark expressions de-optimizes them.

``````julia> f(::Type{T}, x) where {T} = T(x) + T(x);

julia> @btime f(Int, 1)
1.082 ns (0 allocations: 0 bytes)
2

julia> @btime f(\$Int, 1)
54.993 ns (0 allocations: 0 bytes)
2
``````

The reason for this is that there’s no way to create an instance of `Tuple{Type{Int}}`; you can only construct instances of `Tuple{DataType}`. Replacing `\$x` with `\$(Ref{Core.Typeof(x)}(x))[]` is able to properly communicate the required info, though:

``````julia> @btime f(\$(Ref(Int))[], 1)
70.626 ns (0 allocations: 0 bytes)
2

julia> @btime f(\$(Ref{Type{Int}}(Int))[], 1)
1.943 ns (0 allocations: 0 bytes)
2
``````

I mean to go a step further and change `@benchmark f(x)` to mean what `@benchmark f(\$(Ref(x))[])` means. What it does now is like if `@code_native f(x)` showed you the code for `() -> f(x)` instead of the code of `f` for the type of `x`.


The OEIS is perfect for finding out more about a pattern like this. Searching for the first numbers gives the right match as the first result:
https://oeis.org/search?q=1%2C1%2C2%2C1%2C2%2C2%2C3%2C1%2C2%2C2%2C3%2C2%2C3%2C3%2C4

I experimented with a strength reduction opportunity:

``````z = trailing_zeros(k)
k += 𝟙 << z
``````

can also be implemented as

``````k += k & -k
``````

LLVM generates fewer instructions for it, which looks promising. Unfortunately, `@benchmark` doesn’t seem to care for it. I also played with a slightly different approach; also slower according to `@benchmark`.
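A quick sanity check that the two forms compute the same update:

```julia
# k & -k isolates the lowest set bit (two's complement identity), so
# adding it is equivalent to adding 1 << trailing_zeros(k).
for k in 1:1_000
    @assert k + (k & -k) == k + (1 << trailing_zeros(k))
end
```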
