Minimizing allocations for custom powerset iterator

Just to preface: I realize both Combinatorics.jl and IterTools.jl implement an iterator for powersets. This question is mostly for improving my own Julia skills.

I’m trying to iterate through a powerset as efficiently as possible. I’ve come up with the following scheme:

import Base: show, iterate, eltype, length

struct powersetIter{T<:AbstractVector}
    set::T
    powersetSize::Int   # 2^length(set)
    mask::BitVector     # scratch mask: mask[i] == true means set[i] is in the current subset
end

powersetIter(set) = powersetIter(set, 2^length(set), falses(length(set)))

iterate(S::powersetIter) = (S.set[S.mask],0)

function iterate(S::powersetIter, state=0)

    if state == S.powersetSize
        return nothing
    end

    for i in eachindex(S.mask)
        if (state & 1<<(i-1))>0
            S.mask[i] = true
        else
            S.mask[i] = false
        end
    end

    return (S.set[S.mask], state+1)
end

length(S::powersetIter) = S.powersetSize
eltype(::Type{powersetIter{T}}) where T = Vector{eltype(T)}

Basically, it uses the binary representation of the state variable to decide which elements of the set belong in the next subset.
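
For example, here is a small worked illustration of the bit-to-subset mapping, using a made-up 3-element set (this is not part of the iterator itself):

set = [10, 20, 30]
state = 5                  # binary 101: bits 1 and 3 are set

# bit (i-1) of `state` decides whether set[i] is included
mask = [(state & 1 << (i - 1)) > 0 for i in eachindex(set)]   # [true, false, true]
set[mask]                                                     # [10, 30]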

However, it uses a lot more memory than I expected because it allocates a new vector every time S.set[S.mask] is called.

using BenchmarkTools

#just a simple function to loop through powerset
function f(n)
    counter = 0
    for v in powersetIter(1:n)
        counter += 1
    end
    return counter
end

julia> @btime f(20)
  136.901 ms (2097154 allocations: 172.00 MiB)
1048576 #2^20

Is there a way to tell Julia to reuse a pre-allocated buffer (of total size length(S.set)) for all of the iterated subsets?

I compared your iterator with the others you mentioned:

using BenchmarkTools

#just a simple function to loop through powerset
function f(n)
    counter = 0
    for _ in powersetIter(1:n)
        counter += 1
    end
    return counter
end

@btime f(20)

using IterTools

function f(n)
    counter = 0
    for _ in subsets(1:n)
        counter += 1
    end
    return counter
end

@btime f(20)

using Combinatorics

function f(n)
    counter = 0
    for _ in powerset(1:n)
        counter += 1
    end
    return counter
end

@btime f(20)

yielding

  134.698 ms (2097154 allocations: 172.00 MiB) # OP
  182.991 ms (4194305 allocations: 332.00 MiB) # IterTools.subsets
  87.930 ms (2097176 allocations: 172.00 MiB) # Combinatorics.powerset

So it looks to me like you are near optimal with regard to the standard iterator protocol. OTOH, maybe you could use a custom ‘in-place’ iteration protocol to reduce allocations?

Edit: I tried it and indeed one can reduce allocations that way, but performance is slightly worse.

I’ve also played around with an “in-place” version, but the reallocation/resizing of the vector always costs one allocation per iteration. I think the remaining speed could be gained by not creating a slice or a view of a vector at all, but instead returning another custom iterator that just yields the “set” of any given iteration.
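
For what it’s worth, here is a rough sketch of that last idea (a hypothetical LazySubset type, purely illustrative and not benchmarked): instead of materializing S.set[S.mask], iterate could yield a small wrapper that lazily walks the bits of the state and produces the selected elements on demand.

# Hypothetical sketch: a lazy, non-allocating "subset" that walks the bits of
# `bits` and yields the corresponding elements of `set` on demand.
struct LazySubset{T<:AbstractVector}
    set::T
    bits::Int            # binary encoding of the subset, as in the state variable
end

function Base.iterate(s::LazySubset, i=1)
    # advance to the next index whose bit is set in `bits`
    while i <= length(s.set)
        if (s.bits & (1 << (i - 1))) != 0
            return (s.set[i], i + 1)
        end
        i += 1
    end
    return nothing
end

Base.length(s::LazySubset) = count_ones(s.bits)
Base.eltype(::Type{LazySubset{T}}) where {T} = eltype(T)

# e.g. collect(LazySubset(1:4, 0b1010)) == [2, 4]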

For reference, here is what I have

import Base: show, iterate, eltype, length

struct powersetIter{T}
    set::T
    powersetSize::Int
    mask::BitVector
end

powersetIter(set) = powersetIter(set, 2^length(set), falses(length(set)))

iterate(S::powersetIter) = (S.set[S.mask],0)

function iterate(S::powersetIter, state=0)

    if state == S.powersetSize
        return nothing
    end

    for i in eachindex(S.mask)
        if (state & 1<<(i-1))>0
            S.mask[i] = true
        else
            S.mask[i] = false
        end
    end

    return (S.set[S.mask], state+1)
end

# custom "in-place" iteration: refill `item` with the subset encoded by `state`
# and return only the next state, so the caller keeps reusing the same vector
function inplaceiterate(S::powersetIter, item::Vector{T}, state=0) where {T}

    if state == S.powersetSize
        return nothing
    end

    empty!(item)
    for i in 1:length(S.set)
        (state & 1 << (i-1)) > 0 && push!(item, S.set[i])
    end

    return state+1
end

length(S::powersetIter) = S.powersetSize
eltype(::Type{powersetIter{T}}) where T = Vector{eltype(T)}

using BenchmarkTools

#just a simple function to loop through powerset
function f(n)
    counter = 0
    iter = powersetIter(1:n)
    next = iterate(iter)
    while next !== nothing
        (item, state) = next
        counter += 1
        next = iterate(iter, state)
    end    
    return counter
end

@btime f(20)

function f(n)
    counter = 0
    iter = powersetIter(1:n)
    item = Vector{Int}()
    state = inplaceiterate(iter, item)
    while state !== nothing
        counter += 1
        state = inplaceiterate(iter, item, state)
    end    
    return counter
end

@btime f(20)

yielding

  127.257 ms (2097154 allocations: 172.00 MiB) # iterate
  138.315 ms (5 allocations: 576 bytes) #inplaceiterate

Interesting idea.

Edit: Interestingly, if we increase the memory pressure, things start to change. For f(26) I see

  11.236 s (134217730 allocations: 12.25 GiB) # iterate
  9.976 s (5 allocations: 576 bytes) #inplaceiterate

This is some really great input. Based on @goerch’s benchmarks, it’s possible that an in-place iterator for a powerset is only beneficial for larger sets. I’m guessing that’s because you eventually hit garbage collection.

It looks like Combinatorics.jl uses a different approach to iterate through the powerset: basically, it uses the fact that chaining the combinations of every size gives the powerset:

# from Combinatorics.powerset, where min and max bound the subset sizes
itrs = [combinations(a, k) for k = min:max]
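
In other words, the powerset is just the concatenation of the k-element combinations for every k. A minimal sketch of the same idea (using Combinatorics.combinations and Iterators.flatten; the empty subset is added explicitly here so nothing depends on how combinations(a, 0) behaves):

using Combinatorics

# Lazily chain the k-element combination iterators for k = 1, ..., length(a),
# with the single empty subset prepended explicitly.
function my_powerset(a)
    itrs = Any[[eltype(a)[]]]                              # the one 0-element subset
    append!(itrs, (combinations(a, k) for k in 1:length(a)))
    return Iterators.flatten(itrs)
end

collect(my_powerset(1:3))
# [], [1], [2], [3], [1, 2], [1, 3], [2, 3], [1, 2, 3]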

It’s also worth noting that IterTools.jl has a method for statically sized subsets:

using IterTools, Combinatorics
using BenchmarkTools

function f_iter(set)
    counter=0
    for _ in IterTools.subsets(set, Val{2}())
        counter +=1
    end

    return counter
end

function f_combin(set)
    counter=0
    for _ in Combinatorics.combinations(set, 2)
        counter +=1
    end

    return counter
end

@btime f_iter(1:20) #219.285 ns (0 allocations: 0 bytes)
@btime f_combin(1:20) #8.667 μs (381 allocations: 20.86 KiB)

Ideally, I feel like you could combine these two ideas, but in practice it doesn’t seem to help that much:

function f(n)
    counter = 0
    for iters in (subsets(1:n, Val{i}()) for i in 0:n)
        for sets in iters
            counter += 1
        end
    end
    return counter
end

@btime f(26) # 7.265 s (134217780 allocations: 21.26 GiB)

Mostly because the type of subsets(1:n, Val{i}()) changes with each i, so the inner loop can never be compiled for a single concrete iterator type.
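
One standard workaround for that kind of type instability is a function barrier: move the inner loop into its own function so it gets compiled once per concrete iterator type. A sketch (same subsets call as above, not benchmarked here):

using IterTools

# Inner loop behind a function barrier: `itr` has a concrete type inside this
# method, so the loop is type-stable for each subset size.
function count_subsets(itr)
    c = 0
    for _ in itr
        c += 1
    end
    return c
end

function f_barrier(n)
    counter = 0
    for i in 0:n
        # dynamic dispatch now happens once per size i, not once per subset
        counter += count_subsets(subsets(1:n, Val{i}()))
    end
    return counter
end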

One remark (and/or question): I’m not sure we actually hit GC at this problem size yet. Wouldn’t @btime show something like GC time in that case, or does it handle GC differently from @time?

Edit: it doesn’t look like @btime reports GC time (I just tested with a GC-forcing example). Ouch.

Hm, I’m not sure. It looks like @btime does not display GC time by default:

@btime f(26)
  10.134 s (134217730 allocations: 12.25 GiB)

@time f(26)
 10.296567 seconds (134.22 M allocations: 12.250 GiB, 7.16% gc time)

GC was actually one of my original concerns for the problem this question came from, since I was running things in parallel.

Hm, sounds like it is time for me to switch to @benchmark then…
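
For reference, a minimal example of what that looks like (the exact output layout depends on the BenchmarkTools version, but the summary includes GC statistics):

using BenchmarkTools

# @benchmark collects many samples and its printed summary includes the share
# of time spent in GC, unlike the one-line output of @btime
b = @benchmark f(20)
display(b)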

Thanks!
