True power set of a set is slower than power array of array

I’m trying to create a power set function with only set operations, but it turns out to be much slower than the array version. I wonder why this is the case, and how to make the Set version as fast as the array version.

Thank you ! :slight_smile:

function powerset(x::AbstractArray)   
    result = [[]]
    for elem in x, j in eachindex(result)
        push!(result, [result[j] ; elem])
    end
    result
end

@time powerset([1:17…]);

function powerset(x::AbstractSet)   
    result = Set([Set()])
    for  elem in x, j in result
        union!(result, Set([union(j,elem)]))
    end
    result
end

@time powerset(Set(1:17));

You can profile it to see what’s slow. Before looking at the code, I would guess that set union is slower than array concatenation because set union also requires checking for duplicates.

2 Likes

The expensive part is not actually checking for duplicates, but because Set is implemented as a hashmap, each insertion into a set needs to calculate the hash of the item first. Hashing a set is currently O(n), since the overall hash depends on each element of the set. OTOH, appending to an array is amortized to only be O(log(n)). I wonder whether we can use a similar strategy for hashing sets as we do for arrays, where for large arrays we only actually sample O(log(n)) items. That might be worth opening an issue actually.

2 Likes

Wow, thank you for letting me know profile. Looking at the output, there certainly seems to be alot more things going on in the Set version.

A total of 11 snapshots for powerset([1:15...]) while 471 for powerset(Set(1:15))