Combinatorics.permutations creating a lot of allocations

This is my first post here, so I hope I can explain the problem I’ve come across as I’m writing code for my master thesis in Maths. Compared to others here I’m definitely no expert, but I trusted code from Packages to give good performance, so I’m not sure what I do wrong here or if there is in fact badly performing code in Combinatorics.permutations
I want to iterate over all permutations that keep a small vector v (length < 6) constant so if v = [2,1,1,1] it would be any permutation of only the last three entries. This function is called many times (for the same vector v) within another loop, so allocations are the most important thing for me.
This is a straight forward testperm1() equivalent to my first implementation and an dumber testperm2() that somehow performs better

using Combinatorics
using LinearAlgebra
using BenchmarkTools

v = [2,1,1,1];
vPerm = similar(v);
n = length(v);
permArray = collect(permutations(1:n));
permMat = zeros(Int, n, n);

function testPerm1(v, n, vPerm, permMat)
    for perm in permutations(1:n)
        updatePermMatFromPermVec!(permMat, perm)
        mul!(vPerm, permMat, v)
        if vPerm == v
        end
    end
end

function testPerm2(v, n, vPerm, permArray, permMat)
    for i in 1:factorial(n)
        updatePermMatFromPermVec!(permMat, permArray[i])
        mul!(vPerm, permMat, v)
        if vPerm == v
        end
    end
end

function updatePermMatFromPermVec!(permMat, permVec)
    for j in eachindex(permVec)
        for i in eachindex(permVec)
            if permVec[i] == j
                permMat[i,j] = 1;
            end
        end
    end
end

Again I’m not really asking for other optimizations, updatePermMatFromPermVec! might be far from optimal, my concern is that the allocations are

@btime testPerm1(v, n, vPerm, permMat)
  3.450 μs (98 allocations: 4.59 KiB)

while

@btime testPerm2(v, n, vPerm, permArray, permMat)
  1.350 μs (0 allocations: 0 bytes)

and I only have to allocate permArray once on top before hand, which gives me 1.978 μs (102 allocations: 4.91 KiB), so only looping over testperm1 a second time makes it a lot worse than collecting the whole array.
It seems to allocate every perm that is iterated over, I thought using the iterator should be a lot better for performance.
I read about @inline in another topic (I got an message saying I can’t link to it in my post), but I don’t really understand if that is also the problem here or what I can do to improve the code.
Is some dumb mistake of mine the problem or does Combinatorics really have some bad code?

sounds like they need a new release

Hi, and welcome to the Julia community!

This is indeed why you get allocations (see * below). A faster implementation of the iterator, like the one jling refers to, will have the same problem.

permutations_b(a, t::Integer=length(a)) =
    Iterators.map(
        indices -> [a[i] for i in indices],
        multiset_permutations(eachindex(a), t))

function testPerm1_b(v, n, vPerm, permMat)
    for perm in permutations_b(1:n)
        updatePermMatFromPermVec!(permMat, perm)
        mul!(vPerm, permMat, v)
        if vPerm == v
       end
    end
end

@btime testPerm1($v, $n, $vPerm, $permMat)              # 9.400 μs (74 allocations: 3.09 KiB)
@btime testPerm1_b($v, $n, $vPerm, $permMat)            # 3.513 μs (174 allocations: 8.06 KiB)
@btime testPerm2($v, $n, $vPerm, $permArray, $permMat)  # 932.143 ns (0 allocations: 0 bytes)

In your testPerm2 approach, you just move the allocations (of each permutation individually, and of the Vector{Vector{Int}} permArray storing these) outside of the benchmark.


There exists an inplace function to get the n-th permutation: nthperm!, but it consumes the input array. Nevertheless, if you keep resetting this input, you can use this function to avoid most allocations:

function testPerm3(v, n, vPerm, permMat)
    perm = similar(v)
    for i in 1:factorial(n)
        perm .= 1:n
        nthperm!(perm, i)  # Now perm contains the i-th permutation of 1:n
        updatePermMatFromPermVec!(permMat, perm)
        mul!(vPerm, permMat, v)
        if vPerm == v
        end
    end
end

@btime testPerm3($v, $n, $vPerm, $permMat)  #  1.510 μs (2 allocations: 96 bytes)

*: Concretely, in the file permutations.jl we find

function Base.iterate(p::Permutations, state::Vector{Int}=fill(firstindex(p.data), p.length))
    next_permutation!(state, firstindex(p.data), lastindex(p.data))
    if first(state) > lastindex(p.data)
        return nothing
    end
    [p.data[i] for i in state], state
end

Here the [p.data[i] for i in state] causes the allocations. But in our case with p.data == 1:n, there is no difference between the permutation and the state, so you could use

function testPerm4(v, n, vPerm, permMat)
    perm = collect(1:n)  # the first permutation of 1:n
    while first(perm) <= n
        updatePermMatFromPermVec!(permMat, perm)
        mul!(vPerm, permMat, v)
        if vPerm == v
        end
        Combinatorics.next_permutation!(perm, 1, n)
    end
end

@btime testPerm4($v, $n, $vPerm, $permMat)  # 7.175 μs (2 allocations: 96 bytes)

which does indeed get rid of most allocations, though is also quite a lot slower than testPerm3, for some reason. Additionally, testPerm4 relies on an unexported method, a practice which is safer to avoid.

1 Like

Wow, thank you a lot for your reply.
I understood that between my testperm1 and testperm2 I just shifted the allocations, but the function is called a lot of times (probably millions of times), and if its called multiple times its a lot better to collect the array once and then pass it as an argument (I still believe this is true).
The 2 allocations in your testPerm3 come from perm = similar(v) I suppose, I will preallocate this outside, but your function is perfect thank you!
I didn’t really understand what an “unexported method” is and what’s your thought behind the fourth function but I will look this up, you really don’t need to explain it, you have already helped a lot.

1 Like

My packages SmallCollections.jl and SmallCombinatorics.jl (recently split off from SmallCollections.jl) may be helpful. With the code

using SmallCollections, SmallCombinatorics, Chairmarks

function testPerm3(v)
    s = 0
    for perm in SmallCombinatorics.permutations(length(v))
        if @inbounds v[perm] == v
            s += 1
        end
    end
    s
end

I get

julia> @b testPerm2($v, $n, $vPerm, $permArray, $permMat)
1.030 μs

julia> w = SmallVector{8,Int8}([2, 1, 1, 1]);

julia> @b testPerm3($w)
297.680 ns (2 allocs: 64 bytes)

(The two allocations shouldn’t be there. I have to investigate this.)

The branch SmallCollections#shuffle (soon to be merged into master) has fast vector indexing for SmallVector for Intel/AMD processors. With this branch I get

julia> @b testPerm3($w)
79.720 ns
2 Likes

Thanks for your answer!
I cleared the bottleneck with testperm3 of @eldee’s implementation, but it is nonetheless very helpful to know that there is a better performing combinatorics package.
The last remaining allocation problem in my code has to do with Combinatorics.multiset_permutations, as far as I could see you haven’t implemented an alternative for that, or did I miss that?
I will create another topic for the remaining part (as I think that’s how topics should be separated in this forum, they are both part of the same program but are independent of each other).

That’s correct. It’s still a young package …

1 Like