Efficiently iterate over partially permuted array

I need to iterate over all multi-set permutations of parts of an array, and get the value of a function which depends on the order of elements in the array. What’s the most efficient way to do this?

Only a subset of the array is permuted, however the results of the function depend on the value of every element in the array.

I’ve come up with:

using Combinatorics, Statistics

my_array = rand('a':'d', 40)

# Example function, my function is about as trivial in terms of runtime
my_func(x, y) = mean(cumsum(x .== y))

function get_all_results(arr)
    # Only permute part of the array, e.g. here all a's and c's in the array
    ps = multiset_permutations(arr[in(['a', 'c']).(arr)], sum(in(['a', 'c']).(arr)))

    result = zeros(length(ps))

    for (i, p) ∈ enumerate(ps)
        # Replace the a's and c's with the permuted a's and c's
        # This is the bit that feels suboptimal?
        arr[in(['a', 'c']).(arr)] = p
        result[i] = my_func(arr, 'c')
    end

    return result
end

But I wonder whether there’s a better way to “shuffle around” the elements in the existing array?

Furthermore, it seems to me that the way I’m doing it currently is not threadsafe/parallelizable, as different threads would try to shuffle the elements in my_array, so if there’s a way around this as well that’d be great!

One and - I promise - only bump for this!

Is this meant to run?

julia> get_all_results(my_array)
ERROR: UndefVarError: multiset_permutations not defined

Ah my apologies, made an MWE in a long running session… multiset_permutations is defined in Combinatorics, and I also used mean from Statistics, have edited the example above

ok! So first(ps) == filter(in(('a','c')), arr), and later p values are permutations of that. Which get copied into the original array before calling my_func(arr, 'c').

You could look at how Combinatorics.MultiSetPermutations works in the hope of doing whatever it does directly on the arr each time, instead of making p? If you want to thread this, you could make a copy of arr per thread?

You can also avoid working out the indices each time via is = findall(in(('a', 'c')), arr), but no idea if it matters much.

1 Like