I have a an array of strings, of length ~2,000. I would like to produce all permutations of lengths 1 to n
(n
is always going to be less than or equal to 2).
The problem: this is taking much longer than I think it should. Am I missing an obvious optimization?
My current attempt is
import Combinatorics
"Return all permutations of sizes 1:`max_length`"
function power_permutations(iterable, max_length)
Iterators.flatten(Combinatorics.permutations(iterable, r) for r in 1:max_length)
end
and simply running through it we get
using BenchmarkTools
@benchmark foreach(identity, power_permutations(collect(1:2_000), 2))
BenchmarkTools.Trial:
memory estimate: 60.80 GiB
allocs estimate: 20000015
--------------
minimum time: 17.592 s (36.62% GC)
median time: 17.592 s (36.62% GC)
mean time: 17.592 s (36.62% GC)
maximum time: 17.592 s (36.62% GC)
--------------
samples: 1
evals/sample: 1
Woah, that is a lot of memory.
Let’s compare this to a similar version in Python.
import itertools
def power_permutations(iterable, max_length):
"Return all permutations of sizes 1:`max_length`"
for r in range(max_length + 1):
yield from itertools.permutations(iterable, r)
%timeit len(list(power_permutations(list(range(2000)), 2)))
484 ms ± 8.73 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
hmmm, much faster than the Julia version.
Let’s try again with arrays of strings. In Julia:
@benchmark foreach(identity, power_permutations(collect(Iterators.repeated("hello",2_000)), 2))
BenchmarkTools.Trial:
memory estimate: 60.80 GiB
allocs estimate: 20000015
--------------
minimum time: 17.404 s (36.79% GC)
median time: 17.404 s (36.79% GC)
mean time: 17.404 s (36.79% GC)
maximum time: 17.404 s (36.79% GC)
--------------
samples: 1
evals/sample: 1
And in Python:
%timeit len(list(power_permutations(["hello"]*2000, 2)))
473 ms ± 3.8 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
We see more or less the same performance.
Am I missing something obvious here? Is this a shortcoming of the permutations()
function in JuliaMath/Combinatorics?