I’ve been using Julia to complete advent of code this year. Today’s problem had a part which naturally lent itself to storing pairs of tuples of integers in a set. My program was running far more slowly that I expected to, so I implemented the same logic in Python to see if that faired any better. My Python code completed in less than 5 seconds. I came back to my Julia code and used my own hashing function to instead store a pair of integers and this time my program complete in under 4 seconds. The only change in my program is the one shown below.
if Tuple([Tuple(p1),Tuple(p2)]) in seen
return score(p1),1
end
push!(seen,Tuple([Tuple(p1),Tuple(p2)]))
Which I changed to:
if Tuple([score(p1),score(p2)]) in seen
return score(p1),1
end
push!(seen,Tuple([score(p1),score(p2)]))
This small change made my runtime go from over a minute to less than 4 seconds. Curious, I timed the code below and it took 15 seconds to run while equivalent python code completed in less than 3 seconds.
for _ in 1:1000000
i = rand(0:50)
hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
end
Does there exist a consistently performant way to store tuples of tuples in a set?
Are you sure it is the hashing that is slow or could it be the fact that you are creating a bunch of tuples of unknown length (type unstable) and an array holding these two tuples in every iteration?
The problem with your code is not the call to hash, but the type instabilities from creating tuples with non-inferable length. Just replacing Tuple([Tuple(p1), Tuple(p2)]) with (Tuple(p1), Tuple(p2)) already makes this a lot faster:
julia> @time for _ in 1:1000000
i = rand(0:50)
hash((Tuple(1:i), Tuple(i+1:50)))
end
2.541987 seconds (5.96 M allocations: 1.371 GiB, 2.92% gc time)
julia> @time for _ in 1:1000000
i = rand(0:50)
hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
end
11.292196 seconds (13.20 M allocations: 2.993 GiB, 1.20% gc time)
Tuple(1:rand(0:50)) is still not type stable though, so I would highly recommend storing the ranges as UnitRanges instead of Tuples inside your set.
I tried something like that as well but still ran into the same problems. I should note that p1,p2 are arrays and so I have to convert them into tuples. The variations I tried are
You don’t though. You can have a Set{NTuple{2, Vector{Int}}} and just put the two arrays in there (like what was said in Hashing tuples of tuples is seemingly very slow - #2 by Skoffer). That’s e.g. what I did for advent of code. Just don’t mutate them after you put them into the set.