# Hashing tuples of tuples is seemingly very slow

I’ve been using Julia to complete advent of code this year. Today’s problem had a part which naturally lent itself to storing pairs of tuples of integers in a set. My program was running far more slowly that I expected to, so I implemented the same logic in Python to see if that faired any better. My Python code completed in less than 5 seconds. I came back to my Julia code and used my own hashing function to instead store a pair of integers and this time my program complete in under 4 seconds. The only change in my program is the one shown below.

``````    if Tuple([Tuple(p1),Tuple(p2)]) in seen
return score(p1),1
end
push!(seen,Tuple([Tuple(p1),Tuple(p2)]))
``````

Which I changed to:

``````    if Tuple([score(p1),score(p2)]) in seen
return score(p1),1
end
push!(seen,Tuple([score(p1),score(p2)]))
``````

This small change made my runtime go from over a minute to less than 4 seconds. Curious, I timed the code below and it took 15 seconds to run while equivalent python code completed in less than 3 seconds.

``````for _ in 1:1000000
i = rand(0:50)
hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
end
``````

Does there exist a consistently performant way to store tuples of tuples in a set?

Not an answer to your question, but this usage of tuples looks overly complicated. Why not just

`````` if (p1, p2) in seen
return score(p1),1
end
push!(seen, (p1, p2))
``````

It is possible, that `Tuple([Tuple(p1), Tuple(p2)])` construction creates intermediate arrays.

2 Likes

Are you sure it is the hashing that is slow or could it be the fact that you are creating a bunch of tuples of unknown length (type unstable) and an array holding these two tuples in every iteration?

2 Likes

The problem with your code is not the call to `hash`, but the type instabilities from creating tuples with non-inferable length. Just replacing `Tuple([Tuple(p1), Tuple(p2)])` with `(Tuple(p1), Tuple(p2))` already makes this a lot faster:

``````julia> @time for _ in 1:1000000
i = rand(0:50)
hash((Tuple(1:i), Tuple(i+1:50)))
end
2.541987 seconds (5.96 M allocations: 1.371 GiB, 2.92% gc time)

julia> @time for _ in 1:1000000
i = rand(0:50)
hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
end
11.292196 seconds (13.20 M allocations: 2.993 GiB, 1.20% gc time)
``````

`Tuple(1:rand(0:50))` is still not type stable though, so I would highly recommend storing the ranges as `UnitRange`s instead of `Tuple`s inside your set.

I tried something like that as well but still ran into the same problems. I should note that p1,p2 are arrays and so I have to convert them into tuples. The variations I tried are

``````    tuple(Tuple(p1),Tuple(p2))
tuple(tuple(p1...),tuple(p2...))
(tuple(p1...),tuple(p2...))
(Tuple(p1),Tuple(p2))
Tuple([Tuple(p1),Tuple(p2)])
Tuple([tuple(p1...),tuple(p2...)])
``````

But all were slow.

You don’t though. You can have a `Set{NTuple{2, Vector{Int}}}` and just put the two arrays in there (like what was said in Hashing tuples of tuples is seemingly very slow). That’s e.g. what I did for advent of code. Just don’t mutate them after you put them into the set.

2 Likes

Sorry about my confusion over what was going slow. I changed the code to what’s shown below and it now runs in ~2.8 seconds. Thanks for your help.

``````    if (p1,p2) in prev1
return score(p1),1
end
push!(prev1,(copy(p1),copy(p2)))``````