Hashing tuples of tuples is seemingly very slow

rdl · December 22, 2020, 10:43am

I’ve been using Julia to complete advent of code this year. Today’s problem had a part which naturally lent itself to storing pairs of tuples of integers in a set. My program was running far more slowly that I expected to, so I implemented the same logic in Python to see if that faired any better. My Python code completed in less than 5 seconds. I came back to my Julia code and used my own hashing function to instead store a pair of integers and this time my program complete in under 4 seconds. The only change in my program is the one shown below.

    if Tuple([Tuple(p1),Tuple(p2)]) in seen
        return score(p1),1
    end        
    push!(seen,Tuple([Tuple(p1),Tuple(p2)]))

Which I changed to:

    if Tuple([score(p1),score(p2)]) in seen
        return score(p1),1
    end        
    push!(seen,Tuple([score(p1),score(p2)]))

This small change made my runtime go from over a minute to less than 4 seconds. Curious, I timed the code below and it took 15 seconds to run while equivalent python code completed in less than 3 seconds.

for _ in 1:1000000
    i = rand(0:50)
    hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
end

Does there exist a consistently performant way to store tuples of tuples in a set?

Skoffer · December 22, 2020, 10:52am

Not an answer to your question, but this usage of tuples looks overly complicated. Why not just

 if (p1, p2) in seen
    return score(p1),1
end        
push!(seen, (p1, p2))

It is possible, that Tuple([Tuple(p1), Tuple(p2)]) construction creates intermediate arrays.

kristoffer.carlsson · December 22, 2020, 10:55am

Are you sure it is the hashing that is slow or could it be the fact that you are creating a bunch of tuples of unknown length (type unstable) and an array holding these two tuples in every iteration?

simeonschaub · December 22, 2020, 10:59am

The problem with your code is not the call to hash, but the type instabilities from creating tuples with non-inferable length. Just replacing Tuple([Tuple(p1), Tuple(p2)]) with (Tuple(p1), Tuple(p2)) already makes this a lot faster:

julia> @time for _ in 1:1000000
           i = rand(0:50)
           hash((Tuple(1:i), Tuple(i+1:50)))
       end
  2.541987 seconds (5.96 M allocations: 1.371 GiB, 2.92% gc time)

julia> @time for _ in 1:1000000
           i = rand(0:50)
           hash(Tuple([Tuple(1:i),Tuple(i+1:50)]))
       end
 11.292196 seconds (13.20 M allocations: 2.993 GiB, 1.20% gc time)

Tuple(1:rand(0:50)) is still not type stable though, so I would highly recommend storing the ranges as UnitRanges instead of Tuples inside your set.

rdl · December 22, 2020, 11:05am

I tried something like that as well but still ran into the same problems. I should note that p1,p2 are arrays and so I have to convert them into tuples. The variations I tried are

    tuple(Tuple(p1),Tuple(p2))
    tuple(tuple(p1...),tuple(p2...))
    (tuple(p1...),tuple(p2...))
    (Tuple(p1),Tuple(p2))
    Tuple([Tuple(p1),Tuple(p2)])
    Tuple([tuple(p1...),tuple(p2...)])

But all were slow.

kristoffer.carlsson · December 22, 2020, 11:08am

You don’t though. You can have a Set{NTuple{2, Vector{Int}}} and just put the two arrays in there (like what was said in Hashing tuples of tuples is seemingly very slow - #2 by Skoffer). That’s e.g. what I did for advent of code. Just don’t mutate them after you put them into the set.

rdl · December 22, 2020, 11:15am

Sorry about my confusion over what was going slow. I changed the code to what’s shown below and it now runs in ~2.8 seconds. Thanks for your help.

    if (p1,p2) in prev1
        return score(p1),1
    end        
    push!(prev1,(copy(p1),copy(p2)))