Get first match in a dictionary of tuples

tuckermcclure · September 28, 2017, 6:00am

I have a dictionary of tuples. Inside one of those tuples is one thing I care about. What’s the best way to find that one thing? How can I make it fast? I feel like I have some things to learn about associative collections.

Here’s a mockup of my code, where I have a dictionary of pet owners and their pets. I know there’s one and only one Cat, and I’d like to find that cat. I have some code to do it, but it feels really bulky compared to the rest of my code.

abstract type Pet end
struct Dog <: Pet; name::String; end
struct Cat <: Pet; name::String; end
struct Ox  <: Pet; name::String; end

# Create an example dictionary.
pets = Dict("Cindy" => (Dog("Fido"), Cat("Captain Snuggly Pants")),
            "Tom"   => (Ox("Socks"), Dog("Chewbacca")))

# Find the cat.
cat = Cat("empty cat") # Defined so that cat has global scope.
for these_pets in values(pets) # Get tuple of pets for each entry.
    cats = filter(e -> isa(e, Cat), these_pets) # Look for a Cat.
    if !isempty(cats) # If we found one, accept it and break.
        cat = first(cats)
        break
    end
end

I can definitely make the code shorter by vcating all of those tuples together into a literal collection and then calling first(filter(p -> isa(p, Cat), ...)) on it, but that seems hugely wasteful. Time is critical here; there will be many pet owners and many pets, the whole thing will need to run many times on different collections, and who wouldn’t want to find Captain Snuggly Pants as soon as possible?

DNF · September 28, 2017, 8:02am

How about (cat is a function in base, so I just changed it to kat):

kat = Cat("Nemo")
for val in values(pets)
    katind = findfirst(v->isa(v, Cat), val)
    !iszero(katind) && (kat = val[katind]; break)
end

Timing:

julia> @btime begin
       kat = Cat("Nemo")
       for val in values($pets)
           katind = findfirst(v->isa(v, Cat), val)
           !iszero(katind) && (kat = val[katind]; break)
       end
       end
  203.837 ns (0 allocations: 0 bytes)

Tamas_Papp · September 28, 2017, 8:02am

Possibly something like

function first_matching(pred, iter)
    state = start(iter)
    while !done(iter, state)
        item, next_state = next(iter, state)
        pred(item) && return Nullable(item)
        state = next_state
    end
    return Nullable(state, false)
end

first_matching(p -> isa(p, Cat), Base.Iterators.flatten(values(pets)))

tuckermcclure · September 28, 2017, 4:08pm

Thanks @DNF and @Tamas_Papp!

As written, @DNF’s answer is by far the fastest. However, it has more hard-coded stuff than @Tamas_Papp’s. Modifying Papp’s so as to compare apples to apples (building in the flatten and isa(...) and leaving out the Nullables), I have:

function first_matching(pets, pet_type)
    iter = Base.Iterators.flatten(pets)
    state = start(iter)
    while !done(iter, state)
        item, next_state = next(iter, state)
        isa(item, pet_type) && return item
        state = next_state
    end
    return pet_type("[empty pet]")
end

and in fact this modified version is the fastest so far. It comes in at 26us, where’s @DNF’s comes in at 32us. So you both contributed to the fastest answer. (My original one comes in at a terrifying 700us and is sufficient evidence that I am still a beginner in Julia.)

Interestingly (to me), the flatten call has essentially no cost. I wrote another version where instead of using flatten, I used a loop for iter in pets; (same as above); end;, and this saved no time.

Thanks again, all. This is going to be one of the functions called most frequently by my program, and speed is critical!

DNF · September 28, 2017, 4:44pm

That’s odd. I timed mine at 200 nanoseconds.

ggggggggg · September 28, 2017, 5:06pm

I think this is a cleaner version of your answer, and I time it at about 1.3 us.

function first_matching_flatten(pets, PetType)
    for pet in Base.Iterators.flatten(pets)
        pet isa PetType && return pet
    end
    return PetType("none")
end
@btime first_matching_flatten($pets, Cat)

As for DNF’s example, I get roughly 200ns as well, but if I turn it into a function it is much slower 1.6 us instead. I don’t understand what is going on there.

println("DNF")
@btime begin
       kat = Cat("Nemo")
       for val in values($pets)
           katind = findfirst(v->isa(v, Cat), val)
           !iszero(katind) && (kat = val[katind]; break)
       end
end
println("DNF function")
function first_matching_dnf(pets, PetType)
    kat = PetType("none")
    for val in values(pets)
        katind = findfirst(v->isa(v, Cat), val)
        !iszero(katind) && (kat::PetType = val[katind]; break)
    end
    kat
end
@btime first_matching_dnf($pets, Cat)

In case you don’t know @btime comes from BenchmarkTools.jl and will give different timings from @time. The BenchmarkTools timings will generally avoid penalties associated with global scope.

tuckermcclure · September 28, 2017, 5:11pm

Thanks for the @btime heads up. In fact, I didn’t know and had just been using @time.

I would definitely be curious to understand why your timing changes when DNF’s answer is made into a function. I would think it would be faster as a function. Maybe something having to do with pets being Any in the function?

tuckermcclure · September 28, 2017, 5:47pm

Using @btime from the BenchmarkTools, I see that there is a cost to that flatten and that this is in fact fastest so far:

function first_pet_of_type_3(pets, pet_type)
    for these_pets in values(pets)
        for pet in these_pets
            if pet isa pet_type
                return pet
            end
        end
    end
    return pet_type("[no pet]")
end

I’m getting 1.2us for this. Replacing the outer loop with flatten(values(pets)) makes the function take 4.5us.

DNF · September 28, 2017, 6:50pm

My timings are decidedly different:

function findcat(pets)
    for val in values(pets)
        katind = findfirst(v->isa(v, Cat), val)
        !iszero(katind) && return val[katind]
    end
    return Cat("Nemo")
end

julia> @btime findcat($pets)
  185.125 ns (0 allocations: 0 bytes)
Cat("Captain Snuggly Pants")

julia> @btime first_pet_of_type_3($pets, Cat)
  1.741 μs (4 allocations: 128 bytes)
Cat("Captain Snuggly Pants")

DNF · September 28, 2017, 7:00pm

BTW, I thought you wanted something less bulky in terms of code length, and presumed it would be inline, hence the hardcoding.

tuckermcclure · September 28, 2017, 7:35pm

@DNF, right on all counts. Especially when the type (Cat) is hard-coded, the code is fastest. Making the type an input slows it down, but my tests are now showing your function above to be the fastest (twice as fast as it was earlier). Not sure what changed.

DNF · September 28, 2017, 7:38pm

One thing I changed is to not pre-assign the output with the default (Cat("Nemo")). Then, if a cat is actually found, you avoid that assignment.

Topic		Replies	Views
First matching element in collection General Usage question	3	456	December 5, 2018
"Simpler" findfirst/firstall methods? Internals & Design	16	835	March 1, 2021
Fastest way to do memoization? Performance performance	11	682	February 23, 2023
Findfirst that returns the value/element New to Julia	16	3846	January 23, 2018
Why does findfirst(==(T), ...) on a tuple of typed only constant fold for the first? Performance	2	630	September 29, 2021

Get first match in a dictionary of tuples

Related topics