I have a dictionary of tuples. Inside one of those tuples is one thing I care about. What’s the best way to find that one thing? How can I make it fast? I feel like I have some things to learn about associative collections.
Here’s a mockup of my code, where I have a dictionary of pet owners and their pets. I know there’s one and only one Cat, and I’d like to find that cat. I have some code to do it, but it feels really bulky compared to the rest of my code.
abstract type Pet end
struct Dog <: Pet; name::String; end
struct Cat <: Pet; name::String; end
struct Ox <: Pet; name::String; end
# Create an example dictionary.
pets = Dict("Cindy" => (Dog("Fido"), Cat("Captain Snuggly Pants")),
"Tom" => (Ox("Socks"), Dog("Chewbacca")))
# Find the cat.
cat = Cat("empty cat") # Defined so that cat has global scope.
for these_pets in values(pets) # Get tuple of pets for each entry.
cats = filter(e -> isa(e, Cat), these_pets) # Look for a Cat.
if !isempty(cats) # If we found one, accept it and break.
cat = first(cats)
break
end
end
I can definitely make the code shorter by vcating all of those tuples together into a literal collection and then calling first(filter(p -> isa(p, Cat), ...)) on it, but that seems hugely wasteful. Time is critical here; there will be many pet owners and many pets, the whole thing will need to run many times on different collections, and who wouldn’t want to find Captain Snuggly Pants as soon as possible?
How about (cat is a function in base, so I just changed it to kat):
kat = Cat("Nemo")
for val in values(pets)
katind = findfirst(v->isa(v, Cat), val)
!iszero(katind) && (kat = val[katind]; break)
end
Timing:
julia> @btime begin
kat = Cat("Nemo")
for val in values($pets)
katind = findfirst(v->isa(v, Cat), val)
!iszero(katind) && (kat = val[katind]; break)
end
end
203.837 ns (0 allocations: 0 bytes)
As written, @DNF’s answer is by far the fastest. However, it has more hard-coded stuff than @Tamas_Papp’s. Modifying Papp’s so as to compare apples to apples (building in the flatten and isa(...) and leaving out the Nullables), I have:
function first_matching(pets, pet_type)
iter = Base.Iterators.flatten(pets)
state = start(iter)
while !done(iter, state)
item, next_state = next(iter, state)
isa(item, pet_type) && return item
state = next_state
end
return pet_type("[empty pet]")
end
and in fact this modified version is the fastest so far. It comes in at 26us, where’s @DNF’s comes in at 32us. So you both contributed to the fastest answer. (My original one comes in at a terrifying 700us and is sufficient evidence that I am still a beginner in Julia.)
Interestingly (to me), the flatten call has essentially no cost. I wrote another version where instead of using flatten, I used a loop for iter in pets; (same as above); end;, and this saved no time.
Thanks again, all. This is going to be one of the functions called most frequently by my program, and speed is critical!
I think this is a cleaner version of your answer, and I time it at about 1.3 us.
function first_matching_flatten(pets, PetType)
for pet in Base.Iterators.flatten(pets)
pet isa PetType && return pet
end
return PetType("none")
end
@btime first_matching_flatten($pets, Cat)
As for DNF’s example, I get roughly 200ns as well, but if I turn it into a function it is much slower 1.6 us instead. I don’t understand what is going on there.
println("DNF")
@btime begin
kat = Cat("Nemo")
for val in values($pets)
katind = findfirst(v->isa(v, Cat), val)
!iszero(katind) && (kat = val[katind]; break)
end
end
println("DNF function")
function first_matching_dnf(pets, PetType)
kat = PetType("none")
for val in values(pets)
katind = findfirst(v->isa(v, Cat), val)
!iszero(katind) && (kat::PetType = val[katind]; break)
end
kat
end
@btime first_matching_dnf($pets, Cat)
In case you don’t know @btime comes from BenchmarkTools.jl and will give different timings from @time. The BenchmarkTools timings will generally avoid penalties associated with global scope.
Thanks for the @btime heads up. In fact, I didn’t know and had just been using @time.
I would definitely be curious to understand why your timing changes when DNF’s answer is made into a function. I would think it would be faster as a function. Maybe something having to do with pets being Any in the function?
Using @btime from the BenchmarkTools, I see that there is a cost to that flatten and that this is in fact fastest so far:
function first_pet_of_type_3(pets, pet_type)
for these_pets in values(pets)
for pet in these_pets
if pet isa pet_type
return pet
end
end
end
return pet_type("[no pet]")
end
I’m getting 1.2us for this. Replacing the outer loop with flatten(values(pets)) makes the function take 4.5us.
function findcat(pets)
for val in values(pets)
katind = findfirst(v->isa(v, Cat), val)
!iszero(katind) && return val[katind]
end
return Cat("Nemo")
end
@DNF, right on all counts. Especially when the type (Cat) is hard-coded, the code is fastest. Making the type an input slows it down, but my tests are now showing your function above to be the fastest (twice as fast as it was earlier). Not sure what changed.