Looping through NamedTuple is slow

Hi, I was comparing the performance of NamedTuple and Dict, and the results confuse me.

In the following code, when the elements are accessed by key inside a loop, NamedTuple is slower than Dict.

using BenchmarkTools

x = (a=1.0, b=1.0)
y = Dict(:a=>1.0, :b=>1.0)
indx = [:a, :b]

function func1(x, indx)
    for i in indx
        x[i]
    end
end

@btime func1($x, $indx) # 17.034 ns (0 allocations: 0 bytes)
@btime func1($y, $indx) # 9.300 ns (0 allocations: 0 bytes)

However, without the loop, NamedTuple is much faster than Dict. Could someone help me understand why this happens?

function func2(x)
    x[:a]
    x[:b]
end

@btime func2($x) # 2.700 ns (0 allocations: 0 bytes)
@btime func2($y) # 7.800 ns (0 allocations: 0 bytes)

The difference is that in func2, the index is known at compile-time, but in func1 it is not.
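
To make the distinction concrete, here is a minimal sketch. The helper names get_a, get_dyn and get_val are made up for illustration, and I have not timed any of them; the point is only where the compiler can see the key.

```julia
x = (a = 1.0, b = 2.0)

# The key is a literal in the function body, so it is visible
# when the method is compiled.
get_a(nt) = nt[:a]

# The key arrives as an ordinary runtime value of type Symbol.
# The method is compiled without knowing which field is requested.
get_dyn(nt, key::Symbol) = nt[key]

# The key is moved into the type domain with Val, so it is again
# part of the information available at compile time.
get_val(nt, ::Val{K}) where {K} = nt[K]

get_a(x)              # returns 1.0
get_dyn(x, :b)        # returns 2.0
get_val(x, Val(:b))   # returns 2.0
```

All three return the same values; they differ only in whether the key is a compile-time constant or a runtime value.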

Thank you! I’m curious why it matters so much more for NamedTuple than for Dict. The performance difference between func1 and func2 is minimal for Dict, but huge for NamedTuple.

My guess is that Dict is optimized for the case where indices are not known at compile time, while NamedTuple is optimized for the case where they are. So you can pick the one that suits your case.
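
One way to see where that guess comes from, as a rough sketch rather than a full explanation: the keys of a NamedTuple are part of its type, while the keys of a Dict are only stored as data. The values below are arbitrary.

```julia
x = (a = 1.0, b = 1.0)
y = Dict(:a => 1.0, :b => 1.0)

# The field names of a NamedTuple can be read off its type alone.
fieldnames(typeof(x)) == (:a, :b)                 # true

# Different keys give a different NamedTuple type ...
typeof(x) === typeof((c = 1.0, d = 1.0))          # false

# ... but a Dict with different keys has exactly the same type,
# because the keys live in the hash table, not in the type.
typeof(y) === typeof(Dict(:c => 1.0, :d => 1.0))  # true
typeof(y) === Dict{Symbol, Float64}               # true
```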

In func2(::NamedTuple) the compiler is probably able to figure out that x[:a] is never used and that the look-up has no side effects, so the function can directly return x[:b]. This makes the difference seem bigger than it actually is.
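
A small sketch of the point about the unused look-up. I use different values for a and b here (an assumption for illustration, the original post used 1.0 for both) so the two results can be told apart, and func2_unused and func2_sum are made-up names.

```julia
x = (a = 1.0, b = 2.0)

# Same shape as the original func2: the first look-up is discarded,
# and only the last expression is returned.
function func2_unused(x)
    x[:a]
    x[:b]
end

# Both look-ups contribute to the returned value.
func2_sum(x) = x[:a] + x[:b]

func2_unused(x)   # returns 2.0, the value of x[:a] never reaches the caller
func2_sum(x)      # returns 3.0
```

Whatever the optimizer actually does with the first version, nothing outside the function depends on x[:a], so it is free to skip that look-up.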

When benchmarking, you should avoid situations where the compiler can “optimize away” your code. I’m not sure exactly what happens in this case, since I cannot run your code now, but it’s better to do something like

function func1(x, indx)
    s = 0.0  # or zero(eltype(x)) 
    for i in indx
        s += x[i]
    end
    return s  # important, return something observable
end

Then you force the function to do actual work.

Also

function func2(x)
    return x[:a] + x[:b]
end

Thanks! I reran the benchmark and still get the same results.

Even when the indices are written out explicitly in func1, the results don’t change.

function func1(x)
    s = 0.0 
    for i in (:a, :b)
        s += x[i]
    end
    return s
end

@btime func1($x) # 16.232 ns (0 allocations: 0 bytes)
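
One variation that might be worth benchmarking next to the loop, offered only as a sketch: mapping over the tuple of keys instead of iterating over it. The name func1_map is hypothetical, and I have not measured it, so whether it is any faster is an open question.

```julia
x = (a = 1.0, b = 1.0)

# Loop form from above, repeated so the block is self-contained.
function func1(x)
    s = 0.0
    for i in (:a, :b)
        s += x[i]
    end
    return s
end

# Hypothetical alternative: build a tuple of the looked-up values,
# then sum it. No performance claim is made here.
func1_map(x) = sum(map(k -> x[k], (:a, :b)))

func1(x) == func1_map(x)   # true, both return 2.0
```

It could be timed the same way as the others, with @btime func1_map($x).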