Tuple indexing taking time?

Skoffer · March 31, 2021, 6:04pm

You can try map to unroll tuples correctly.

using BenchmarkTools
f1(x) = x^2
f2(x) = x^3
f3(x) = x^4
f4(x) = x^5
t = (f1, f2, f3, f4)

function g1(t, x)
    res = 0
    for i in 1:length(t)
        res += t[i](x)
    end
    return res
end

g2(t, x) = sum(map(f -> f(x), t))

julia> @btime g1($t, 2)
  190.059 ns (0 allocations: 0 bytes)
60

julia> @btime g2($t, 2)
  8.477 ns (0 allocations: 0 bytes)
60

markmbaum · March 31, 2021, 6:12pm

I see… I will see if that gives the same, or better, improvement. Any simple explanation of why mapping is efficient in this scenario?

Skoffer · March 31, 2021, 6:18pm

Naively speaking (and it’s really my mental model of how julia operates, it can be far away from reality), when you are going through the loop, compiler is unable to determine which function is going to be run next, so it has to determine it in runtime. Which means, that when you are executing program for each new value of i it pauses and tries to determine which function should be executed now. It looks it up with the help of big and complicated dictionary and it takes a lot of time.

On the other hand, when you are applyng map, due to the way this function is implemented, compiler is turning it’s call into something like this (f1(x), f2(x), f3(x), f4(x)), so it knows at compile time which function is going to be call when and it needs not to make this huge dynamical lookup.

rdeits · March 31, 2021, 6:31pm

You can also use recursion to iterate over tuples of varying types without losing type-stability, as discussed here: Manual dispatch over a constant type set - #4 by rdeits

Skoffer · March 31, 2021, 6:37pm

Oh, you are right, its rather amazing

function call_for_each(x::T, t) where T
    isempty(t) && return zero(T)
    return first(t)(x) + call_for_each(x, Base.tail(t))
end

julia> @btime call_for_each(2, $t)
  6.880 ns (0 allocations: 0 bytes)

It’s even faster than map method.

markmbaum · April 1, 2021, 2:51pm

Yep, @rdeits that appears to be the best solution so far. Using an organizing type helped and took the whole calculation down to 172 ms, but it’s clunky. The tuple recursion is pretty simple, takes it down to 140 ms, and the new profile is looking great.

Topic		Replies	Views
Fast type indexing Performance	6	609	March 10, 2019
Strange behavior with array calls General Usage question , performance	8	493	January 2, 2019
Looping through NamedTuple is slow General Usage	7	241	July 8, 2024
Vararg vs Tuple{Vararg} performance Performance	4	1207	March 7, 2020
Get NamedTuple element with String key. Why is this slow? Performance	3	640	March 30, 2022

Tuple indexing taking time?

Related topics