Please help: Poor performance, what am I missing?

jocklawrie · October 25, 2022, 12:03am

Hello clever people,

I’m having trouble dissecting a performance problem.

MWE below. The computation in the 3rd benchmark is the composition of the computations in the first 2 benchmarks, so I expected it to take about as long as the first 2 combined. Instead it takes roughly twice that, and there's an unexpected allocation. Clearly I'm missing something fundamental. What's happening here?


using BenchmarkTools

"Sum the numeric values, count the non-numeric values"
function processcollection(t)
    total = 0.0
    ncat  = 0
    for x in t
        total, ncat = processvalue(x, total, ncat)
    end
    total, ncat
end

processvalue(x::Real, total, ncat) = total + x, ncat
processvalue(x, total, ncat) = total, ncat + 1

d = Dict("a" => (1,2), "b" => ("a", "b", "c"), "c" => (1.1, 2.2, 3.3, 4.4, 5.5, 6.6))

k = "c"
v = d[k]
@benchmark $d[$k]                     # 20ns,  0 bytes
@benchmark processcollection($v)      #  2ns,  0 bytes
@benchmark processcollection($d[$k])  # 45ns, 32 bytes, 1 alloc

adienes · October 25, 2022, 12:23am

If you move the variable definitions into a function:

function main()
    d = Dict("a" => (1,2), "b" => ("a", "b", "c"), "c" => (1.1, 2.2, 3.3, 4.4, 5.5, 6.6))
    k = "c"
    @benchmark processcollection($d[$k])
end
main()

I get 27ns.

stevengj · October 25, 2022, 12:36am

In general this will still perform a dynamic dispatch — even if the types of d and k are known by the compiler (because you interpolated them), it doesn’t know the type of d[k] (because your dictionary is heterogeneous).
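To see the heterogeneity for yourself, here is a minimal sketch that only inspects types. It reuses the dictionary from the original post and assumes nothing beyond Base. The exact value type printed may differ between Julia versions, so no output is shown; the point is only that it is not a concrete type.

```julia
# Same dictionary as in the original post.
d = Dict("a" => (1,2), "b" => ("a", "b", "c"), "c" => (1.1, 2.2, 3.3, 4.4, 5.5, 6.6))

# The keys share one concrete type, but the values are tuples of different
# lengths and element types, so the value type cannot be concrete.
println("key type:            ", keytype(d))
println("value type:          ", valtype(d))
println("value type concrete? ", isconcretetype(valtype(d)))

# What inference can conclude about d[k] when it only knows
# typeof(d) and that the key is a String.
println(Base.return_types(getindex, (typeof(d), String)))
```

Since the compiler only knows the non-concrete value type, it cannot pick a specialization of processcollection ahead of time.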

jocklawrie · October 25, 2022, 1:24am

Doesn’t seem to help - I’m still getting 45ns on my machine.

jocklawrie · October 25, 2022, 1:26am

Thanks Steven.
I still don’t get why the 3rd benchmark is slower than the sum of the first two. The dynamic dispatch should happen in both the 1st and the 3rd benchmarks right?

stevengj · October 25, 2022, 1:36am

No, the dynamic dispatch is determining (at runtime) which compiled method of processcollection to call based on the type of d[k], and this only happens in the third benchmark.
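As a rough illustration (a sketch using the definitions from the original post, not a benchmark), you can check that each key maps to a value of a different concrete type. Each of those types gets its own compiled specialization of processcollection, and the choice between them has to be made at run time when the argument is d[k].

```julia
processvalue(x::Real, total, ncat) = (total + x, ncat)
processvalue(x, total, ncat) = (total, ncat + 1)

function processcollection(t)
    total = 0.0
    ncat  = 0
    for x in t
        total, ncat = processvalue(x, total, ncat)
    end
    total, ncat
end

d = Dict("a" => (1,2), "b" => ("a", "b", "c"), "c" => (1.1, 2.2, 3.3, 4.4, 5.5, 6.6))

# The concrete type of d[k] is only known once the lookup has happened,
# so the matching specialization is selected at run time.
for k in ("a", "b", "c")
    v = d[k]
    println(k, " => ", typeof(v), " => ", processcollection(v))
end

# Three values, three distinct concrete types.
println(length(Set(typeof(v) for v in values(d))))
```

In the second benchmark, by contrast, the interpolated $v has a single known concrete type, so the call is resolved when the benchmark is compiled.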

jocklawrie · October 25, 2022, 2:01am

Ah ok, so we’re essentially talking about the distinction between
@benchmark processcollection($v) and
@benchmark processcollection(v),
which use compile-time dispatch and run-time dispatch respectively, and the latter is the same as the 3rd benchmark.

Thanks again, most helpful.
Jock
