Type Inference of many dynamically created NamedTuples

olivierlabayle · July 25, 2024, 10:47am

Hi,

I am facing a compilation bottleneck in my code because I am creating and propagating a large collection of NamedTuples, each with different keys. This seems to cause type inference problems.

I have created the following example to illustrate the problem:

function make_number(nt)
    number = 0.
    for n in keys(nt)
        number += nt[n] + rand()
    end
    return number
end

function bad_loop(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    for (index, v) in enumerate(variants)
        nt = NamedTuple{(v,)}([rand()])
        numbers[index] = make_number(nt)
    end
    return numbers
end

variants = [Symbol("rs", i) for i in rand(Int, 100)]
@time bad_loop(variants)
@time bad_loop(variants)

Is there any way to keep using NamedTuples without hurting performance? I can’t seem to find a way to do it. I think I could use a dictionary to solve this problem but my whole code base depends on NamedTuples…

Many thanks

nsajko · July 25, 2024, 11:00am

Specific advice would require knowledge of your code and overall design, but the main thing to keep in mind is not to move values into the type domain in hot loops. In your example, NamedTuple{(v,)}([rand()]) moves the value v, which is not a compile-time constant, into the type domain.

In case you don’t know, moving a value into the type domain is a great power of Julia, one that few other languages offer. But it comes at a potentially high cost: invoking the compiler at run time.
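To make the point concrete, here is a minimal sketch (the keys :rs1 and :rs2 are made up for illustration). The keys of a NamedTuple are part of its type, so two NamedTuples that differ only in their key have different types, and a function called on both is specialized once per type:

```julia
# Two NamedTuples holding the same value under different (made-up) keys.
nt_a = NamedTuple{(:rs1,)}((1.0,))
nt_b = NamedTuple{(:rs2,)}((1.0,))

# The key lives in the type, so the two types differ.
same_type = typeof(nt_a) == typeof(nt_b)   # false

# The first type parameter of a NamedTuple type is its tuple of keys.
keys_in_type = typeof(nt_a).parameters[1]
```

With 100 distinct keys in the loop, that means 100 distinct types for the compiler to handle.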

Another issue is that the variable nt changes type from one loop iteration to the next. In some cases this can be solved by using recursion instead of a loop, but I don’t know how much of a redesign that would require in your case.

olivierlabayle · July 25, 2024, 12:24pm

Thank you for your reply @nsajko. I think I understand from what you say that, even though Julia allows it, dynamically created NamedTuples (in particular with new names) will lead to compilation performance problems. Is there little I can do about it other than changing my code base to get rid of the NamedTuples altogether?

nsajko · July 25, 2024, 12:27pm

No. I’m saying don’t do it inside a hot loop, i.e., do it rarely enough that the compilation cost stays negligible.
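One way to read this, as a sketch only and not necessarily a fit for the real code base: keep the keys as ordinary runtime values inside the hot loop, so every iteration works with the same concrete type. The names make_number_pairs and value_domain_loop below are hypothetical, and a vector of Symbol => Float64 pairs stands in for the NamedTuple:

```julia
# Hypothetical value-domain variant of make_number: the keys are runtime
# Symbols, so every call sees the same type, Vector{Pair{Symbol,Float64}}.
function make_number_pairs(pairs::Vector{Pair{Symbol,Float64}})
    number = 0.0
    for (_, value) in pairs
        number += value + rand()
    end
    return number
end

# Hypothetical counterpart of bad_loop that never creates a new type.
function value_domain_loop(variants)
    numbers = Vector{Float64}(undef, length(variants))
    for (index, v) in enumerate(variants)
        pairs = [v => rand()]  # same concrete type for every variant
        numbers[index] = make_number_pairs(pairs)
    end
    return numbers
end

variants = [Symbol("rs", i) for i in 1:100]
numbers = value_domain_loop(variants)
```

Here make_number_pairs is compiled once, regardless of how many distinct keys appear. The trade-off is that field access by name is no longer resolved at compile time.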

olivierlabayle · July 25, 2024, 12:33pm

Right, but in my use case I am dealing with hundreds of thousands of such cases, and the variant may be a tuple of keys instead of a single key. How do I do it in practice?

For instance this does not help:

function bad_loop_2(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    nts = [NamedTuple{(v,)}([rand()]) for v in variants]
    for (index, nt) in enumerate(nts)
        numbers[index] = make_number(nt)
    end
    return numbers
end
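A quick check of why building the NamedTuples up front probably changes nothing, sketched with made-up distinct keys: the vector still holds one type per variant, so its element type cannot be concrete and make_number still has to be specialized for each element type.

```julia
# Made-up, distinct variant names for illustration.
variants = [Symbol("rs", i) for i in 1:100]
nts = [NamedTuple{(v,)}([rand()]) for v in variants]

# No single concrete type covers all elements of the vector.
has_concrete_eltype = isconcretetype(eltype(nts))

# Number of distinct element types: one per distinct key.
n_types = length(unique(typeof.(nts)))
```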

nsajko · July 25, 2024, 12:35pm

As I said:

Do you know about the Performance tips page in the Julia Manual?

olivierlabayle · July 25, 2024, 12:44pm

Yes, I know about it. That’s how I figured out that the problem was the creation of new types in the for loop (specifically Performance Tips · The Julia Language), but I can’t figure out a way to solve it using these docs.

The specific content of the for loop is:

for variant in variants
    treatments = treatments_from_variant(variant, dataset) # This is the NamedTuple creation from a dataframe
    Ψ = factorialEstimands(
        estimand_constructor, treatments, outcomes;
        confounders=confounders,
        dataset=dataset,
        outcome_extra_covariates=outcome_extra_covariates,
        positivity_constraint=positivity_constraint,
        verbosity=verbosity-1
    )
end

where factorialEstimand points to: TMLE.jl/src/counterfactual_mean_based/estimands.jl at 5d4a8e95711abfaabde310239026c6929cc8d270 · TARGENE/TMLE.jl · GitHub
This creates a struct that contains NamedTuples as fields. Again, this is probably a bad design, but I never suspected it would lead to such dramatic performance problems.
