Type Inference of many dynamically created NamedTuples

Hi,

I am facing a compilation bottleneck in my code because I am creating and propagating a large collection of NamedTuples, each with different keys. This seems to cause type inference problems.

I have created the following example to illustrate the problem:

function make_number(nt)
    number = 0.
    for n in keys(nt)
        number += nt[n] + rand()
    end
    return number
end

function bad_loop(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    for (index, v) in enumerate(variants)
        nt = NamedTuple{(v,)}([rand()])
        numbers[index] = make_number(nt)
    end
    return numbers
end

variants = [Symbol("rs", i) for i in rand(Int, 100)]
@time bad_loop(variants)
@time bad_loop(variants)

Is there any way to keep using NamedTuples without hurting performance? I can’t seem to find a way to do it. I think I could use a dictionary to solve this problem but my whole code base depends on NamedTuples…

Many thanks

Specific advice would require knowledge of your code and overall design, but the main thing to keep in mind is not to move values into the type domain in hot loops. In your example, the call NamedTuple{(v,)}(...) moves the value v, which is not a compile-time constant, into the type domain.

In case you don’t know, moving a value into the type domain is a great power of Julia, one that few other languages offer. But it comes at a potentially high cost: invoking the compiler at run time.
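To make the point concrete, here is a minimal sketch (the keys :rs1 and :rs2 are made up for illustration). The field names are part of a NamedTuple's type, so two NamedTuples that differ only in their key have different concrete types, and any function called on them is compiled separately for each one.

```julia
# Two NamedTuples holding the same kind of value under different keys.
nt_a = NamedTuple{(:rs1,)}((1.0,))
nt_b = NamedTuple{(:rs2,)}((1.0,))

# The key is a type parameter, so the concrete types differ.
type_a = typeof(nt_a)   # NamedTuple{(:rs1,), Tuple{Float64}}
type_b = typeof(nt_b)   # NamedTuple{(:rs2,), Tuple{Float64}}
same_type = type_a == type_b   # false
```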

Another issue is that the variable nt changes type from one loop iteration to the next. In some cases this can be solved by using recursion instead of a loop, but I don’t know how much of a redesign that would require in your case.
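As a rough illustration of the recursion idea (sum_fields is a hypothetical helper, not something from your code): each recursive call sees a shorter tuple whose element types are known to the compiler, so no variable has to change type. This only applies when the collection is itself a tuple of modest length, so it may well not fit a case with many runtime-generated keys.

```julia
# Recursive sum over a tuple whose elements may have different types.
sum_fields(::Tuple{}) = 0.0                                   # base case: empty tuple
sum_fields(t::Tuple) = first(t) + sum_fields(Base.tail(t))    # peel off one element

nt = (a = 1, b = 2.5, c = 3f0)       # Int, Float64 and Float32 fields
total = sum_fields(values(nt))       # values(nt) is the tuple (1, 2.5, 3.0f0)
```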

Thank you for your reply, @nsajko. I think I understand from what you say that, even though Julia enables it, dynamically created NamedTuples (in particular ones with new names) will lead to compilation performance problems. Is there little I can do about it other than change my code base to get rid of the NamedTuples altogether?

No. I’m saying don’t do it inside a hot loop, i.e., do it rarely enough.
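One possible reading of this advice, sketched under the assumption that the keys can stay ordinary values inside the loop: store each entry as a Pair{Symbol, Float64} so that every iteration works with the same concrete type. The names make_number_pairs and value_domain_loop are hypothetical and only mirror the example from the first post.

```julia
# Value-domain variant of make_number: the keys remain plain Symbols.
function make_number_pairs(entries::Vector{Pair{Symbol, Float64}})
    number = 0.0
    for (_, value) in entries
        number += value + rand()
    end
    return number
end

function value_domain_loop(variants::Vector{Symbol})
    numbers = Vector{Float64}(undef, length(variants))
    for (index, v) in enumerate(variants)
        # Same concrete type on every iteration, whatever the key is.
        entries = [v => rand()]
        numbers[index] = make_number_pairs(entries)
    end
    return numbers
end

variants = [Symbol("rs", i) for i in 1:100]
numbers = value_domain_loop(variants)
```

Here make_number_pairs is compiled once, for Vector{Pair{Symbol, Float64}}, instead of once per key. A NamedTuple can still be built from such entries at the few places that really need one.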

Right, but in my use case I am dealing with hundreds of thousands of such cases, and the variant may be a tuple of keys instead of a single key. How do I do it in practice?

For instance this does not help:

function bad_loop_2(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    nts = [NamedTuple{(v,)}([rand()]) for v in variants]
    for (index, nt) in enumerate(nts)
        numbers[index] = make_number(nt)
    end
    return numbers
end
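A small check, using made-up keys rs1 to rs100, suggests why building the NamedTuples up front changes little: the vector still holds one concrete type per distinct key, so make_number is still specialized once per key when the loop runs.

```julia
variants = [Symbol("rs", i) for i in 1:100]
nts = [NamedTuple{(v,)}((rand(),)) for v in variants]

# Count the distinct concrete types stored in the vector.
n_types = length(unique(map(typeof, nts)))      # one type per distinct key

# The element type of the vector cannot be concrete either.
concrete_eltype = isconcretetype(eltype(nts))   # false
```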

As I said:

Do you know about the Performance tips page in the Julia Manual?

https://docs.julialang.org/en/v1/manual/performance-tips/

Yes, I know about it. That’s how I figured out that the problem was the creation of new types in the for loop (specifically from Performance Tips · The Julia Language), but I can’t figure out a way to solve it using these docs.

The specific content of the for loop is:

for variant in variants
    treatments = treatments_from_variant(variant, dataset) # This is the NamedTuple creation from a dataframe
    Ψ = factorialEstimands(
        estimand_constructor, treatments, outcomes;
        confounders=confounders,
        dataset=dataset,
        outcome_extra_covariates=outcome_extra_covariates,
        positivity_constraint=positivity_constraint,
        verbosity=verbosity-1
    )
end

where factorialEstimand points to: TMLE.jl/src/counterfactual_mean_based/estimands.jl at 5d4a8e95711abfaabde310239026c6929cc8d270 · TARGENE/TMLE.jl · GitHub
This creates a struct that contains NamedTuples as fields. Again, this is probably a bad design, but I never suspected it would lead to such dramatic performance problems.