I am facing a compilation bottleneck in my code because I am creating and propagating a large collection of NamedTuples, each with different keys. This seems to cause type inference problems.
I have created the following example to illustrate the problem:
function make_number(nt)
    number = 0.
    for n in keys(nt)
        number += nt[n] + rand()
    end
    return number
end

function bad_loop(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    for (index, v) in enumerate(variants)
        nt = NamedTuple{(v,)}([rand()])
        numbers[index] = make_number(nt)
    end
    return numbers
end

variants = [Symbol("rs", i) for i in rand(Int, 100)]
@time bad_loop(variants)
@time bad_loop(variants)
Is there any way to keep using NamedTuples without hurting performance? I can’t seem to find a way to do it. I think I could use a dictionary to solve this problem but my whole code base depends on NamedTuples…
Specific advice would require knowledge of your code and overall design, but the main thing to keep in mind is to avoid moving values into the type domain in hot loops. The line nt = NamedTuple{(v,)}([rand()]) moves the value v, which is not a compile-time constant, into the type domain.
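In case you don’t know, moving a value into the type domain is a powerful Julia feature that few other languages offer. But it comes at a potentially high cost: invoking the compiler at run time. Here, every new key produces a new NamedTuple type, so make_number has to be compiled again for each one.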
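To make the type-domain point concrete, here is a minimal sketch (the names a, b and c are only for illustration). It shows that the key names are part of a NamedTuple's type, while the stored values are not:

```julia
# Two NamedTuples with different key names but the same value type.
a = NamedTuple{(:rs1,)}((0.5,))
b = NamedTuple{(:rs2,)}((0.5,))
# Same key name as a, different value.
c = NamedTuple{(:rs1,)}((0.7,))

# The names are type parameters, so a and b have different concrete types...
typeof(a) == typeof(b)  # false
# ...while changing only the value leaves the type unchanged.
typeof(a) == typeof(c)  # true
```

With 100 distinct symbols, the loop in the question therefore sees 100 distinct argument types.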
Another issue is that the variable nt changes type from one loop iteration to the next. In some cases this can be solved by using recursion instead of a loop, but I don’t know how much of a redesign that would require in your case.
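As a rough illustration of the recursion idea, here is a sketch that walks a heterogeneous tuple of NamedTuples. The helpers total and sum_totals are hypothetical names, not from the original code, and total is a deterministic stand-in for make_number. I would expect this to help only when the tuple is small and its element types are known to the compiler, which is probably not the case for keys generated at run time:

```julia
# Deterministic stand-in for make_number: sum the values of one NamedTuple.
total(nt::NamedTuple) = sum(values(nt))

# Base case: an empty tuple contributes nothing.
sum_totals(::Tuple{}) = 0.0
# Recursive case: handle the first element, then recurse on the rest.
# Each call sees a concrete tuple type, so no variable changes type.
sum_totals(nts::Tuple) = total(first(nts)) + sum_totals(Base.tail(nts))

# Names are literals here, so they are known at compile time.
nts = ((rs1 = 1.0,), (rs2 = 2.0, rs3 = 3.0), (rs4 = 4.0,))
result = sum_totals(nts)  # 10.0
```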
Thank you for your reply @nsajko. If I understand correctly, even though Julia allows it, dynamically creating NamedTuples (in particular with new names) will lead to compilation performance problems. Is there little I can do about it other than changing my code base to get rid of the NamedTuples altogether?
Right, but in my use case I am dealing with hundreds of thousands of such cases, and a variant may be a tuple of keys instead of a single key. How do I do this in practice?
For instance this does not help:
function bad_loop_2(variants)
    numbers = Vector{Float64}(undef, size(variants, 1))
    nts = [NamedTuple{(v,)}([rand()]) for v in variants]
    for (index, nt) in enumerate(nts)
        numbers[index] = make_number(nt)
    end
    return numbers
end
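For comparison, one possible way to keep the names in the value domain is sketched below. It assumes the NamedTuple can be swapped for a small container; the struct Treatments and the functions make_number_struct and struct_loop are hypothetical names, not part of the original code base. Because every iteration builds the same concrete type, only one specialization of the inner function should be needed, even when a variant is a tuple of several keys:

```julia
# Hypothetical container: one concrete type regardless of which names it holds.
struct Treatments
    names::Vector{Symbol}
    values::Vector{Float64}
end

# Same computation as make_number, but over the container's values.
function make_number_struct(t::Treatments)
    number = 0.0
    for value in t.values
        number += value + rand()
    end
    return number
end

function struct_loop(variants)
    numbers = Vector{Float64}(undef, length(variants))
    for (index, names) in enumerate(variants)
        # The names stay ordinary runtime values; no new type is created here.
        t = Treatments(collect(names), rand(length(names)))
        numbers[index] = make_number_struct(t)
    end
    return numbers
end

# Each variant is a tuple of two keys.
variants = [(Symbol("rs", i), Symbol("rs", i + 1)) for i in 1:100]
numbers = struct_loop(variants)
```

The trade-off is that field access by name (nt.rs1) is lost, so code that relies on it would need to look names up in the names vector instead.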
Yes, I know about it (specifically Performance Tips · The Julia Language). That’s how I figured out that the problem was the creation of new types in the for loop, but I can’t figure out how to solve it using these docs.
The specific content of the for loop is:
for variant in variants
    treatments = treatments_from_variant(variant, dataset) # This is the NamedTuple creation from a dataframe
    Ψ = factorialEstimands(
        estimand_constructor, treatments, outcomes;
        confounders=confounders,
        dataset=dataset,
        outcome_extra_covariates=outcome_extra_covariates,
        positivity_constraint=positivity_constraint,
        verbosity=verbosity-1
    )
end