Reduce allocations when extracting values from NamedTuples

Hello,

I have a situation where I need to extract values from NamedTuples which are embedded within structs. Unfortunately, there are a lot of allocations, and it is not type stable. I have created a MWE below.

mutable struct M{T} 
    x::T
end
ms = [M((;a=rand(), b=rand(1:2))) for _ ∈ 1:10]

get_x(m) = m.x 
get_values(ms, k) = map(m -> get_x(m)[k], ms)
@time get_values(ms, :a)

(21 allocations: 624 bytes)

Not that the value of k in get_values is not known head of time. However, the values associated with a given key in the NamedTuple are the same type.

Thanks!

Not answering your question, but would it be an alternative to use StructArrays.jl?

Example:

using StructArrays

mutable struct M2
    a::Float64
    b::Int64
end

sa = StructArray{M2}(a=rand(10), b=rand(1:2,10))

sa.a     #  1.800 ns (0 allocs: 0 bytes)
sa.b
1 Like

Thanks for the reply. I will need to look into StructArrays in more detail, but unfortunately it would not be easy to restructure the data in this case.

The following loop function seems to work better in terms of allocations, but its still not type stable.

function loop(ms, k)
    n = length(ms)
    T = typeof(ms[1].x[k])
    output = Vector{T}(undef,n)
    for i ∈ 1:n 
        output[i] = get_x(ms[i])[k]
    end
    return output 
end

Note that to transform your original data to a structured array, you could do:

nt_vec = map(m -> (;a,b) = m.x, ms)
sa = StructArray(nt_vec)

I think this function is inherently type unstable since the return type indeed depends on the value (not the type) of its arguments.

To make it type stable you’d have to turn the field value into an argument type, for example:

get_values2(ms, ::Val{Field}) where Field = map(m -> get_x(m)[Field], ms)

julia> @time get_values(ms, :a);
  0.000010 seconds (21 allocations: 624 bytes)

julia> @time get_values2(ms, Val(:a));
  0.000006 seconds (1 allocation: 144 bytes)
6 Likes

@sijo, thank you for the recommendation. This looks like a promising approach given the constraints of the problem.

1 Like

you could also try with Struct Arrays as suggested by @rafael.guerra

julia> @btime StructArray((getfield(msi,1) for msi in ms))
  181.418 ns (4 allocations: 336 bytes)
10-element StructArray(::Vector{Float64}, ::Vector{Int64}) with eltype NamedTuple{(:a, :b), Tuple{Float64, Int64}}:
 (a = 0.5414842871372272, b = 2)
 (a = 0.9336051119621631, b = 1)
 (a = 0.09071290553043099, b = 1)
 (a = 0.7366230642086795, b = 2)
 (a = 0.2540805237857747, b = 2)
 (a = 0.7800245529522338, b = 1)
 (a = 0.39687011495765945, b = 2)
 (a = 0.4032879072086486, b = 2)
 (a = 0.33832481186412533, b = 1)
 (a = 0.09112718397551578, b = 2)

julia> @time getproperty(sams,1)
  0.000004 seconds
10-element Vector{Float64}:
 0.5414842871372272
 0.9336051119621631
 0.09071290553043099
 0.7366230642086795
 0.2540805237857747
 0.7800245529522338
 0.39687011495765945
 0.4032879072086486
 0.33832481186412533
 0.09112718397551578

julia> @time getproperty(sams,:a)
  0.000019 seconds (1 allocation: 32 bytes)
10-element Vector{Float64}:
 0.5414842871372272
 0.9336051119621631
 0.09071290553043099
 0.7366230642086795
 0.2540805237857747
 0.7800245529522338
 0.39687011495765945
 0.4032879072086486
 0.33832481186412533
 0.09112718397551578
julia> @btime getproperty($sams,findfirst(==(:a),propertynames($sams)))
  1.500 ns (0 allocations: 0 bytes)
10-element Vector{Float64}:
 0.5414842871372272
 0.9336051119621631
 0.09071290553043099
 0.7366230642086795
 0.2540805237857747
 0.7800245529522338
 0.39687011495765945
 0.4032879072086486
 0.33832481186412533
 0.09112718397551578


julia> @btime getproperty($sams,:a)
  1.500 ns (0 allocations: 0 bytes)
10-element Vector{Float64}:
 0.5414842871372272
 0.9336051119621631
 0.09071290553043099
 0.7366230642086795
 0.2540805237857747
 0.7800245529522338
 0.39687011495765945
 0.4032879072086486
 0.33832481186412533
 0.09112718397551578
2 Likes

is just: sams.a

I know (although I couldn’t figure out when and how in the construction of the structure ( (*)the array of named tuples is transformed into a named tuple of arrays).
But the OP asked for a solution in which the field is dynamic and therefore the form sams.a is limiting compared to field=:a getproperty(sams,field)

(*) I would be very grateful to anyone who explained in detail where this step takes place.

3 Likes

The transformation really starts here:

Line 51 creates destination arrays of the right types (how this is done is hard to follow especially in your case where the StructArray is initialized from a generator rather than a vector). The call to _collect_structarray! fills these destinations with the values from the tuples.

julia> using StructArrays   #, BenchmarkTools

julia> mutable struct M{T}
           x::T
       end

julia> ms = [M((;a=rand(), b=rand(1:2))) for _ ∈ 1:10]
10-element Vector{M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}}:
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.28755525874520704, b = 2))     
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.48459762010752394, b = 2))     
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.8595309903519835, b = 2))      
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9999199980992112, b = 2))      
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.07601872388828879, b = 1))     
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9043072853815545, b = 2))      
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.7170560150210613, b = 1))      
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.36632931891866216, b = 1))     
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.005976654462236608, b = 2))    
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9097792660807252, b = 1))      

julia> StructArrays.staticschema(::Type{M{NamedTuple{names, types}}}) where {names, types} = NamedTuple{names, types}

julia> StructArrays.component(m::M, key::Symbol) = getfield(getfield(m, 1), key)      

julia> StructArrays.createinstance(::Type{M{T}},  args...) where T = M(T(args))       

julia> sa=StructArray(ms)
10-element StructArray(::Vector{Float64}, ::Vector{Int64}) with eltype M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}:
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.28755525874520704, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.48459762010752394, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.8595309903519835, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9999199980992112, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.07601872388828879, b = 1))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9043072853815545, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.7170560150210613, b = 1))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.36632931891866216, b = 1))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.005976654462236608, b = 2))
 M{NamedTuple{(:a, :b), Tuple{Float64, Int64}}}((a = 0.9097792660807252, b = 1))

Thank you.
It seems plausible to me that it is in this function that the “initialization” of the “components” of the structarray occurs.

I tried using the instructions in the package documentation in the “advanced examples” paragraph.
I tried navigating with the debugger and various breakpoints, but I couldn’t get to those functions… the debugger fails on this function at the if @generated statement

function map_params_as_tuple(f::F, ::Type{T}) where {F, T<:Tup}
    if @generated
        types = fieldtypes(T)
        args = map(t -> :(f($t)), types)
        Expr(:tuple, args...)
    else
        map_params_as_tuple_fallback(f, T)
    end
end

Yeah it’s difficult to find the relevant code using Debugger. I think the sl command should make it easy but it crashes (I opened an issue here). So instead you need to step-into the same line several times, then step out each time it goes into the argument construction code. It’s easy to take a wrong turn (I think it’s a bit better in VS Code because for some reason when stepping into the argument constructions it doesn’t really go there so one doesn’t get lost).

Using Cthulhu works better but it only shows the function calls so it’s harder to use for navigating the source.

What I actually used is the poor man’s debugger: run the code manually line by line. It’s not so bad now with the “contextual module REPL” feature added in Julia 1.9:

  1. In the REPL, type StructArrays then Alt-m to enter the StructArrays module, then:

    (StructArrays) julia> mutable struct M{T}
                                     x::T
                                 end
    
    (StructArrays) julia> ms = [M((;a=rand(), b=rand(1:2))) for _ ∈ 1:10]
    
    julia> @edit StructArray((getfield(msi,1) for msi in ms))
    

    In the editor I see it’s going to call collect_structarray(v; initializer = StructArrayInitializer(unwrap)) with v=ms and unwrap = alwaysfalse so I run:

    (StructArrays) julia> v=ms; unwrap = alwaysfalse
    
    (StructArrays) julia> @edit collect_structarray(v; initializer = StructArrayInitializer(unwrap))
    

    etc. Cumbersome but it works and you have the full power of the REPL to experiment while digging in…