Unroll setfield!

vchuravy · November 14, 2024, 9:10am

I often use the pattern:

ntuple(Val(N)) do i
    @inline
    # ...
end

The @inline is important; without it you gain little. The only other thing to be careful about is avoiding closure capture.
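A minimal sketch of that pattern applied to setfield!. It assumes a hypothetical mutable struct `Rec` and passes the field names as `Val`-wrapped symbols, so each name is part of the argument types. `Rec`, `unval` and `set_all!` are illustrative names, not from the thread. It needs Julia 1.8+ for `@inline` inside a function body:

```julia
# Hypothetical struct, used only for this illustration.
mutable struct Rec
    a::Int32
    b::Float64
    c::Int64
end

# Recover the Symbol stored in a Val type parameter.
unval(::Val{S}) where {S} = S

function set_all!(r::Rec, vals::NTuple{N,Any}, names::NTuple{N,Val}) where {N}
    ntuple(Val(N)) do i
        @inline                      # lets the closure body inline into each unrolled call
        name = unval(names[i])       # a constant once `i` is a literal index
        # convert to the declared field type before storing
        setfield!(r, name, convert(fieldtype(Rec, name), vals[i]))
    end
    return r
end

r = Rec(0, 0.0, 0)
set_all!(r, (1, 2.5, 3), (Val(:a), Val(:b), Val(:c)))
```

The closure only reads `r`, `vals` and `names` and never reassigns them, which keeps the capture cheap.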

Albert_de_montserrat · November 14, 2024, 10:39am

This alternative @generated function is (IMO) simpler, doesn't require Val, and is 2x faster on my machine:

@generated function setup_data_gen2!(data, field_names, field_values, field_types::NTuple{N}) where N
    quote
        @inline
        Base.@nexprs $N i-> setfield!(data, field_names[i], convert(field_types[i], field_values[i]))
    end
end

julia> @btime setup_data_gen2!($(Data32()), $field_names, $field_values, $field_types)       
  1.700 ns (0 allocations: 0 bytes)
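A variant sketch of the generated-function approach, assuming the goal is to set every field of a mutable struct in declaration order. Inside a @generated function the argument `data` is bound to its type, so `fieldnames` and `fieldtypes` can be read at generation time and spliced in as literals, which avoids needing `Val` for the names. `Data3` and `setall_gen!` are hypothetical names:

```julia
# Hypothetical mutable struct for the illustration.
mutable struct Data3
    x::Int32
    y::Float32
    z::Int32
end

@generated function setall_gen!(data, vals::Tuple)
    # In the generator, `data` is the *type* of the argument (e.g. Data3).
    names = fieldnames(data)
    types = fieldtypes(data)
    # One setfield! per field, with the field name and type spliced in as literals.
    exprs = [:(setfield!(data, $(QuoteNode(names[i])), convert($(types[i]), vals[$i])))
             for i in 1:length(names)]
    return quote
        $(exprs...)
        return data
    end
end

d = Data3(0, 0f0, 0)
setall_gen!(d, (1, 2.5, 3))
```

The trade-off is that this only covers the "all fields, in order" case; setting a subset of fields by name still needs the names to be known at compile time, e.g. via `Val`.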

Mason · November 14, 2024, 10:56am

Hmm, I was probably just causing an accidental closure capture back when I had a bad experience with ntuple(f, ::Val). I had assumed the compiler had trouble because the Tuple being created was too long, but it wasn't even very long, so this sounds like a more likely explanation.

lmiq · November 14, 2024, 12:42pm

Nice. I got the same performance, though.

Edit: and I did need the Val trick to make it type stable.

Albert_de_montserrat · November 14, 2024, 1:14pm

You are right; I had some stale definitions mixed up in my interactive session. It does indeed allocate without the Val trick.

danielwe · November 14, 2024, 4:57pm

You still gain even without inlining if i is constant-propagated, no? Which it's likely to be, given that it's a literal in the unrolled code. (EDIT: Unless you're capturing a typevar, as discussed.)

lmiq · November 16, 2024, 9:51pm

Unfortunately, I noticed an inference regression on Julia 1.11.1 with the solutions proposed so far.

These are the functions (with either a recursive or a generated implementation):

function _fast_setfield!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP} 
    setfield_recursive!(atom, field_values, inds_and_names)
# Alternative with generated function:
#    N = length(inds_and_names)
#    setfield_generated!(atom, field_values, inds_and_names, Val(N))
end

import PDBTools: _parse
# Alternate values for fields that might be empty
_alt(::Type{S}) where {S<:AbstractString} = S("X")
_alt(::Type{T}) where {T} = zero(T)
# Unwrap Val-wrapped values
unwrap(::Val{T}) where {T} = T
function setfield_recursive!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP}
    isempty(inds_and_names) && return atom
    i, valfield = first(inds_and_names)
    field = unwrap(valfield)
    T = typeof(getfield(atom, field))
    setfield!(atom, field, _parse(T, field_values[i]; alt=_alt(T)))
    setfield_recursive!(atom, field_values, Base.tail(inds_and_names))
end

# Alternative implementation using generated functions (same performance as far as tested)
# https://discourse.julialang.org/t/unroll-setfield/122545/22?u=lmiq
@generated function setfield_generated!(atom, field_values::FIELDS, inds_and_names::TUPTUP, ::Val{N}) where {FIELDS,TUPTUP,N}
    quote
        @inline
        Base.@nexprs $N i -> begin
            ifield, valfield = inds_and_names[i]
            field = unwrap(valfield)
            T = typeof(getfield(atom, field))
            setfield!(atom, field, _parse(T, field_values[ifield]; alt=_alt(T)))
        end
    end
end

And this is how to run the tests:

    # All this to generate the data to run the above functions.
    import Pkg; Pkg.add(url="https://github.com/m3g/PDBTools", rev="mmCIF")
    using PDBTools: Atom, _fast_setfield!
    using BenchmarkTools: @btime   # needed for the benchmark below
    record = "ATOM   1    N  N   . VAL A 1 1   ? 6.204   16.869  4.854   1.00 49.05 ? 1   VAL A N   1"
    inds_and_names = ((2, Val{:index_pdb}()), (4, Val{:name}()), (6, Val{:resname}()), (7, Val{:chain}()), (9, Val{:resnum}()), (11, Val{:x}()), (12, Val{:y}()), (13, Val{:z}()), (14, Val{:occup}()), (15, Val{:beta}()), (16, Val{:charge}()), (17, Val{:resnum}()), (18, Val{:resname}()), (19, Val{:chain}()), (20, Val{:name}()), (21, Val{:model}()))
    atom = Atom()
    NCOLS = 21
    field_values = NTuple{NCOLS}(eachsplit(record))
    # Finally, what matters:
    @btime _fast_setfield!($atom, $field_values, $inds_and_names)

On 1.10 I get 0 allocations, while on 1.11.1 I get 6.

Is there anything obvious I can improve in these functions to avoid that? (Or should I trim the example down to report a regression here?)

Thanks.

danielwe · November 17, 2024, 1:39am

Not sure if this is the culprit for the regression, but the line T = typeof(getfield(atom, field)) should probably be replaced with T = fieldtype(AtomType, field), unless the AtomType struct has abstractly typed fields (and if it does, you have to expect some allocations here anyway, since the field values must be boxed).
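A sketch of the recursive setter with that change. `Row` is a hypothetical all-numeric stand-in for `Atom`, and `parse_or` is a hypothetical stand-in for `PDBTools._parse`, whose actual code is not shown in the thread:

```julia
# Hypothetical all-numeric struct standing in for PDBTools.Atom.
mutable struct Row
    index::Int32
    x::Float32
    y::Float32
end

unwrap(::Val{T}) where {T} = T

# Hypothetical stand-in for PDBTools._parse: parse `s`, or fall back to `alt` if empty.
parse_or(::Type{T}, s::AbstractString; alt) where {T} = isempty(s) ? convert(T, alt) : parse(T, s)

function set_recursive!(row::R, vals::Tuple, inds_and_names::Tuple) where {R}
    isempty(inds_and_names) && return row
    i, valfield = first(inds_and_names)
    field = unwrap(valfield)
    T = fieldtype(R, field)      # declared field type: uses only R, never reads the field
    setfield!(row, field, parse_or(T, vals[i]; alt = zero(T)))
    return set_recursive!(row, vals, Base.tail(inds_and_names))
end

row = Row(0, 0f0, 0f0)
set_recursive!(row, ("7", "1.5", "-2.0"), ((1, Val(:index)), (2, Val(:x)), (3, Val(:y))))
```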

This is especially important if it’s possible to construct an AtomType instance without initializing all its fields, in which case getfield might error.
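A small illustration of that case, using a hypothetical struct `Partial` whose inner constructor leaves a field undefined. `fieldtype` answers from the type alone, while `getfield` throws:

```julia
# Hypothetical struct whose inner constructor leaves `s` uninitialized.
mutable struct Partial
    n::Int
    s::String
    Partial(n) = new(n)
end

p = Partial(1)
s_defined = isdefined(p, :s)        # false: the field was never set
s_type = fieldtype(Partial, :s)     # String, answered from the type alone

# typeof(getfield(p, :s)) cannot work here: getfield throws first.
err = try
    getfield(p, :s)
    nothing
catch e
    e
end
```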

danielwe · November 17, 2024, 2:21am

Turns out your allocations on 1.11 are in your _parse function when the second argument is a SubString rather than a String. They’re not related to looping/recursing over the tuple.

julia> @btime _parse($Int32, $("1"); alt=_alt($Int32));
  166.978 ns (0 allocations: 0 bytes)

julia> @btime _parse($Int32, $(@view "1"[1:1]); alt=_alt($Int32));
  186.245 ns (1 allocation: 32 bytes)
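To see where the SubStrings come from: splitting the record with `eachsplit` yields `SubString{String}` views into the original string, not new `String`s. Below is a sketch with a shortened record. The thread uses `NTuple{NCOLS}(...)` to fix the tuple length; plain `Tuple(...)` is used here only for brevity. `eachsplit` needs Julia 1.8+:

```julia
record = "ATOM   1    N  N"          # shortened version of the mmCIF record above
vals = Tuple(eachsplit(record))       # splits on whitespace; elements are SubString{String}
el_type = typeof(vals[1])             # SubString{String}, a view into `record`
n = parse(Int32, vals[2])             # parse accepts a SubString directly
```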

Specifically, it's the call to findlast that apparently involves a runtime dispatch, as an allocation profile made with ProfileCanvas.jl shows. The profile was produced with the call:

@profview_allocs _fast_setfield!(atom, field_values, inds_and_names) sample_rate=1.0
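If a GUI is not available, the allocation profiler in the `Profile` standard library (Julia 1.8+) can record the same kind of data directly. A minimal sketch, using a hypothetical allocating function `make_vec` as the profiling target:

```julia
using Profile

# Hypothetical allocating function, used only as something to profile.
make_vec(n) = collect(1:n)
make_vec(10)                               # warm up so compilation is not recorded

Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1.0 make_vec(10)   # sample_rate=1.0 records every allocation
results = Profile.Allocs.fetch()

for a in results.allocs
    println(a.type, ": ", a.size, " bytes")   # allocated type and size in bytes
end
```

Each entry also carries a `stacktrace`, which is what points at the call site responsible (here it would be `collect`; in the thread it was `findlast`).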

lmiq · November 17, 2024, 9:46am

The struct is parametric, but its instances are concretely typed. That's why I used that.

That’s great! I was not aware of that tool. Thank you very much. I’ll work on that.

lmiq · November 17, 2024, 1:53pm

Strangely, that's not the issue. That specific call also allocates on 1.10 when run in isolation, but it seems to be inlined and constant-propagated (or similar) when _fast_setfield! is called, so there are no allocations there.

On 1.11.1, however, the same code allocates.

I'll keep investigating.

Edit: even though there are more allocations, 1.11 seems to be faster:

1.10:

@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
115.154869 seconds (129.21 M allocations: 13.191 GiB, 13.05% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

1.11:

julia> @time read_mmcif("/home/leandro/Documents/perilla/all.cif")
101.243284 seconds (387.27 M allocations: 22.171 GiB, 14.08% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

So maybe this is related to the new Memory type and how it handles allocations (I've seen more than one case where allocations went up while performance improved because of that).

Edit: I changed the code to remove those allocations on 1.11 (by removing the findlast calls, as @danielwe pointed out), and now it is even faster on 1.11:

@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
 89.377302 seconds (129.58 M allocations: 14.491 GiB, 12.45% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

Problem solved for me. (I'm not sure about the inference regression, which did exist; I may come up with an MWE soon.) Thanks again for all the help.
