Unroll setfield!

I often use the pattern:

ntuple(Val(N)) do i
    @inline
    # ...
end

The @inline is important since otherwise you gain little. The only other thing to be careful about is avoiding closure capture.
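A minimal sketch of that pattern applied to setfield!, in case it helps. The struct `Point3`, the helper `unwrap`, and the function `set_all!` are hypothetical names made up for illustration; field names are passed as Val-wrapped symbols so that each name is visible to the compiler. Note that N is only used outside the closure, and nothing captured by the closure is reassigned.

```julia
# Hypothetical mutable struct, used only for illustration.
mutable struct Point3
    x::Float64
    y::Int
    z::Float32
end

# Extract the symbol from a Val-wrapped field name.
unwrap(::Val{S}) where {S} = S

function set_all!(p, names::NTuple{N,Any}, values::NTuple{N,Any}) where {N}
    # Val(N) is built outside the closure, so the closure does not capture N.
    ntuple(Val(N)) do i
        @inline
        name = unwrap(names[i])
        # Convert each value to the declared field type before storing it.
        setfield!(p, name, convert(fieldtype(typeof(p), name), values[i]))
    end
    return p
end

p = set_all!(Point3(0.0, 0, 0f0), (Val(:x), Val(:y), Val(:z)), (1, 2.0, 3))
```

Whether this is fully unrolled and allocation-free for a given struct is worth checking with @btime or @code_warntype rather than assuming.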


This alternative @generated function is (IMO) simpler, doesn’t require Val, and is 2x faster on my machine:

@generated function setup_data_gen2!(data, field_names, field_values, field_types::NTuple{N}) where N
    quote
        @inline
        Base.@nexprs $N i-> setfield!(data, field_names[i], convert(field_types[i], field_values[i]))
    end
end

julia> @btime setup_data_gen2!($(Data32()), $field_names, $field_values, $field_types)
  1.700 ns (0 allocations: 0 bytes)

Hmm, then I was probably just causing a closure capture by accident when I had a bad experience with ntuple(f, ::Val). I had assumed the compiler had trouble because the Tuple being created was too long, but it wasn’t even very long, so this sounds like a more likely explanation.

Nice. I got the same performance, though.

Edit: and I did need the Val trick to make it type stable.

You are right, had some mixed stuff in my interactive session. It does indeed allocate without the Val trick.

You still gain even without inlining if i is constant-propagated, no? (EDIT: Unless you’re capturing a typevar, as discussed.) Which it’s likely to be, given that it’s a literal in the unrolled code.

Unfortunately I noticed an inference regression in 1.11.1 with the current solutions proposed.

These are the functions (recursive or generated as options):

function _fast_setfield!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP} 
    setfield_recursive!(atom, field_values, inds_and_names)
# Alternative with generated function:
#    N = length(inds_and_names)
#    setfield_generated!(atom, field_values, inds_and_names, Val(N))
end

import PDBTools: _parse
# Alternate values for fields that might be empty
_alt(::Type{S}) where {S<:AbstractString} = S("X")
_alt(::Type{T}) where {T} = zero(T)
# Unwrap Val-wrapped values
unwrap(::Val{T}) where {T} = T
function setfield_recursive!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP}
    isempty(inds_and_names) && return atom
    i, valfield = first(inds_and_names)
    field = unwrap(valfield)
    T = typeof(getfield(atom, field))
    setfield!(atom, field, _parse(T, field_values[i]; alt=_alt(T)))
    setfield_recursive!(atom, field_values, Base.tail(inds_and_names))
end

# Alternative implementation using generated functions (same performance as far as tested)
# https://discourse.julialang.org/t/unroll-setfield/122545/22?u=lmiq
@generated function setfield_generated!(atom, field_values::FIELDS, inds_and_names::TUPTUP, ::Val{N}) where {FIELDS,TUPTUP,N}
    quote
        @inline
        Base.@nexprs $N i -> begin
            ifield, valfield = inds_and_names[i]
            field = unwrap(valfield)
            T = typeof(getfield(atom, field))
            setfield!(atom, field, _parse(T, field_values[ifield]; alt=_alt(T)))
        end
    end
end

And this is how to run the tests:

    # All this to generate the data to run the above functions.
    import Pkg; Pkg.add(url="https://github.com/m3g/PDBTools", rev="mmCIF")
    using PDBTools: Atom, _fast_setfield!
    record = "ATOM   1    N  N   . VAL A 1 1   ? 6.204   16.869  4.854   1.00 49.05 ? 1   VAL A N   1"
    inds_and_names = ((2, Val{:index_pdb}()), (4, Val{:name}()), (6, Val{:resname}()), (7, Val{:chain}()), (9, Val{:resnum}()), (11, Val{:x}()), (12, Val{:y}()), (13, Val{:z}()), (14, Val{:occup}()), (15, Val{:beta}()), (16, Val{:charge}()), (17, Val{:resnum}()), (18, Val{:resname}()), (19, Val{:chain}()), (20, Val{:name}()), (21, Val{:model}()))
    atom = Atom()
    NCOLS = 21
    field_values = NTuple{NCOLS}(eachsplit(record))
    # Finally, what matters:
    @btime _fast_setfield!($atom, $field_values, $inds_and_names)

On 1.10 I get 0 allocations, while on 1.11.1 I get 6.

Is there anything obvious that I can improve in the functions to avoid that? (Or would it be important to trim the example down to report a regression here?)

Thanks.

Not sure if this is the culprit for the regression, but the line T = typeof(getfield(atom, field)) should probably be replaced with T = fieldtype(AtomType, field), unless the AtomType struct has abstractly typed fields (and if it does, you have to expect some allocations here anyway as the field values must be boxed).

This is especially important if it’s possible to construct an AtomType instance without initializing all its fields, in which case getfield might error.
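A small sketch of the difference, assuming a hypothetical struct `Partial` whose inner constructor leaves one field unassigned. Reading the field value throws, while asking the type for its declared field type needs no instance data at all.

```julia
# Hypothetical struct; the inner constructor leaves `name` uninitialized.
mutable struct Partial
    id::Int
    name::String
    Partial(id) = new(id)
end

p = Partial(1)

# Reading the value fails because `name` was never assigned.
threw = try
    typeof(getfield(p, :name))
    false
catch err
    err isa UndefRefError
end

# The declared field type is available from the type alone.
T = fieldtype(Partial, :name)
```

Here `threw` ends up true and `T` is String. For a struct with abstractly typed fields, fieldtype would return the abstract type, which is the caveat mentioned above.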

Turns out your allocations on 1.11 are in your _parse function when the second argument is a SubString rather than a String. They’re not related to looping/recursing over the tuple.

julia> @btime _parse($Int32, $("1"); alt=_alt($Int32));
  166.978 ns (0 allocations: 0 bytes)

julia> @btime _parse($Int32, $(@view "1"[1:1]); alt=_alt($Int32));
  186.245 ns (1 allocation: 32 bytes)

Specifically, it’s the call to findlast that apparently involves a runtime dispatch, as shown in the following allocation profile, made using ProfileCanvas.jl and the call

@profview_allocs _fast_setfield!(atom, field_values, inds_and_names) sample_rate=1.0
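For context on why the SubString method is the one that matters in the benchmark: the field values there are built with eachsplit, which yields SubString elements rather than String. A quick check, using a shortened, made-up record:

```julia
# Shortened, hypothetical record; the real one has many more columns.
record = "ATOM 1 N VAL"

# eachsplit is lazy and yields views into the original string.
fields = collect(eachsplit(record))

field_eltype = eltype(fields)              # SubString{String}
is_string = first(fields) isa String       # false
copied = String(first(fields))             # an independent String copy, "ATOM"
```

Converting every field with String(...) would make the String method apply, but each conversion copies the bytes, so it is not obviously a win.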


It has a parametric type, but the instances are concrete. That’s why I used that.

That’s great! I was not aware of that tool. Thank you very much. I’ll work on that.

Strangely, that’s not the issue. That specific call, isolated, also allocates on 1.10. But it seems to be inlined or constant-propagated (or whatever) when the _fast_setfield! function is called, and the result is no allocations there.

Still, the same call allocates on 1.11.1.

I’ll keep investigating.

edit: Despite the fact that allocations are greater, in 1.11 it seems to be faster:

1.10:

@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
115.154869 seconds (129.21 M allocations: 13.191 GiB, 13.05% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

1.11:

julia> @time read_mmcif("/home/leandro/Documents/perilla/all.cif")
101.243284 seconds (387.27 M allocations: 22.171 GiB, 14.08% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

So maybe this is something related to the new Memory type and how it handles allocations (I’ve experienced more than one case with allocations going up while performance improved because of that).

edit: Now I changed that to remove those allocations in 1.11 (by removing the findlast calls, as indicated by @danielwe), and now it is even faster in 1.11:

@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
 89.377302 seconds (129.58 M allocations: 14.491 GiB, 12.45% gc time)
   Array{Atoms,1} with 64423983 atoms with fields:

Problem solved for me (I don’t know about the inference regression there, which did exist; I might come up with an MWE as soon as possible). Thanks again for all the help.
