I often use the pattern:
ntuple(Val(N)) do i
    @inline
    # ...
end
The @inline is important since otherwise you gain little. The only other thing to be careful about is avoiding closure capture.
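For illustration, here is a minimal sketch (hypothetical functions, not from the thread) of the kind of closure capture that hurts: a variable that is captured and reassigned gets boxed by Julia's closure lowering, which defeats the unrolling.

# Boxed capture: `c` is assigned twice, so the closure captures it as a Core.Box
function scale_bad(xs::NTuple{N,Int}) where {N}
    c = 1
    c = c + 1                  # reassignment of a captured variable forces boxing
    ntuple(Val(N)) do i
        @inline
        c * xs[i]              # reads the boxed `c`: type-unstable
    end
end

# Single assignment: `c2` is captured by value and the code unrolls cleanly
function scale_good(xs::NTuple{N,Int}) where {N}
    c2 = 2
    ntuple(Val(N)) do i
        @inline
        c2 * xs[i]
    end
end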
This alternative @generated function is (IMO) simpler, doesn't require Val, and is 2x faster on my machine:
@generated function setup_data_gen2!(data, field_names, field_values, field_types::NTuple{N}) where N
    quote
        @inline
        Base.@nexprs $N i -> setfield!(data, field_names[i], convert(field_types[i], field_values[i]))
    end
end
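To see why this unrolls, recall that Base.@nexprs pastes its expression N times with the literal index substituted in, so for N = 3 the quoted body expands to roughly:

@inline
setfield!(data, field_names[1], convert(field_types[1], field_values[1]))
setfield!(data, field_names[2], convert(field_types[2], field_values[2]))
setfield!(data, field_names[3], convert(field_types[3], field_values[3]))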
julia> @btime setup_data_gen2!($(Data32()), $field_names, $field_values, $field_types)
1.700 ns (0 allocations: 0 bytes)
Mhm, probably I was just accidentally causing a closure capture when I had my bad experience with ntuple(f, ::Val). I had assumed the compiler was having some problem with the tuple being created being too long, but it wasn't even very long, so this sounds like a more likely explanation.
Nice. I got the same performance, though.
Edit: and I did need the Val trick to make it type stable.
You are right, I had some mixed stuff in my interactive session. It does indeed allocate without the Val trick.
You still gain even without inlining if the i is const-propped, no? (EDIT: unless you're capturing a typevar, as discussed.) Which it's likely to be, given that it's a literal in the unrolled code.
Unfortunately I noticed an inference regression in 1.11.1 with the current solutions proposed.
These are the functions (recursive or generated as options):
function _fast_setfield!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP}
    setfield_recursive!(atom, field_values, inds_and_names)
    # Alternative with generated function:
    # N = length(inds_and_names)
    # setfield_generated!(atom, field_values, inds_and_names, Val(N))
end
import PDBTools: _parse
# Alternate values for fields that might be empty
_alt(::Type{S}) where {S<:AbstractString} = S("X")
_alt(::Type{T}) where {T} = zero(T)
# Unwrap Val-wrapped values
unwrap(::Val{T}) where {T} = T
function setfield_recursive!(atom::AtomType, field_values::FIELDS, inds_and_names::TUPTUP) where {AtomType, FIELDS, TUPTUP}
    isempty(inds_and_names) && return atom  # base case: empty tuple, nothing left to set
    i, valfield = first(inds_and_names)
    field = unwrap(valfield)
    T = typeof(getfield(atom, field))
    setfield!(atom, field, _parse(T, field_values[i]; alt=_alt(T)))
    # recurse on the remaining fields; Base.tail shrinks the tuple type at each call
    setfield_recursive!(atom, field_values, Base.tail(inds_and_names))
end
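Because Base.tail changes the tuple type at every call, each level is a fresh specialization that the compiler can inline away, which is what makes this recursion equivalent to manual unrolling. A minimal standalone sketch of the same idea (hypothetical names):

sumrec(t::Tuple{}) = 0
sumrec(t::Tuple) = first(t) + sumrec(Base.tail(t))

sumrec((1, 2, 3))  # 6; for small tuples this fully unrolls at compile time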
# Alternative implementation using generated functions (same performance as far as tested)
# https://discourse.julialang.org/t/unroll-setfield/122545/22?u=lmiq
@generated function setfield_generated!(atom, field_values::FIELDS, inds_and_names::TUPTUP, ::Val{N}) where {FIELDS,TUPTUP,N}
    quote
        @inline
        Base.@nexprs $N i -> begin
            ifield, valfield = inds_and_names[i]
            field = unwrap(valfield)
            T = typeof(getfield(atom, field))
            setfield!(atom, field, _parse(T, field_values[ifield]; alt=_alt(T)))
        end
    end
end
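As an aside, since the tuple's length is encoded in the TUPTUP type parameter, the generated function could presumably derive N itself and drop the Val argument; an untested sketch (setfield_generated2! is a hypothetical name):

@generated function setfield_generated2!(atom, field_values::FIELDS, inds_and_names::TUPTUP) where {FIELDS,TUPTUP}
    N = fieldcount(TUPTUP)  # tuple length, known from the type alone
    quote
        @inline
        Base.@nexprs $N i -> begin
            ifield, valfield = inds_and_names[i]
            field = unwrap(valfield)
            T = typeof(getfield(atom, field))
            setfield!(atom, field, _parse(T, field_values[ifield]; alt=_alt(T)))
        end
    end
end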
And this is how to run the tests:
# All this to generate the data to run the above functions.
import Pkg; Pkg.add(url="https://github.com/m3g/PDBTools", rev="mmCIF")
using BenchmarkTools
using PDBTools: Atom, _fast_setfield!
record = "ATOM 1 N N . VAL A 1 1 ? 6.204 16.869 4.854 1.00 49.05 ? 1 VAL A N 1"
inds_and_names = ((2, Val{:index_pdb}()), (4, Val{:name}()), (6, Val{:resname}()), (7, Val{:chain}()), (9, Val{:resnum}()), (11, Val{:x}()), (12, Val{:y}()), (13, Val{:z}()), (14, Val{:occup}()), (15, Val{:beta}()), (16, Val{:charge}()), (17, Val{:resnum}()), (18, Val{:resname}()), (19, Val{:chain}()), (20, Val{:name}()), (21, Val{:model}()))
atom = Atom()
NCOLS = 21
field_values = NTuple{NCOLS}(eachsplit(record))
# Finally, what matters:
@btime _fast_setfield!($atom, $field_values, $inds_and_names)
On 1.10 I get 0 allocations, while on 1.11.1 I get 6.
Is there anything obvious that I can improve in the functions to avoid that? (Or would it be important to trim the example down to report a regression here?)
Thanks.
Not sure if this is the culprit for the regression, but this line should probably be replaced with T = fieldtype(AtomType, field), unless the AtomType struct has abstractly typed fields (and if it does, you have to expect some allocations here anyway, as the field values must be boxed). This is especially important if it's possible to construct an AtomType instance without initializing all its fields, in which case getfield might error.
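A toy example (names hypothetical) of the failure mode fieldtype avoids:

mutable struct Foo
    s::String
    Foo() = new()  # leaves `s` uninitialized (#undef)
end

foo = Foo()
fieldtype(Foo, :s)           # String: answered from the declaration, always safe
# typeof(getfield(foo, :s))  # would throw UndefRefError on this instance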
Turns out your allocations on 1.11 are in your _parse function when the second argument is a SubString rather than a String. They're not related to looping/recursing over the tuple.
julia> @btime _parse($Int32, $("1"); alt=_alt($Int32));
166.978 ns (0 allocations: 0 bytes)
julia> @btime _parse($Int32, $(@view "1"[1:1]); alt=_alt($Int32));
186.245 ns (1 allocation: 32 bytes)
Specifically, it's the call to findlast that apparently involves a runtime dispatch, as shown in the following allocation profile, made using ProfileCanvas.jl and the call
@profview_allocs _fast_setfield!(atom, field_values, inds_and_names) sample_rate=1.0
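For reference, the same samples can be collected with the Profile stdlib alone (ProfileCanvas.jl just renders them); a minimal sketch:

using Profile
Profile.Allocs.@profile sample_rate=1.0 _fast_setfield!(atom, field_values, inds_and_names)
Profile.Allocs.fetch()  # raw allocation samples; render with ProfileCanvas or PProf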
It has a parametric type, but the instances are concrete. That's why I used that.
That's great! I was not aware of that tool. Thank you very much. I'll work on that.
Strangely, that's not the issue. That specific call, isolated, also allocates on 1.10. But it seems to be inlined and constant-propagated (or whatever) when the _fast_setfield! function is called (and the result is no allocations there).
Still, the same call allocates in 1.11.1.
I'll keep investigating.
Edit: despite the fact that allocations are greater, 1.11 seems to be faster:
1.10:
@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
115.154869 seconds (129.21 M allocations: 13.191 GiB, 13.05% gc time)
Array{Atoms,1} with 64423983 atoms with fields:
1.11:
julia> @time read_mmcif("/home/leandro/Documents/perilla/all.cif")
101.243284 seconds (387.27 M allocations: 22.171 GiB, 14.08% gc time)
Array{Atoms,1} with 64423983 atoms with fields:
So maybe this is something related to the new Memory type and how it handles allocations (I've experienced more than one case of allocations going up while performance improved because of that).
Edit: I have now changed the code to remove those allocations in 1.11 (by removing the findlast calls, as indicated by @danielwe), and now it is even faster in 1.11:
@time read_mmcif("/home/leandro/Documents/perilla/all.cif")
89.377302 seconds (129.58 M allocations: 14.491 GiB, 12.45% gc time)
Array{Atoms,1} with 64423983 atoms with fields:
Problem solved for me (I don't know about the inference regression there, which existed; I might come up with an MWE asap). Thanks again for all the help.