Best not to make GitHub work too hard.
I will revise the text(s) and simply post them here.
You can then copy the revised entries into your PR.
Good for me.
I will make a change right away, since @vtjnash has commented there.
I am very glad that our discussion led to a doc improvement, which is covered in PR 39954. For help on @pure usage, please continue reading that PR, or later the official Julia Base documentation, once it is released.
If anyone is interested in my original question: with a slight modification, the recursive approach for _fielddescr in V5 did the job. After wrapping the symbol in Val{s} and adding @inline and @inbounds, the compiler was able to optimize away the loop/recursion (V6). It was also surprisingly simple to solve the problem with a @generated function (V7, V8): since the last version of _fielddescr already took only type parameters, I only had to change the return value to a (constant) tuple expression. The @generated version enforces constant propagation the hard way. Benchmarks show no significant speed improvement of V7/V8 over the recursive version in V6. Changes are pushed to GitHub.
The recursive solution:
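# Entry point: split the NamedTuple type into a tuple type of symbols and a tuple type of field types.
@inline function _fielddescr6(::Type{PStruct{T}}, ::Val{s}) where {T<:NamedTuple, s} # s isa Symbol
    _fielddescr6(Tuple{T.parameters[1]...}, T.parameters[2], Val(s), 0)
end

# Recursive step: compare the head symbol with s, otherwise recurse on the tails
# while accumulating the bit offset in shift.
@inline function _fielddescr6(::Type{syms}, ::Type{types}, ::Val{s}, shift::Int) where {syms<:Tuple, types<:Tuple, s}
    @inbounds begin
        # string(s): ArgumentError expects a message string, not a Symbol
        syms === Tuple{} && throw(ArgumentError(string(s)))
        type = Base.tuple_type_head(types)
        if s === Base.tuple_type_head(syms)
            return type, shift, bitsizeof(type)
        end
        _fielddescr6(Base.tuple_type_tail(syms), Base.tuple_type_tail(types), Val(s), shift + bitsizeof(type))
    end
end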
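For readers who want to try the idea without the PStruct package, here is a minimal sketch of the same pattern (a Val-wrapped symbol plus recursion over tuples instead of a loop). It is a variant, not the code above: it recurses over value tuples with Base.tail rather than over tuple types, and rec_bitsize and rec_fielddescr are hypothetical names with an assumed bit-size rule (Bool takes 1 bit, everything else 8 * sizeof).

```julia
# Hypothetical stand-in for bitsizeof; the real definition lives in the author's code.
rec_bitsize(::Type{Bool}) = 1
rec_bitsize(::Type{T}) where {T} = 8 * sizeof(T)

# Base case: no names left, so the requested symbol is not a field.
@inline rec_fielddescr(::Val{s}, ::Tuple{}, ::Tuple{}, shift::Int) where {s} =
    throw(ArgumentError(string(s)))

# Recursive case: test the first name, otherwise continue with the tails.
@inline function rec_fielddescr(::Val{s}, names::Tuple, types::Tuple, shift::Int) where {s}
    type = first(types)
    s === first(names) && return type, shift, rec_bitsize(type)
    rec_fielddescr(Val(s), Base.tail(names), Base.tail(types), shift + rec_bitsize(type))
end

names = (:flag, :id, :level)
types = (Bool, UInt16, UInt8)
rec_fielddescr(Val(:id), names, types, 0)   # (UInt16, 1, 16)
```

Whether the compiler folds such a call to a constant depends on the Julia version and on the arguments being known at compile time, so it is worth checking with @code_typed rather than assuming it.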
The @generated solution:
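@generated function __fielddescr(::Type{PStruct{T}}, ::Val{s}) where {T<:NamedTuple, s} # s isa Symbol
    # This body runs at compile time; only T and s (type parameters) are used.
    shift = 0
    types = T.parameters[2].parameters
    syms = T.parameters[1]
    idx = 1
    while idx <= length(syms)
        type = types[idx]
        bits = bitsizeof(type)
        if syms[idx] === s
            # The generated method body is just this constant tuple.
            return :(($type, $shift, $bits))
        end
        shift += bits
        idx += 1
    end
    # string(s): ArgumentError expects a message string, not a Symbol
    throw(ArgumentError(string(s)))
end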
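A self-contained sketch of the same @generated technique, for trying it out without PStruct. It works directly on a NamedTuple type via fieldnames and fieldtypes; demo_bitsize, demo_fielddescr and Layout are hypothetical names, and the bit-size rule (Bool takes 1 bit, everything else 8 * sizeof) is an assumption made for illustration.

```julia
# Hypothetical stand-in for bitsizeof; must be defined before the generated function.
demo_bitsize(::Type{Bool}) = 1
demo_bitsize(::Type{T}) where {T} = 8 * sizeof(T)

@generated function demo_fielddescr(::Type{NT}, ::Val{s}) where {NT<:NamedTuple, s}
    # Runs once per (NT, s) combination at compile time.
    shift = 0
    for (name, type) in zip(fieldnames(NT), fieldtypes(NT))
        bits = demo_bitsize(type)
        # The returned expression becomes the whole method body: a constant tuple.
        name === s && return :(($type, $shift, $bits))
        shift += bits
    end
    # Unknown field: generate a body that throws when called.
    return :(throw(ArgumentError($(string("no field ", s)))))
end

const Layout = NamedTuple{(:flag, :id, :level), Tuple{Bool, UInt16, UInt8}}
demo_fielddescr(Layout, Val(:id))      # (UInt16, 1, 16)
demo_fielddescr(Layout, Val(:level))   # (UInt8, 17, 8)
```

Returning a throwing expression for the unknown-field case, instead of throwing inside the generator, is a design choice of this sketch: the error then surfaces as an ordinary ArgumentError at the call.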
Benchmarks (V6 to V8) show that the runtime speed now matches that of hand-coded constant propagation (V3 and V4), and all allocations are gone.
@btime bench(sv): some work on an ordinary struct, in a loop on a Vector to get stable timings
95.776 ns (0 allocations: 0 bytes)
@btime bench(psv): same work on PStruct having same fields as struct in preceding benchmark
823.200 μs (761 allocations: 11.89 KiB)
@btime benchV2(psv): same work, but using getpropertyV2 instead of getproperty for PStruct field access
498.700 μs (1161 allocations: 24.39 KiB)
@btime benchV3(psv): same work, but handcoded getpropertyV3 replacing _fielddescr call by its result (simulated constant propagation)
144.988 ns (0 allocations: 0 bytes)
@btime benchV4(psv): same work, but handcoded getpropertyV4 with resulting SHIFT and AND operation
133.408 ns (0 allocations: 0 bytes)
@btime benchV5(psv): like V2, but recursive _fielddescr using Base.tuple_type_head and Base.tuple_type_tail in getpropertyV5
78.100 μs (421 allocations: 6.58 KiB)
@btime benchV6(psv): like V5, but symbol wrapped in Val like in V3 and V4 and @inline assertions
144.656 ns (0 allocations: 0 bytes)
@btime benchV7(psv): like V8, but symbol wrapped in Val
144.934 ns (0 allocations: 0 bytes)
@btime benchV8(psv): like V2, but _fielddescr is @generated returning a constant tuple
144.299 ns (0 allocations: 0 bytes)
The benchmark figures in the solution were obtained under Julia 1.6.0 RC1.
All other benchmarks were run under Julia 1.5.3.
Julia 1.6.0 RC1 changes the picture: V6 becomes as fast as V3/V4.
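For context, the SHIFT and AND access that V4 hand-codes can be sketched roughly as follows. This assumes the fields are packed into a single UInt64 word, least significant field first; extract_bits is a hypothetical helper, not a function from the thread.

```julia
# Hypothetical helper: read `bits` bits starting at bit position `shift`.
extract_bits(word::UInt64, shift::Int, bits::Int) =
    (word >> shift) & ((UInt64(1) << bits) - UInt64(1))

# Pack flag = 1 into bit 0 and id = 42 into bits 1..16.
packed = (UInt64(42) << 1) | UInt64(1)

flag = extract_bits(packed, 0, 1)            # 1
id   = extract_bits(packed, 1, 16) % UInt16  # 42, narrowed to the field type
```

Once shift and bits are compile-time constants (as in V6 to V8), the access should reduce to a shift and a mask on the packed word.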