“Double check for calls to generic functions.” is quoted from the @pure docs here. Maybe it's a misunderstanding of mine: I interpreted it as “double check any use of generic functions in @pure annotated code, and be really, really sure that their use is pure, i.e. does not alter global state and always returns the same result for the same parameters”.
What exactly is “use of a generic function”, and how strict is @pure in Julia? Consider
@pure f() = 1+1
Do you agree this is an incorrect use of @pure? The Julia docs state: “Every function in Julia is a generic function”. Base.:(+) is a Julia function.
Or is the use correct, because “1” is guaranteed to be an expression of type Int, and “1+1” compiles to a call of Base.:(+)(x::Int, y::Int) which is known to be a pure function?
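One way to look at this question empirically, as a rough sketch rather than an answer about what @pure formally permits: define the same function without any annotation and inspect what inference reports for it. The name `g` is made up for this illustration. The printed IR differs between Julia versions, so no output is shown here.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Same body as the @pure example, but with no annotation at all.
g() = 1 + 1

# code_typed returns a vector of (typed code => inferred return type) pairs.
typed = code_typed(g, Tuple{})[1]

println(first(typed))  # the typed IR; check whether a call to + is still present
println(last(typed))   # the inferred return type
```

If the IR already shows a constant being returned, the annotation adds nothing for this particular function, whatever the answer to the strictness question is.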
In the linked post, a copy loop is to be optimized. For all keys of a NamedTuple, a property is copied. The optimized result is still a sequence of copy operations. In my case, I need to optimize away the whole loop (or recursion).
I adapted Tim Holy's tip “You could try doing the recursion in the type domain, Tuple{:a, :b, :c} using Base.tuple_type_head and Base.tuple_type_tail.” to my case.
_fielddescr then becomes:
# Entry point: split the NamedTuple type into a tuple type of names and a tuple type of field types.
@inline function _fielddescr5(::Type{PStruct{T}}, s::Symbol) where T <: NamedTuple
    _fielddescr5(Tuple{T.parameters[1]...}, T.parameters[2], s, 0)
end
# Recursive step: compare s with the first name, otherwise recurse on the tails with an increased shift.
@inline function _fielddescr5(::Type{syms}, ::Type{types}, s::Symbol, shift::Int) where {syms <: Tuple, types <: Tuple}
    # ArgumentError expects a string message, so the symbol is interpolated.
    syms === Tuple{} && throw(ArgumentError("no field $s"))
    type = Base.tuple_type_head(types)
    if s === Base.tuple_type_head(syms)
        return type, shift, bitsizeof(type)
    end
    _fielddescr5(Base.tuple_type_tail(syms), Base.tuple_type_tail(types), s, shift + bitsizeof(type))
end
@inline function getpropertyV5(x::PStruct{T}, s::Symbol) where T <: NamedTuple
    type, shift, bits = _fielddescr5(PStruct{T}, s)
    return _convert(type, _get(reinterpret(UInt64, x), shift, bits))
end
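For readers who want to try the recursion without the rest of the package, here is a minimal self-contained sketch of the same idea. It is not the code from the repository: `bitsizeof` is a hypothetical stand-in (the real one is not shown in this post), `fielddescr` and `Layout` are names invented for the example, and the field names are walked as an ordinary tuple of symbols with `Base.tail` while only the field types are walked in the type domain.

```julia
# Hypothetical stand-in for bitsizeof: Bool takes 1 bit, everything else 8 bits per byte.
bitsizeof(::Type{T}) where {T} = 8 * sizeof(T)
bitsizeof(::Type{Bool}) = 1

# Entry point: split the NamedTuple type into its names and its tuple of field types.
fielddescr(::Type{NamedTuple{names,types}}, s::Symbol) where {names,types} =
    _fielddescr(names, types, s, 0)

# Base case: all names consumed, so s is not a field of the layout.
_fielddescr(::Tuple{}, ::Type{Tuple{}}, s::Symbol, shift::Int) =
    throw(ArgumentError("no field named $s"))

# Recursive case: peel off one name and one type, accumulating the bit offset.
function _fielddescr(names::Tuple, ::Type{types}, s::Symbol, shift::Int) where {types<:Tuple}
    T = Base.tuple_type_head(types)
    s === first(names) && return (T, shift, bitsizeof(T))
    return _fielddescr(Base.tail(names), Base.tuple_type_tail(types), s, shift + bitsizeof(T))
end

const Layout = NamedTuple{(:flag, :id, :count), Tuple{Bool, UInt8, UInt16}}

fielddescr(Layout, :flag)   # (Bool, 0, 1)
fielddescr(Layout, :count)  # (UInt16, 9, 16)
```

Whether the compiler folds such a call to a constant at a given call site depends on whether the symbol is known there as a constant, which is exactly what the benchmarks below probe.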
Benchmark results show that the recursive variant is quite good, but still roughly a factor of 600 slower than full constant propagation (about 70 μs for V5 versus about 115 ns for V3 and V4). I also tried a variant V6 with the symbol parameter wrapped in Val and with @inline annotations; it made no difference.
@btime bench(sv): some work on an ordinary struct, in a loop on a Vector to get stable timings
80.143 ns (0 allocations: 0 bytes)
@btime bench(psv): same work on PStruct having same fields as struct in preceding benchmark
2.487 ms (408 allocations: 6.38 KiB)
@btime benchV2(psv): same work, but using getpropertyV2 instead of getproperty for PStruct field access
428.500 μs (808 allocations: 18.88 KiB)
@btime benchV3(psv): same work, but handcoded getpropertyV3 replacing _fielddescr call by its result (simulated constant propagation)
116.484 ns (0 allocations: 0 bytes)
@btime benchV4(psv): same work, but handcoded getpropertyV4 with resulting SHIFT and AND operation
113.245 ns (0 allocations: 0 bytes)
@btime benchV5(psv): like V2, but recursive _fielddescr using Base.tuple_type_head and Base.tuple_type_tail in getpropertyV5
69.999 μs (96 allocations: 1.50 KiB)
@btime benchV6(psv): like V5, but symbol wrapped in Val like in V3 and V4, and with @inline annotations
69.999 μs (96 allocations: 1.50 KiB)
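To make the “SHIFT and AND operation” of V4 concrete, here is a small sketch of what such a bit-field read can look like once shift and width are known constants. `getbits` is a hypothetical name for this illustration and only assumed to behave like the `_get` used above, whose definition is not shown in this post.

```julia
# Extract `bits` bits from `word`, starting at bit position `shift` (0 = least significant bit).
# Hypothetical helper, assumed to correspond to the post's _get.
getbits(word::UInt64, shift::Int, bits::Int) =
    (word >> shift) & ((UInt64(1) << bits) - UInt64(1))

word = UInt64(0b1011_0110)
getbits(word, 1, 3)   # 0x0000000000000003: bits 1..3 of the word are 011
```

With constant `shift` and `bits`, this comes down to one shift and one mask, which would be consistent with the V3 and V4 timings being close to those of the ordinary struct.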
The code including these variants is now on GitHub.