is unrolled into three additions. On the other hand, if I say
struct A x::Int; y::Int; z::Int end
f(a::A) = foldl((1,2,3); init = UInt(0)) do h, i
hash(getfield(a, i), h)
end
@code_typed f(A(1,2,3))
then I see a call to Base.afoldl (also with @code_llvm and @code_native), so I assume the code is not unrolled into three separate statements. (Also, if I unroll it by hand, then I get a function that runs faster.) My question therefore is: In which cases does such an expansion happen? Does it maybe depend on the size of the functions involved?
Maybe the time difference got lost when I simplified the code. Here is an example where I can see a difference:
struct A x::Int; y::Int; z::Int end
h1(a::A, h::UInt) = hash(a.z, hash(a.y, hash(a.x, hash(:A, h))))
function h2(x::A, h0::UInt)
foldl(ntuple(identity, fieldcount(typeof(x)));
init = hash(:A, h0)) do h, i
hash(getfield(x, i), h)
end
end
Cthulhu shows that the indexed is not propgate in to the kernal as const for h2. Let’s try
function h3(x::A, h0::UInt)
f(h, ::Val{i}) where {i} = hash(getfield(x, i), h)
foldl(f, ntuple(Val, fieldcount(A)); init = hash(:A, h0))
end
@btime $b .= h3.($a,UInt(0)); # 579.570 ns (0 allocations: 0 bytes)
a little better.
And the faster version is transforming a::A into a Tuple
function h4(a::A, h0::UInt)
(;x, y, z) = a
foldl((x, y, z); init = hash(:A, h0)) do h, i
hash(i, h)
end
end
@btime $b .= h4.($a,UInt(0)); # 482.653 ns (0 allocations: 0 bytes)
The downside of this is that you need to hard-code the number of fields. (Or am I wrong?) h2 and h3 work for any type, except that :A is hard-coded. I’m thinking of using the code inside a macro that can be called as @h(A) and produces the function definition. Then :A can be inserted via QuoteNode. (In a regular function like h2, I could use Symbol(typeof(x)), but this is very slow because it first creates a string.)
Of course, one can use generated functions, fancier macro code and other things to get a macro that does what I want, see this GitHUb issue. I was trying to figure out how far one can get with folding without sacrificing speed.
function h5(a::T, h0::UInt) where {T}
temp = ntuple(i -> getfield(a, i), fieldcount(A))
foldl(temp; init = hash(T, h0)) do h, i
hash(i, h)
end
end
@btime $b .= h5.($a,UInt(0)); # 482.143 ns (0 allocations: 0 bytes)
I replaced Symbol(A) with A, as every typename has a unique objectid.
But the benchmark make things more strange. Why there’s performance difference between h2 and h5