# When is a fold over a fixed-length tuple unrolled?

`@code_typed` tells me that the statement

``````foldl(+, (1,2,3); init = 0)
``````

is unrolled into three additions. On the other hand, if I say

``````struct A x::Int; y::Int; z::Int end

f(a::A) = foldl((1,2,3); init = UInt(0)) do h, i
hash(getfield(a, i), h)
end

@code_typed f(A(1,2,3))
``````

then I see a call to `Base.afoldl` (also with `@code_llvm` and `@code_native`), so I assume the code is not unrolled into three separate statements. (Also, if I unroll it by hand, then I get a function that runs faster.) My question therefore is: In which cases does such an expansion happen? Does it maybe depend on the size of the functions involved?

If you use `Cthulhu` and decend into `_afold` you’ll find the kernal is unrolled.
And the benchmark show little difference on my PC:

``````f(a::Tuple) = foldl(a; init = UInt(0)) do h, i
hash(i, h)
end
_f(a::Tuple) = hash(a, hash(a, hash(a, UInt(0))))

AA = Ref((1,2,3))
using BenchmarkTools
@btime f(\$AA[])  # 5.000 ns (0 allocations: 0 bytes)
@btime _f(\$AA[])  #  5.100 ns (0 allocations: 0 bytes)
``````
1 Like

Maybe the time difference got lost when I simplified the code. Here is an example where I can see a difference:

``````struct A x::Int; y::Int; z::Int end

h1(a::A, h::UInt) = hash(a.z, hash(a.y, hash(a.x, hash(:A, h))))

function h2(x::A, h0::UInt)
foldl(ntuple(identity, fieldcount(typeof(x)));
init = hash(:A, h0)) do h, i
hash(getfield(x, i), h)
end
end
``````

Here are the timings:

``````a = A(1,2,3)
@btime h1(\$a, UInt(0))   # 6.869 ns (0 allocations: 0 bytes)
@btime h2(\$a, UInt(0))   # 7.994 ns (0 allocations: 0 bytes)
``````

Or shouldn’t I use `\$a` for timings?

I can reproduce the Benchmark result on my pc.
But if we use a for loop to bench:

``````a = A.(rand(1:100,100),rand(1:100,100),rand(1:100,100))
b = h1.(a,UInt(0))
@btime \$b .= h1.(\$a,UInt(0)); #  664.780 ns (0 allocations: 0 bytes)
@btime \$b .= h2.(\$a,UInt(0)); #  615.517 ns (0 allocations: 0 bytes)
``````

`Cthulhu` shows that the indexed is not propgate in to the kernal as const for `h2`. Let’s try

``````function h3(x::A, h0::UInt)
f(h, ::Val{i}) where {i} = hash(getfield(x, i), h)
foldl(f, ntuple(Val, fieldcount(A)); init = hash(:A, h0))
end
@btime \$b .= h3.(\$a,UInt(0)); #  579.570 ns (0 allocations: 0 bytes)
``````

a little better.
And the faster version is transforming `a::A` into a `Tuple`

``````function h4(a::A, h0::UInt)
(;x, y, z) = a
foldl((x, y, z); init = hash(:A, h0)) do h, i
hash(i, h)
end
end
@btime \$b .= h4.(\$a,UInt(0)); #  482.653 ns (0 allocations: 0 bytes)
``````

Not sure where cause the difference.

Thanks a lot!

The downside of this is that you need to hard-code the number of fields. (Or am I wrong?) `h2` and `h3` work for any type, except that `:A` is hard-coded. I’m thinking of using the code inside a macro that can be called as `@h(A)` and produces the function definition. Then `:A` can be inserted via `QuoteNode`. (In a regular function like `h2`, I could use `Symbol(typeof(x))`, but this is very slow because it first creates a string.)
Of course, one can use generated functions, fancier macro code and other things to get a macro that does what I want, see this GitHUb issue. I was trying to figure out how far one can get with folding without sacrificing speed.

Then you might want?

``````function h5(a::T, h0::UInt) where {T}
temp = ntuple(i -> getfield(a, i), fieldcount(A))
foldl(temp; init = hash(T, h0)) do h, i
hash(i, h)
end
end
@btime \$b .= h5.(\$a,UInt(0)); #  482.143 ns (0 allocations: 0 bytes)
``````

I replaced `Symbol(A)` with `A`, as every typename has a unique objectid.
But the benchmark make things more strange. Why there’s performance difference between `h2` and `h5`

1 Like

This is as fast as the hand-coded function `h1`. Great!

EDIT: Hashing a parametric type is slower than hashing a symbol (provided that you have the symbol and don’t need to generate it via `Symbol`).

I agree that it would be good to understand why `h2` is slower than `h5` or, even better, if it were as fast!