Hi folks,
today I found a strange behavior that eludes my understanding. Would someone please tell me where the allocations are coming from in this example? I am probably overlooking something but I have been staring at this snippet for far longer now than Iβd like to admit and cannot figure it out.
My goal is to compute multinomial coefficients. Consider these functions:
# computes a binomial coefficient as Float64
function mybinomial(N::T, k::T) where T
(k > N || k < 0) && return 0.0
2k > N && (k = N-k)
(isinteger(N) && isinteger(k)) || error("N and k need to be integers! Got: ($N, $k)")
ret = 1.0
for i in 1:Int(k)
ret *= (N+1-i)/i
end
return ret
end
# compute the multinomial coefficient iteratively
multinomial() = 1.0
function multinomial(a, vals...)
s = a
ret = 1.0
for v in vals
s += v
ret *= mybinomial(s,v)
end
return ret
end
Strangely, I see an allocation:
julia> using BenchmarkTools
julia> @btime multinomial($1,$2,$3,$4,$3)
26.745 ns (1 allocation: 16 bytes)
3.6036e6
I cannot understand where this comes from:
- I checked
@code_warntype multinomial(1,2,3,4,3)
and everything is inferred perfectly - I checked
@code_llvm multinomial(1,2,3,4,3)
and@code_native multinomial(1,2,3,4,3)
which shows that the loop has been unrolled and I cannot find anything that looks an allocation to me - Interestingly, the allocation seems to be very sensitive to seemingly random things⦠E.g. replacing
mybinomial
withbinomial
from Base removes the allocation (but is overall slower and will overflow in my application). I also had a slightly different version of my function that has 2 allocations that vanish when changing tomybinomial
tobinomial
. - I somewhat suspect that these allocations might have to do with iteration over the varargs. Maybe
mybinomial
has some different compiler effects thanbinomial
which causes the compiler to not ellide some allocation?
Variant with 2 allocations
julia> function multinomial3(a, vals...)
s = sum(vals)
ret = binomial(s+a,a)
for v in vals
ret *= mybinomial(s,v) # 0 allocs with binomial
s -=v
end
return ret
end
julia> @btime multinomial3($1,$2,$3,$4,$3)
121.435 ns (2 allocations: 64 bytes)
For reference, I now use a generated function to generate functions like this:
julia> function multinomial_gen(a,b,c,d,e)
ret = mybinomial(a+b,b)
ret *= mybinomial(a+b+c,c)
ret *= mybinomial(a+b+c+d,d)
ret *= mybinomial(a+b+c+d+e,e)
return ret
end
julia> @btime multinomial_gen($1,$2,$3,$4,$3)
14.487 ns (0 allocations: 0 bytes)
3.6036e6
which is quite a bit faster and has never given me any allocations (I actually need to compute A LOT of these so that performance increase actually was very noticeable )
All the results shown above where on Julia 1.11.1 but on 1.10.5 I also see the same behavior. If someone can elucidate this for me Iβd be very grateful