# Allocations when using multiple dispatch, but not with specialized functions

Hello,

I am trying to understand why I obtain allocations when using a multiple dispatch version of a function, compared to specialized functions. In the code below, I define different two structs that contain a number of arrays. (In practice, I will have many more structs than this). I then have code that calculates the total size of all arrays contained within a struct.

``````using BenchmarkTools

struct x_t{T <: Real}
x1::Array{T, 2}
x2::Array{T, 2}
x3::Array{T, 1}
x4::Array{T, 2}
x5::Array{T, 2}
x6::Array{T, 3}
x7::Array{T, 2}
end

struct y_t{T <: Real}
y1::Array{T, 2}
y2::Array{T, 3}
end

function x_t(T)
return x_t(Array{T, 2}(undef, 10, 5),
Array{T, 2}(undef, 20, 5),
Array{T, 1}(undef, 20),
Array{T, 2}(undef, 120, 9),
Array{T, 2}(undef, 200, 11),
Array{T, 3}(undef, 90, 25, 9),
Array{T, 2}(undef, 10, 15)
)
end

function y_t(T)
return y_t(Array{T, 2}(undef, 10, 4),
Array{T, 3}(undef, 10, 10, 20))
end

function calcLength(xy)
total_length = 0
for field_name in fieldnames(typeof(xy))
field = getfield(xy, field_name)
if isa(field, AbstractArray)
total_length += length(field)
end
end
end

function calcLength2(x::x_t)
total_length = 0
for field_name in fieldnames(typeof(x))
field = getfield(x, field_name)
if isa(field, AbstractArray)
total_length += length(field)
end
end
end

function calcLength2(y::y_t)
total_length = 0
for field_name in fieldnames(typeof(y))
field = getfield(y, field_name)
if isa(field, AbstractArray)
total_length += length(field)
end
end
end

x = x_t(Float64)
y = y_t(Float64)

println(calcLength(x))
println(calcLength(y))
println(calcLength2(x))
println(calcLength2(y))

@btime calcLength(x)
@btime calcLength(y)
@btime calcLength2(x)
@btime calcLength2(y)
``````

The output of this (Julia 1.9.0) is

``````23850
2040
23850
2040
127.265 ns (8 allocations: 464 bytes)
43.512 ns (3 allocations: 80 bytes)
96.447 ns (0 allocations: 0 bytes)
42.747 ns (0 allocations: 0 bytes)
``````

As can be seen, if I rely on multiple dispatch, the code is allocating memory, whereas if I specialize the code to the two types of structs, it does not. I would like to understand why this is, and if it can be avoided. Thanks.

I think one challenge is that the `for` loop might introduce runtime dispatch. For full optimization, the compiler would have to unroll the `for` loop. (See the output of `@code_warntype`.)

An interesting observation is maybe that the timing here is different because you are dealing with global variables, if you interpolate, then you get

``````@btime calcLength(\$x)   #  83.887 ns (7 allocations: 448 bytes)
@btime calcLength(\$y)   #  23.266 ns (2 allocations: 64 bytes)
@btime calcLength2(\$x)  #  83.877 ns (7 allocations: 448 bytes)
@btime calcLength2(\$y)  #  23.296 ns (2 allocations: 64 bytes)
``````

As you can see, the number of allocations is exactly the number of fields of your struct in all cases, as it requires runtime dispatch.

One approach to avoid this would be

``````array_length(a::AbstractArray) = length(a)
array_length(a) = 0

function calcLength3(xy)
fields = map( f -> getproperty(xy, f), propertynames(xy))
return sum( a ->  array_length(a), fields )
end

@btime calcLength3(\$x)  # 2.424 ns (0 allocations: 0 bytes)
``````

Here, `fields` would be a type with types that are completely predictable. Therefore, the `sum` will also be completely predictable.

(Of course, `ns` timings are always a bit questionable! But I think `calcLength3` is one approach to tell Julia that it is really just the addition of 2 or 7 numbers.)

2 Likes

Thanks. This is very helpful. Interestingly, we obtain type instability if there are at least 32 arrays in the struct.

When there are 31 arrays in x

``````@btime calcLength(\$x) #    1.130 μs (31 allocations: 7.75 KiB)
@btime calcLength3(\$x) #  20.014 ns (0 allocations: 0 bytes)
``````

And when I have 32 arrays in x

``````@btime calcLength(\$x) #   1.232 μs (32 allocations: 8.50 KiB)
@btime calcLength3(\$x) #  2.206 μs (37 allocations: 9.61 KiB)
``````

Not sure if this can be avoided (or if I should just try and have fewer arrays in my struct).

I think this is because for large enough iterables it makes no sense to store `fields` in a tuple, and so Julia reverts back to a vector (haven’t tested it though).

Of course it is debatable whether a struct with 32 attributes makes any sense (especially if you code it manually ). But leaving that aside, the only problem is that there is a mix of vectors, matrices and higher-dimensional arrays in these attributes. If you somehow separate these categories, this will solve the type instability whenever you iterate on one category

Hello, as a sidenote there are some terminology issues in your post.
You aren’t comparing “specialised functions” (`calcLength2`) vs “multiple dispatch” (`calcLength`).

At the opposite, `calcLength2` are the functions using multiple dispatch, as you are calling the same function name with different argument types, while in `calcLength` you are simply using some code that doesn’t depend on the type of the argument.
Note also that while `calcLength2` is using multiple dispatch, you are not benefitting of it: the whole point is to adapt the code to the specific types, so in `calcLength2(::x_t)` and `calcLength2(::y_t)` you know the types of your argument and you could sum the length of them by calling the fieldnames directly.

A few more sidenotes:

• in Julia normally structs (and types) starts with a capital letter
• no need to specify `return` in functions `x_t` and `y_t` (the two struct constructors)
• you could have considered inner constructors