Allocations when using multiple dispatch, but not with specialized functions

Hello,

I am trying to understand why I obtain allocations when using a multiple dispatch version of a function, compared to specialized functions. In the code below, I define different two structs that contain a number of arrays. (In practice, I will have many more structs than this). I then have code that calculates the total size of all arrays contained within a struct.

using BenchmarkTools

struct x_t{T <: Real}   
    x1::Array{T, 2}
    x2::Array{T, 2}
    x3::Array{T, 1}
    x4::Array{T, 2}
    x5::Array{T, 2}
    x6::Array{T, 3}
    x7::Array{T, 2}
end

struct y_t{T <: Real}
    y1::Array{T, 2}
    y2::Array{T, 3}
end

function x_t(T)
    return x_t(Array{T, 2}(undef, 10, 5),
               Array{T, 2}(undef, 20, 5),
               Array{T, 1}(undef, 20),
               Array{T, 2}(undef, 120, 9),
               Array{T, 2}(undef, 200, 11),
               Array{T, 3}(undef, 90, 25, 9),
               Array{T, 2}(undef, 10, 15)
               )
end

function y_t(T)
    return y_t(Array{T, 2}(undef, 10, 4),
               Array{T, 3}(undef, 10, 10, 20))
end

function calcLength(xy)
    total_length = 0
    for field_name in fieldnames(typeof(xy))
        field = getfield(xy, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

function calcLength2(x::x_t)
    total_length = 0
    for field_name in fieldnames(typeof(x))
        field = getfield(x, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

function calcLength2(y::y_t)
    total_length = 0
    for field_name in fieldnames(typeof(y))
        field = getfield(y, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

x = x_t(Float64)
y = y_t(Float64)

println(calcLength(x))
println(calcLength(y))
println(calcLength2(x))
println(calcLength2(y))

@btime calcLength(x)
@btime calcLength(y)
@btime calcLength2(x)
@btime calcLength2(y)

The output of this (Julia 1.9.0) is

23850
2040
23850
2040
  127.265 ns (8 allocations: 464 bytes)
  43.512 ns (3 allocations: 80 bytes)
  96.447 ns (0 allocations: 0 bytes)
  42.747 ns (0 allocations: 0 bytes)

As can be seen, if I rely on multiple dispatch, the code is allocating memory, whereas if I specialize the code to the two types of structs, it does not. I would like to understand why this is, and if it can be avoided. Thanks.

I think one challenge is that the for loop might introduce runtime dispatch. For full optimization, the compiler would have to unroll the for loop. (See the output of @code_warntype.)

An interesting observation is maybe that the timing here is different because you are dealing with global variables, if you interpolate, then you get

@btime calcLength($x)   #  83.887 ns (7 allocations: 448 bytes)
@btime calcLength($y)   #  23.266 ns (2 allocations: 64 bytes)
@btime calcLength2($x)  #  83.877 ns (7 allocations: 448 bytes)
@btime calcLength2($y)  #  23.296 ns (2 allocations: 64 bytes)

As you can see, the number of allocations is exactly the number of fields of your struct in all cases, as it requires runtime dispatch.

One approach to avoid this would be

array_length(a::AbstractArray) = length(a)
array_length(a) = 0

function calcLength3(xy)
    fields = map( f -> getproperty(xy, f), propertynames(xy))
    return sum( a ->  array_length(a), fields )    
end

@btime calcLength3($x)  # 2.424 ns (0 allocations: 0 bytes)

Here, fields would be a type with types that are completely predictable. Therefore, the sum will also be completely predictable.


(Of course, ns timings are always a bit questionable! But I think calcLength3 is one approach to tell Julia that it is really just the addition of 2 or 7 numbers.)

2 Likes

Thanks. This is very helpful. Interestingly, we obtain type instability if there are at least 32 arrays in the struct.

When there are 31 arrays in x

@btime calcLength($x) #    1.130 ÎĽs (31 allocations: 7.75 KiB)
@btime calcLength3($x) #  20.014 ns (0 allocations: 0 bytes)

And when I have 32 arrays in x

@btime calcLength($x) #   1.232 ÎĽs (32 allocations: 8.50 KiB)
@btime calcLength3($x) #  2.206 ÎĽs (37 allocations: 9.61 KiB)

Not sure if this can be avoided (or if I should just try and have fewer arrays in my struct).

I think this is because for large enough iterables it makes no sense to store fields in a tuple, and so Julia reverts back to a vector (haven’t tested it though).

Of course it is debatable whether a struct with 32 attributes makes any sense (especially if you code it manually :rofl:). But leaving that aside, the only problem is that there is a mix of vectors, matrices and higher-dimensional arrays in these attributes. If you somehow separate these categories, this will solve the type instability whenever you iterate on one category

Hello, as a sidenote there are some terminology issues in your post.
You aren’t comparing “specialised functions” (calcLength2) vs “multiple dispatch” (calcLength).

At the opposite, calcLength2 are the functions using multiple dispatch, as you are calling the same function name with different argument types, while in calcLength you are simply using some code that doesn’t depend on the type of the argument.
Note also that while calcLength2 is using multiple dispatch, you are not benefitting of it: the whole point is to adapt the code to the specific types, so in calcLength2(::x_t) and calcLength2(::y_t) you know the types of your argument and you could sum the length of them by calling the fieldnames directly.

A few more sidenotes:

  • in Julia normally structs (and types) starts with a capital letter
  • no need to specify return in functions x_t and y_t (the two struct constructors)
  • you could have considered inner constructors