Allocations when using multiple dispatch, but not with specialized functions

james3 · May 29, 2023, 1:20am

Hello,

I am trying to understand why I obtain allocations when using a multiple dispatch version of a function, compared to specialized functions. In the code below, I define different two structs that contain a number of arrays. (In practice, I will have many more structs than this). I then have code that calculates the total size of all arrays contained within a struct.

using BenchmarkTools

struct x_t{T <: Real}   
    x1::Array{T, 2}
    x2::Array{T, 2}
    x3::Array{T, 1}
    x4::Array{T, 2}
    x5::Array{T, 2}
    x6::Array{T, 3}
    x7::Array{T, 2}
end

struct y_t{T <: Real}
    y1::Array{T, 2}
    y2::Array{T, 3}
end

function x_t(T)
    return x_t(Array{T, 2}(undef, 10, 5),
               Array{T, 2}(undef, 20, 5),
               Array{T, 1}(undef, 20),
               Array{T, 2}(undef, 120, 9),
               Array{T, 2}(undef, 200, 11),
               Array{T, 3}(undef, 90, 25, 9),
               Array{T, 2}(undef, 10, 15)
               )
end

function y_t(T)
    return y_t(Array{T, 2}(undef, 10, 4),
               Array{T, 3}(undef, 10, 10, 20))
end

function calcLength(xy)
    total_length = 0
    for field_name in fieldnames(typeof(xy))
        field = getfield(xy, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

function calcLength2(x::x_t)
    total_length = 0
    for field_name in fieldnames(typeof(x))
        field = getfield(x, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

function calcLength2(y::y_t)
    total_length = 0
    for field_name in fieldnames(typeof(y))
        field = getfield(y, field_name)
        if isa(field, AbstractArray)
            total_length += length(field)
        end
    end
    return total_length
end

x = x_t(Float64)
y = y_t(Float64)

println(calcLength(x))
println(calcLength(y))
println(calcLength2(x))
println(calcLength2(y))

@btime calcLength(x)
@btime calcLength(y)
@btime calcLength2(x)
@btime calcLength2(y)

The output of this (Julia 1.9.0) is

23850
2040
23850
2040
  127.265 ns (8 allocations: 464 bytes)
  43.512 ns (3 allocations: 80 bytes)
  96.447 ns (0 allocations: 0 bytes)
  42.747 ns (0 allocations: 0 bytes)

As can be seen, if I rely on multiple dispatch, the code is allocating memory, whereas if I specialize the code to the two types of structs, it does not. I would like to understand why this is, and if it can be avoided. Thanks.

SteffenPL · May 29, 2023, 3:51am

I think one challenge is that the for loop might introduce runtime dispatch. For full optimization, the compiler would have to unroll the for loop. (See the output of @code_warntype.)

An interesting observation is maybe that the timing here is different because you are dealing with global variables, if you interpolate, then you get

@btime calcLength($x)   #  83.887 ns (7 allocations: 448 bytes)
@btime calcLength($y)   #  23.266 ns (2 allocations: 64 bytes)
@btime calcLength2($x)  #  83.877 ns (7 allocations: 448 bytes)
@btime calcLength2($y)  #  23.296 ns (2 allocations: 64 bytes)

As you can see, the number of allocations is exactly the number of fields of your struct in all cases, as it requires runtime dispatch.

One approach to avoid this would be

array_length(a::AbstractArray) = length(a)
array_length(a) = 0

function calcLength3(xy)
    fields = map( f -> getproperty(xy, f), propertynames(xy))
    return sum( a ->  array_length(a), fields )    
end

@btime calcLength3($x)  # 2.424 ns (0 allocations: 0 bytes)

Here, fields would be a type with types that are completely predictable. Therefore, the sum will also be completely predictable.

(Of course, ns timings are always a bit questionable! But I think calcLength3 is one approach to tell Julia that it is really just the addition of 2 or 7 numbers.)

james3 · May 29, 2023, 10:20am

Thanks. This is very helpful. Interestingly, we obtain type instability if there are at least 32 arrays in the struct.

When there are 31 arrays in x

@btime calcLength($x) #    1.130 μs (31 allocations: 7.75 KiB)
@btime calcLength3($x) #  20.014 ns (0 allocations: 0 bytes)

And when I have 32 arrays in x

@btime calcLength($x) #   1.232 μs (32 allocations: 8.50 KiB)
@btime calcLength3($x) #  2.206 μs (37 allocations: 9.61 KiB)

Not sure if this can be avoided (or if I should just try and have fewer arrays in my struct).

gdalle · May 29, 2023, 5:57pm

I think this is because for large enough iterables it makes no sense to store fields in a tuple, and so Julia reverts back to a vector (haven’t tested it though).

Of course it is debatable whether a struct with 32 attributes makes any sense (especially if you code it manually ). But leaving that aside, the only problem is that there is a mix of vectors, matrices and higher-dimensional arrays in these attributes. If you somehow separate these categories, this will solve the type instability whenever you iterate on one category

sylvaticus · May 30, 2023, 4:42am

Hello, as a sidenote there are some terminology issues in your post.
You aren’t comparing “specialised functions” (calcLength2) vs “multiple dispatch” (calcLength).

At the opposite, calcLength2 are the functions using multiple dispatch, as you are calling the same function name with different argument types, while in calcLength you are simply using some code that doesn’t depend on the type of the argument.
Note also that while calcLength2 is using multiple dispatch, you are not benefitting of it: the whole point is to adapt the code to the specific types, so in calcLength2(::x_t) and calcLength2(::y_t) you know the types of your argument and you could sum the length of them by calling the fieldnames directly.

A few more sidenotes:

in Julia normally structs (and types) starts with a capital letter
no need to specify return in functions x_t and y_t (the two struct constructors)
you could have considered inner constructors

Topic		Replies	Views
Unexpected Allocations with Multiple Dispatch Performance	2	472	February 23, 2022
Large memory allocation in function within function New to Julia memory-allocation	7	1160	July 11, 2019
Function inside struct allocates when referenced New to Julia function	25	1047	November 7, 2023
Large memory footprint using type dispatch within a loop Performance	7	934	March 25, 2018
Speed of multiple dispatch Performance	14	1199	November 27, 2023

Allocations when using multiple dispatch, but not with specialized functions

Related topics