Allocation using broadcasting with custom type

gasagna · July 2, 2017, 12:05am

Hi,

I am seeing some suspicious allocations when using broadcasting defined on a custom type, in particular when the broadcast involves Numbers. The MWE is reported below. My actual code is somewhat larger, but this example reproduces the behaviour.

struct Foo{T, A<:AbstractMatrix{T}} <: AbstractMatrix{T}
    data::A
end
Foo(data::A) where {A<:AbstractMatrix} = Foo{eltype(data), A}(data)

@inline Base.unsafe_get(f::Foo) = f.data

# Catch call to broadcast, then rebroadcast to field data
@generated function Base.Broadcast.broadcast!(f, dest::Foo, src::Vararg{Any, N}) where N
    args = [:(unsafe_get(src[$k])) for k = 1:N]
    quote
        broadcast!(f, unsafe_get(dest), $(args...))
        return dest
    end
end


# function that allocates
function bar(out, c::Number, x) 
    for i = 1:10000
        out .= x .* c
    end
    out
end

x   = Foo(randn(100, 100))
out = Foo(randn(100, 100))
c   = 1.0

@show @allocated bar(out, c, x)
@show @allocated bar(out, c, x)
@show @allocated bar(out, c, x)
@show @allocated bar(out, c, x)

The type Foo is the type I want to do broadcasting on, e.g., in the function bar. I have overloaded broadcast! on my custom type using a generated function approach. The above code results in

@allocated(bar(out, c, x)) = 3955676
@allocated(bar(out, c, x)) = 160000
@allocated(bar(out, c, x)) = 160000
@allocated(bar(out, c, x)) = 160000

If you change the line in the loop in bar to out .= x .* x, all allocations disappear. Any pointers are welcome.

Thanks

nalimilan · July 2, 2017, 8:23pm

Looks like there are two issues: splatting, and specialization on the function argument (by default, specialization on a function argument only happens if the function is called in the method body, not merely passed through). This gets rid of allocations:

function Base.Broadcast.broadcast!(f::F, dest::Foo, src1, src2) where {F}
    broadcast!(f, unsafe_get(dest), unsafe_get(src1), unsafe_get(src2))
    return dest
end

I’m not sure whether there’s a way of avoiding allocations while still using splatting. However, note that the allocation is only 16 bytes per iteration, so if the array is large this may not matter in practice (no copy of the array is made).
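
A minimal sketch of the specialization point, outside of the broadcast overload. The helper names forward_plain! and forward_spec! are hypothetical and exist only for this illustration. The first method merely passes f through, so Julia may choose not to specialize on it; the second adds a type parameter F, which forces a specialization for each function type. Whether the two actually differ in allocations depends on the Julia version, so compare the numbers on your own setup.

```julia
# Hypothetical helper names, for illustration only.

# `f` is only passed through to broadcast!, never called here,
# so Julia may decide not to specialize this method on it.
forward_plain!(f, dest, a, b) = broadcast!(f, dest, a, b)

# The type parameter F forces a specialization for each function type.
forward_spec!(f::F, dest, a, b) where {F} = broadcast!(f, dest, a, b)

a   = randn(100, 100)
out = similar(a)

# Call both once so that compilation is not counted below.
forward_plain!(*, out, a, 1.0)
forward_spec!(*, out, a, 1.0)

@show @allocated forward_plain!(*, out, a, 1.0)
@show @allocated forward_spec!(*, out, a, 1.0)
```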

BTW, this thread might be useful.

gasagna · July 2, 2017, 8:51pm

Thanks! I was going crazy trying to understand why this happens. Note that it only happens with custom types; it does not allocate when plain arrays are used in the dot notation (bug? can it be fixed?)

The allocation is small, but annoying.

Follow-up question: I need to generate many versions of this function for different numbers of arguments.

This code does what I need. I am reporting it here in case someone ever faces a similar issue.

for nargs = 1:10
    args  = [Symbol("src", i) for i = 1:nargs]
    calls = [:(unsafe_get($(args[i]))) for i = 1:nargs]
    @eval @generated function broadcast_c!(f, ::Type{FTField}, ::Type{FTField}, dest, $(args...))
        :(broadcast!(f, unsafe_get(dest), $($calls...)))
    end
end

nalimilan · July 3, 2017, 8:06am

A possible explanation is that the code for Array uses @inline annotations for functions with varargs, and functions which are not inlined take a tuple of arrays rather than varargs. You could take inspiration from it.
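
The pattern described above can be sketched as follows. This is not the code in Base; add_all! and _add_all! are hypothetical names used only to show the shape: a small @inline varargs entry point that forwards its arguments, packed as a single tuple, to a worker function that is not inlined. Whether this shape removes the allocation in the original example is not something the sketch demonstrates.

```julia
# Hypothetical names, for illustration only; this is not Base's implementation.

# Varargs entry point, marked @inline so that the splatted call
# can be resolved at the call site.
@inline add_all!(dest, srcs::Vararg{Any,N}) where {N} = _add_all!(dest, srcs)

# Worker that is not inlined: it receives the sources as one tuple
# argument instead of as varargs.
@noinline function _add_all!(dest, srcs::Tuple)
    fill!(dest, 0)
    for s in srcs
        dest .+= s   # works for arrays and for scalars
    end
    return dest
end

d = zeros(3)
add_all!(d, [1.0, 2.0, 3.0], 1.0, [1.0, 1.0, 1.0])
```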

andyferris · July 3, 2017, 10:00am

I might be wrong, but it’s possible that these allocations will go away when non-isbits structs are stored unboxed (I’m guessing this is a non-inlined varargs call that passes a tuple, and the tuple needs to be constructed on the heap…). I.e., hopefully it’s just a (known) compiler improvement away.

gasagna · July 3, 2017, 11:21am

Do you mean in the broadcast code? Could you provide an example in Base?

nalimilan · July 3, 2017, 12:58pm

I meant the method you get with e.g. @less broadcast!(*, [1], 1, [1]), and the methods which are called from there.

gasagna · July 3, 2017, 1:56pm

Ok. But adding inline annotations does not seem to be the cure.
