Help understanding when .= allocates

I could use help understanding the magic of allocations in Julia with the inplace .= operation. Below I have 4 ways to write the same operation, but allocations are different for each. Only the first doesn’t allocate. My intention is to preallocate an array and put the output of the fn function into that memory. Why are the 2nd, 3rd, and 4th functions allocating when the first isn’t?

using BenchmarkTools

N, M, L = 3, 5, 4

b = rand(N, M)
c = [rand(N) for _ in 1:L]

a = Array{Float64,3}(undef, N, M, L)

function foo1!(a, b, c)
    for i in eachindex(c)
            a[:, :, i] .= b .- c[i]
    end
    return nothing
end

function foo2!(a, b, c, fn=(-))
    for i in eachindex(c)
            a[:, :, i] .= fn.(b, c[i])
    end
    return nothing
end

function foo3!(a, b, c)
    for i in eachindex(c)
        for j in axes(b, 2)
            a[:, j, i] .= view(b, :, j) - c[i]
        end
    end
    return nothing
end

function foo4!(a, b, c, fn=(-))
    for i in eachindex(c)
        for j in axes(b, 2)
            a[:, j, i] .= fn(view(b, :, j), c[i])
        end
    end
    return nothing
end

@btime foo1!($a, $b, $c)  # 71.783 ns (0 allocations: 0 bytes)
@btime foo2!($a, $b, $c)  # 1.204 μs (12 allocations: 512 bytes)
@btime foo3!($a, $b, $c)  # 864.382 ns (20 allocations: 1.56 KiB)
@btime foo4!($a, $b, $c)  # 867.788 ns (20 allocations: 1.56 KiB)

You are missing broadcast .s in functions 3 and 4. Fixing this alone will eliminate allocations in foo3!. For the other two functions, you need to add a type argument for fn to tell the compiler to specialize on that argument (Julia does not specialize on function arguments by default).

See the updated code below.

function foo2!(a, b, c, fn::F=(-)) where F
    for i in eachindex(c)
            a[:, :, i] .= fn.(b, c[i])
    end
    return nothing
end

function foo3!(a, b, c)
    for i in eachindex(c)
        for j in axes(b, 2)
            a[:, j, i] .= view(b, :, j) .- c[i]
        end
    end
    return nothing
end

function foo4!(a, b, c, fn::F=(-)) where F
    for i in eachindex(c)
        for j in axes(b, 2)
            a[:, j, i] .= fn.(view(b, :, j), c[i])
        end
    end
    return nothing
end

@btime foo1!($a, $b, $c)  # 100.447 ns (0 allocations: 0 bytes)
@btime foo2!($a, $b, $c)  # 95.607 ns (0 allocations: 0 bytes)
@btime foo3!($a, $b, $c)  # 155.782 ns (0 allocations: 0 bytes)
@btime foo4!($a, $b, $c)  # 167.668 ns (0 allocations: 0 bytes)

Edit: In general, .= will allocate whenever something on the right hand side forces the compiler to allocate an intermediate array. In this case, besides the missing .s, it was the lack of type information for fn which presumably prevented optimization (and inlining) of the broadcast kernel fn.

8 Likes

Thanks! This was super helpful! It doesn’t totally solve my issue though because of another source of confusion on my part.

I was completely misunderstanding what was happening in the first example. I thought the broadcast .- between a matrix and a vector was broadcast across the columns of matrix b using vector subtraction as in vec1 - vec2 - similar to how .- between a vector and a scalar works. Instead, I see it all just happens at the scalar level.

I’ll pose a different question then. Assuming fn returns a vector, it looks like foo1 and foo2 simply won’t work. For foo4 is there any way to skip the allocation on the right hand side of .= and get the compiler to write directly into the a array?

function foo4!(a, b, c, fn::F=(-)) where F
    for i in eachindex(c)
        for j in axes(b, 2)
            a[:, j, i] .= fn(view(b, :, j), c[i])
        end
    end
    return nothing
end

It looks like fn is just -, which does not return a vector. Maybe change your example so it’s clearer? Also did you mean the a array?

For broadcasting to work without allocations fn must be a function that maps scalars to scalars. If fn inherently maps vectors to vectors then I don’t see how you can avoid allocations.

I apologize for the confusion. I fixed the typo above and will clarify that I mean this method for subtraction.

-(A::AbstractArray, B::AbstractArray) in Base at arraymath.jl:6

I will also admit I’m not the least bit aware of the capabilities of the Julia compiler, so I have no idea if it’s capable enough to recognize the redundancy of having to create a vector and then copy it.

See also Why does Julia need dots to fuse the loops?.

2 Likes

Thank you for the reference. It did help me understand what was going on with the first two cases.

Regarding the last two, I think my question then is not necessarily about vectorization. Instead, it’s if there’s a way to tell the compiler, “hey, here’s some memory, just put the output there and error if you can’t.” Originally, I naively thought that’s what .= did.

If this method is called, it will allocate a new array. It knows nothing about your output array and will create its own.

The way to avoid allocations is to get Julia to call -(::Float64, ::Float64) on each element instead, since that just produces a scalar which can be put straight into your output array with no intermediate array allocation. That’s what broadcasting does for you (well, simplified.)

Alternatively, you can write your own function, minus!(a, b, c), which writes straight into a, but that would probably just look like this:

minus!(a, b, c) = (a .= b .- c) 

just hiding the same code one level down.

1 Like

That’s what it does - the allocations in your original example happen before .= comes into play though.

Assignment in julia should never allocate, not even broadcasted assignment (unless there’s a custom type that implements copyto! in an allocating way, which I’d consider buggy behavior).