# Help understanding when .= allocates

I could use help understanding the magic of allocations in Julia with the inplace `.=` operation. Below I have 4 ways to write the same operation, but allocations are different for each. Only the first doesn’t allocate. My intention is to preallocate an array and put the output of the `fn` function into that memory. Why are the 2nd, 3rd, and 4th functions allocating when the first isn’t?

``````using BenchmarkTools

N, M, L = 3, 5, 4

b = rand(N, M)
c = [rand(N) for _ in 1:L]

a = Array{Float64,3}(undef, N, M, L)

function foo1!(a, b, c)
for i in eachindex(c)
a[:, :, i] .= b .- c[i]
end
return nothing
end

function foo2!(a, b, c, fn=(-))
for i in eachindex(c)
a[:, :, i] .= fn.(b, c[i])
end
return nothing
end

function foo3!(a, b, c)
for i in eachindex(c)
for j in axes(b, 2)
a[:, j, i] .= view(b, :, j) - c[i]
end
end
return nothing
end

function foo4!(a, b, c, fn=(-))
for i in eachindex(c)
for j in axes(b, 2)
a[:, j, i] .= fn(view(b, :, j), c[i])
end
end
return nothing
end

@btime foo1!(\$a, \$b, \$c)  # 71.783 ns (0 allocations: 0 bytes)
@btime foo2!(\$a, \$b, \$c)  # 1.204 μs (12 allocations: 512 bytes)
@btime foo3!(\$a, \$b, \$c)  # 864.382 ns (20 allocations: 1.56 KiB)
@btime foo4!(\$a, \$b, \$c)  # 867.788 ns (20 allocations: 1.56 KiB)
``````

You are missing broadcast `.`s in functions 3 and 4. Fixing this alone will eliminate allocations in `foo3!`. For the other two functions, you need to add a type argument for `fn` to tell the compiler to specialize on that argument (Julia does not specialize on function arguments by default).

See the updated code below.

``````function foo2!(a, b, c, fn::F=(-)) where F
for i in eachindex(c)
a[:, :, i] .= fn.(b, c[i])
end
return nothing
end

function foo3!(a, b, c)
for i in eachindex(c)
for j in axes(b, 2)
a[:, j, i] .= view(b, :, j) .- c[i]
end
end
return nothing
end

function foo4!(a, b, c, fn::F=(-)) where F
for i in eachindex(c)
for j in axes(b, 2)
a[:, j, i] .= fn.(view(b, :, j), c[i])
end
end
return nothing
end

@btime foo1!(\$a, \$b, \$c)  # 100.447 ns (0 allocations: 0 bytes)
@btime foo2!(\$a, \$b, \$c)  # 95.607 ns (0 allocations: 0 bytes)
@btime foo3!(\$a, \$b, \$c)  # 155.782 ns (0 allocations: 0 bytes)
@btime foo4!(\$a, \$b, \$c)  # 167.668 ns (0 allocations: 0 bytes)
``````

Edit: In general, `.=` will allocate whenever something on the right hand side forces the compiler to allocate an intermediate array. In this case, besides the missing `.`s, it was the lack of type information for `fn` which presumably prevented optimization (and inlining) of the broadcast kernel `fn`.

8 Likes

Thanks! This was super helpful! It doesn’t totally solve my issue though because of another source of confusion on my part.

I was completely misunderstanding what was happening in the first example. I thought the broadcast `.-` between a matrix and a vector was broadcast across the columns of matrix `b` using vector subtraction as in `vec1 - vec2` - similar to how `.-` between a vector and a scalar works. Instead, I see it all just happens at the scalar level.

I’ll pose a different question then. Assuming `fn` returns a vector, it looks like `foo1` and `foo2` simply won’t work. For `foo4` is there any way to skip the allocation on the right hand side of `.=` and get the compiler to write directly into the `a` array?

``````function foo4!(a, b, c, fn::F=(-)) where F
for i in eachindex(c)
for j in axes(b, 2)
a[:, j, i] .= fn(view(b, :, j), c[i])
end
end
return nothing
end
``````

It looks like `fn` is just `-`, which does not return a vector. Maybe change your example so it’s clearer? Also did you mean the `a` array?

For broadcasting to work without allocations `fn` must be a function that maps scalars to scalars. If `fn` inherently maps vectors to vectors then I don’t see how you can avoid allocations.

I apologize for the confusion. I fixed the typo above and will clarify that I mean this method for subtraction.

-(A::AbstractArray, B::AbstractArray) in Base at arraymath.jl:6

I will also admit I’m not the least bit aware of the capabilities of the Julia compiler, so I have no idea if it’s capable enough to recognize the redundancy of having to create a vector and then copy it.

2 Likes

Thank you for the reference. It did help me understand what was going on with the first two cases.

Regarding the last two, I think my question then is not necessarily about vectorization. Instead, it’s if there’s a way to tell the compiler, “hey, here’s some memory, just put the output there and error if you can’t.” Originally, I naively thought that’s what `.=` did.

If this method is called, it will allocate a new array. It knows nothing about your output array and will create its own.

The way to avoid allocations is to get Julia to call `-(::Float64, ::Float64)` on each element instead, since that just produces a scalar which can be put straight into your output array with no intermediate array allocation. That’s what broadcasting does for you (well, simplified.)

Alternatively, you can write your own function, `minus!(a, b, c)`, which writes straight into `a`, but that would probably just look like this:

``````minus!(a, b, c) = (a .= b .- c)
``````

just hiding the same code one level down.

1 Like

That’s what it does - the allocations in your original example happen before `.=` comes into play though.

Assignment in julia should never allocate, not even broadcasted assignment (unless there’s a custom type that implements `copyto!` in an allocating way, which I’d consider buggy behavior).