Can I avoid allocations when broadcasting over slices?

I was a little surprised to find that broadcasting over slices allocates, even when using view. Here’s a MWE.

u = randn(10, 100)
ids = reshape(1:100, 10, 10)
out = similar(u[ids])

function foo1!(out, u, ids)
    @. out = u[ids] 
    # FastBroadcast.@.. does slightly better, but 
    # still allocates as much as `view(u, ids)`
function foo2!(out, u, ids)
    out .= view(u, ids)
function foo3!(out, u, ids)
    for (i, id) in enumerate(ids)
        out[i] = u[id]

# run once for compilation 
foo1!(out, u, ids); foo2!(out, u, ids); foo3!(out, u, ids);

# timings
@time foo1!(out, u, ids);
@time foo2!(out, u, ids);
@time foo3!(out, u, ids);

I see this for both 1.9.2 and 1.10-beta on Mac ARM.

Is there a way to avoid allocations without resorting to for loops?

You’re doing linear indexing into a matrix, which is implicitly doing a reshape under the hood when you take the view. That’s your allocation — it’s in the view, not the broadcast.

julia> @time foo2!(out, u, ids);
  0.000012 seconds (2 allocations: 80 bytes)

julia> v = reshape(u, :);

julia> @time foo2!(out, v, ids);
  0.000009 seconds

One option here is to broadcast the indexing itself:

# use `Ref(u)` so that `u` is not broadcasted
out .= getindex.(Ref(u), ids)

That’s helpful to know! Thanks for clarifying what’s happening internally.