# Allocations when overwriting array elements in-place

I have a relatively large matrix, where I want to set all values below a certain threshold to zero:

``````@time A = rand(1000,1000,1000)
@time bool = A .< 0.5
@time A[bool] .= 0.0

5.653554 seconds (2 allocations: 7.451 GiB, 2.64% gc time)
1.698547 seconds (6 allocations: 119.214 MiB, 20.14% gc time)
5.429685 seconds (5 allocations: 3.725 GiB, 1.63% gc time)
``````

I would have expected the final step to allocate essentially no memory since I am overwriting elements of an existing array in-place, however the amount of actually allocated memory equals that of creating an entirely new array containing the overwritten values (Julia 1.7.2).

Interestingly, when I instead use a view, the allocations occur during the creation of the view rather than the overwriting step.

``````@time Av = @view A[bool]
@time Av .= 0.0

2.786945 seconds (5 allocations: 3.725 GiB, 9.23% gc time)
1.292666 seconds
``````

Could anyone help me understand why these allocations occur and whether there is a way to avoid them?

Those are the allocations of the broadcast mechanism. Put it in a function and those should go away.

It doesn’t look like it unfortunately - when in a function, the same amount of memory allocated in the final step:

``````function func()
@time A = rand(1000,1000,1000)
@time bool = A .< 0.5
@time A[bool] .= 0.0
end

func()

3.824082 seconds (2 allocations: 7.451 GiB, 2.76% gc time)
2.128477 seconds (4 allocations: 119.213 MiB, 18.11% gc time)
4.128135 seconds (4 allocations: 3.725 GiB, 0.22% gc time)

``````

I’m pretty sure this has to do with how the view of `A` is created from `bool`:

``````julia> A = rand(100, 100, 100);

julia> bool = A .< 0.5;

julia> using BenchmarkTools

julia> @btime @view(\$A[\$bool]);
428.214 μs (4 allocations: 3.81 MiB)
``````

(Note that I reduced all the dimensions by a factor of 10 to speed up testing).

Even using views here doesn’t help because creating the view of `A` ends up computing the indices of all the nonzero elements of `bool`, which is where the allocation comes from:

``````julia> C = @view A[bool];

julia> C.indices
([4, 5, 6, 9, 10, 11, 12, 14, 16, 17  …  999986, 999988, 999989, 999990, 999991, 999992, 999993, 999996, 999998, 1000000],)
``````

If you’re primarily interested in removing the allocations, then that’s really easy: Just write a loop:

``````julia> function set_using_mask!(A, bool)
@assert axes(A) == axes(bool)
@inbounds for i in eachindex(A)
if bool[i]
A[i] = 0
end
end
end
set_using_mask! (generic function with 1 method)

julia> @btime (\$A[\$bool] .= 0);
1.089 ms (4 allocations: 3.81 MiB)

3.799 ms (0 allocations: 0 bytes)
``````

I was surprised to find that the loop is actually slower than the allocating broadcast. Clearly there’s some interesting benefit to the way `A[bool] .= 0` computes all the nonzero indices first, then iterates over them. Indeed, we can replicate some of that effect:

``````julia> function set_using_mask_allocating!(A, bool)
@assert axes(A) == axes(bool)
indices = findall(bool)
@inbounds for i in indices
A[i] = 0
end
end
set_using_mask_allocating! (generic function with 1 method)

2.454 ms (2 allocations: 11.44 MiB)
``````

I’m definitely surprised that this is faster.

3 Likes

I think `A[bool]` creates a large temporary matrix：

``````julia> @time map!(x -> x < 0.5 ? 0.0 : x, A, A);
0.840729 seconds (14.05 k allocations: 823.300 KiB, 0.98% compilation time)
``````

Looks like a job for `replace!`:

``````function foo!(A)
bool = A .< 0.5
A[bool] .= 0.0
return A
end

bar!(A) = replace!(x->(x<0.5 ? zero(x) : x), A);
``````

Benchmark:

``````1.7.2> @btime foo!(A) setup=(A=rand(100,100,100)) evals=1;
2.003 ms (8 allocations: 3.92 MiB)

1.7.2> @btime bar!(A) setup=(A=rand(100,100,100)) evals=1;
276.800 μs (0 allocations: 0 bytes)
``````
8 Likes