Memory allocation with `view` and array assignment

Hey,

I was wondering about this code:

julia> x = randn((2048, 2048));

julia> z = copy(x);

julia> using BenchmarkTools

julia> @btime $z[2:2000, 2:2000] = view($x, 2:2000, 2:2000);
  5.386 ms (0 allocations: 0 bytes)

julia> @btime $z[2:2000, 2:2000] .= view($x, 2:2000, 2:2000);
  3.195 ms (0 allocations: 0 bytes)

julia> @btime $z[2:2000, 2:2000] = view($z, 2:2000, 2:2000); # why allocations?
  9.731 ms (2 allocations: 30.49 MiB)

julia> @btime $z[2:2000, 2:2000] .= view($z, 2:2000, 2:2000); # why allocations?
  6.859 ms (2 allocations: 30.49 MiB)

I would have expected zero allocations in all of these examples, so why are there allocations when I assign the array to itself? Is it because, without a copy, the following

julia> @btime $z[2:2000, 2:2000] .= view($z, 1:1999, 1:1999);
  6.871 ms (2 allocations: 30.49 MiB)

julia> @btime $z[1:1999, 1:1999] .= view($z, 2:2000, 2:2000);
  6.872 ms (2 allocations: 30.49 MiB)

could fail because of overwriting?

Thanks for the help!

Felix

It's due to the risk of aliasing. See `?Base.unalias` for more details.
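To make that concrete, here is a minimal sketch using the documented but unexported `Base.mightalias` and `Base.unalias` (behavior details may vary across Julia versions):

z = rand(4, 4)
a = view(z, 1:2, :)
b = view(z, 2:3, :)

Base.mightalias(a, b)           # true: both views share z's memory
Base.mightalias(a, rand(2, 4))  # false: unrelated storage

# unalias(dest, src) returns src untouched when aliasing is impossible,
# and an unaliased copy otherwise.
b2 = Base.unalias(a, b)
b2 === b                        # false: a defensive copy was made

Note that `mightalias` is conservative: any two views of the same parent are flagged, even when their index ranges are disjoint.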

This actually looks like a bug. We have the smarts to see that the indices are matched between the LHS and the RHS (and thus that the assignment is memory-safe). I'm not sure why this simple case isn't getting the fast-path, but a more complicated expression avoids the conservative copy:

julia> @btime $z[2:2000,2:2000] .= view($z, 2:2000, 2:2000);
  12.624 ms (2 allocations: 30.49 MiB)

julia> @btime $z[2:2000,2:2000] .= 0 .+ view($z, 2:2000, 2:2000);
  2.654 ms (0 allocations: 0 bytes)

julia> @btime $z[2:2000,2:2000] .= 0 .+ view($z, 1:1999, 1:1999);
  13.290 ms (2 allocations: 30.49 MiB)

If there's a check that notices the source and destination arrays are equal, and so are the indices, couldn't we make a fastest-path (a no-op)?

Although custom array types pose a problem: they may have specialized `getindex` and `setindex!` methods.


If an object is such that `obj[a] = obj[a]` is not conceptually a no-op… I think it's doomed either way.
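To illustrate with a hypothetical type (the name and design are purely for illustration): an array that counts writes makes `a[i] = a[i]` observable, so a blanket no-op fast-path would change behavior.

struct CountingArray{T,N} <: AbstractArray{T,N}
    data::Array{T,N}
    writes::Base.RefValue{Int}   # incremented on every setindex!
end
CountingArray(data::Array) = CountingArray(data, Ref(0))

Base.size(A::CountingArray) = size(A.data)
Base.IndexStyle(::Type{<:CountingArray}) = IndexLinear()
Base.getindex(A::CountingArray, i::Int) = A.data[i]
function Base.setindex!(A::CountingArray, v, i::Int)
    A.writes[] += 1              # observable side effect
    A.data[i] = v
end

a = CountingArray(rand(3))
a[1] = a[1]                      # value unchanged...
a.writes[]                       # ...but the write was recorded: 1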

I mean, sure, there's no limit to how far down you can go on the optimizations… but every one you make has a cost in the surface area that generic behaviors need to accommodate, and I'd say this one is of fairly low utility. I'm guessing it was an MWE reduced down from something like `x .= f.(x)` (which is similarly missing the aliasing fast-path, perhaps because it has another "fast-path" it's using).
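As a hedged illustration of that `x .= f.(x)` pattern (the function name here is mine): `map!` iterates matched indices directly, so it performs no defensive copy.

h(v) = 2v + 1
x = rand(1024)

x .= h.(x)      # hits the conservative unaliasing copy
map!(h, x, x)   # same result, no defensive copy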

Maybe some hacky use of MappedArrays where the inverse is only an approximation / not exact?
Maybe someone could implement an algorithm that iteratively converges.

(Obviously `log(exp(x))` isn't exact either, but I think people wouldn't mind that turning into a no-op.)

I agree. The more checks like this that exist and can't be skipped, the more likely I am to re-implement things myself when micro-optimizing, to better reflect the problem at hand.

Here the costs would be in (a) the check itself and (b) forwarding the early return up to the function calling the check. So if one already adds an extra check for whether the indices match, then additionally adding a "fastest-path" hopefully shouldn't require much more code (unless forwarding the result is especially costly).

I think that, in general, no one should be writing `a .= a` if it doesn't do anything.
But `a .= b` is common, and someone may have code that uses internal caches and is called both internally by their library and by users. Then, because exact aliasing might be likely in that particular case, it'd be easy enough to optimize manually with `a === b || (a .= b)`.
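As a sketch, that manual fast-path could live in a tiny helper (the name `maybe_copy!` is hypothetical):

# Skip the broadcast entirely when source and destination are the
# exact same object; otherwise fall through to the normal path.
function maybe_copy!(a::AbstractArray, b::AbstractArray)
    a === b || (a .= b)
    return a
end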


Maybe we could perform a runtime memory-dependency test. It's much more general than just handling this trivial case: it can also handle the partially overlapping case and avoid copying. `memmove` does exactly this, though it only deals with one-dimensional arrays…
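A sketch of the idea for the one-dimensional case (my illustration of the `memmove` strategy, not Base's implementation): if the destination range starts at or before the source range, a forward copy never clobbers unread elements; otherwise copy backwards.

function overlap_safe_copy!(dest::Vector, doff::Int,
                            src::Vector, soff::Int, n::Int)
    if dest !== src || doff <= soff
        for k in 0:n-1               # forward: writes trail the reads
            dest[doff + k] = src[soff + k]
        end
    else
        for k in n-1:-1:0            # backward: avoids overwriting the source
            dest[doff + k] = src[soff + k]
        end
    end
    return dest
end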

Now I've encountered a real case where I didn't expect allocations.

Isn't that something we should fix?

julia> function f(x,y)
           x .= y.^2 .+ x.^2
       end
f (generic function with 1 method)

julia> function g(x,y)
           x .= view(y, 1:length(y)).^2 .+ view(x, 1:length(x)).^2
       end
g (generic function with 1 method)

julia> x, y = rand(Int, 2048*2048), rand(Int, 2048*2048);

julia> @time f(x,y);
  0.030055 seconds (15.08 k allocations: 717.336 KiB, 83.78% compilation time)

julia> @time f(x,y);
  0.005222 seconds

julia> @time g(x,y);
  0.076428 seconds (233.42 k allocations: 44.631 MiB, 77.89% compilation time)

julia> @time g(x,y);
  0.026629 seconds (2 allocations: 32.000 MiB)

Performance seems to be significantly different as well:

julia> x, y = rand(Int, 5000*10_000), rand(Int, 5000*10_000);

julia> @btime f($x, $y);
  53.750 ms (0 allocations: 0 bytes)

julia> @btime g($x, $y);
  209.211 ms (2 allocations: 381.47 MiB)

julia> x, y = rand(Int, 10*10), rand(Int, 10*10);

julia> @btime f($x, $y);
  49.186 ns (0 allocations: 0 bytes)

julia> @btime g($x, $y);
  114.352 ns (1 allocation: 896 bytes)

That looks like an anti-aliasing false positive.


Well, `@btime $z[2:2000, 2:2000] .= view($z, 2:2000, 2:2000);` falls back to `copyto!(a::AbstractArray, b::AbstractArray)`, where we always unalias and perform no `a === b` check.

The `copyto!` doesn't fix that, does it?

This example seems unrelated to `copyto!(A::AbstractArray, B::AbstractArray)`, as it falls back to `copyto!(::AbstractArray, ::Broadcasted{Nothing})`. Since we only use `a === b` to avoid the copy during an in-place self-broadcast, you'd better avoid this pattern (`g`) in your code.
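For completeness, a sketch of that advice applied to `g` from above (`g_fixed` is a hypothetical name): the broadcast fast-path recognizes the destination object itself via `a === b`, but not an index-matched view of it, so pass `x` directly where the view would cover the whole array.

function g_fixed(x, y)
    # view(y, ...) does not alias x, so it is harmless; only the
    # self-view of the destination triggered the defensive copy.
    x .= view(y, 1:length(y)).^2 .+ x.^2
end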