push! on a reshaped vector unshares data with the parent array

Minimal example below. Pushing to a reshaped vector y seems to allocate a copy of the original matrix x’s data, so that the two no longer share data. This is strange because y does not seem to be reassigned (that would print a warning, since y is const). Is an Array instance implemented so that it can switch the address of its underlying data sequence?

julia> blah = [1.0]; @time push!(blah, 2.0); # compile push! first
  0.004331 seconds (6.02 k allocations: 361.375 KiB, 98.84% compilation time)

julia> @time push!(blah, 3.0); # push! seems compiled
  0.000006 seconds

julia> begin
       const x = zeros(1,2)
       const y = reshape(x, 2)
       end;

julia> x[2] = 2.0; x[2] == y[2] # x and y share data
true

julia> @time push!(y, 7.0); # odd, there's 1 allocation
  0.000002 seconds (1 allocation: 80 bytes)

julia> x[1] = 1.0; x[1] == y[1] # x and y no longer share data
false

julia> @time push!(y, 8.0); # further push! does not allocate
  0.000002 seconds

EDIT: By the way, you cannot push to an original vector a if you reshaped it to a matrix b instead. This restriction makes sense to me, but I’m not sure how plain Arrays are able to restrict each other like this; there’s no parent field. The error-throwing _growend! method just ccalls jl_array_grow_end, and I can’t read C.

julia> begin
       const a = zeros(2)
       const b = reshape(a, 1, 2)
       end;

julia> a[2] = 2.0; a[2] == b[2] # a and b share data
true

julia> push!(a, 7.0)
ERROR: cannot resize array with shared data
Stacktrace:
 [1] _growend!
   @ ./array.jl:948 [inlined]
 [2] push!(a::Vector{Float64}, item::Float64)
   @ Base ./array.jl:995

julia> typeof.((x, y, a, b)) # sanity check, they are all Arrays
(Matrix{Float64}, Vector{Float64}, Vector{Float64}, Matrix{Float64})

Unless I misunderstand you, this isn’t that odd. push! needs to reallocate a larger buffer every now and then, which is bigger than the current array size. So new allocations only happen after the current buffer size is exhausted.

But I don’t know how this works under the hood.

y is const, the contents of y are not const.

const x = [1, 2]
const y = [3, 4]
x == y    # false
x = y     # Warning redefinition of constant x. ...
x .= y    # works, assigns x[i] = y[i] 
x == y    # true

I do remember that copying occurs when a larger size is needed, thanks for reminding me. I suppose the unexpected part of this is the unsharing; if a copy were made, I would expect all the arrays to migrate to share the new data. The docstring for reshape does say “The two arrays share the same underlying data, so that the result is mutable if and only if A is mutable, and setting elements of one alters the values of the other.” The copying in this case doesn’t seem motivated by size constraints: the array was too small, and the example still works if starting with 2 elements. It really does seem like the copying was intended to unshare.

Why would you expect them to stay linked? The original array has to keep its shape after all, so the only way to resolve the conflict is to copy data.

By necessity, yes. An Array is a wrapper around a block of memory, internally stored as a pointer (plus some fields on the C side to store, e.g., the length/shape of the array). When a Vector grows, object identity has to be preserved, even though the data may have to move because a realloc is needed.
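
A minimal sketch of that point, using only objectid and pointer from Base. The helper name grow_and_compare is made up for illustration. The Vector should keep its identity across many push! calls, while the address of its data may or may not change, depending on when (and how) the buffer is reallocated on your Julia version.

```julia
# Hypothetical helper: grow a vector and report what changed.
function grow_and_compare(n::Integer)
    v = [0.0]
    id_before = objectid(v)     # identity of the Array object itself
    ptr_before = pointer(v)     # address of the first element of its data
    for i in 1:n
        push!(v, Float64(i))
    end
    same_object = objectid(v) == id_before
    data_moved = pointer(v) != ptr_before   # may be true or false
    return (same_object = same_object, data_moved = data_moved, len = length(v))
end

result = grow_and_compare(10_000)
println(result)
```

The data_moved field is printed rather than assumed, since whether the buffer moves is an implementation detail.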

Because that’s what our docs say:

help?> reshape
search: reshape promote_shape

  reshape(A, dims...) -> AbstractArray
  reshape(A, dims) -> AbstractArray

  Return an array with the same data as A, but with different dimension sizes or number of dimensions. The two arrays
  share the same underlying data, so that the result is mutable if and only if A is mutable, and setting elements of
  one alters the values of the other.

This is also pretty inconsistent:

julia> x = [1, 2]; y = reshape(x, 1, 2); push!(x, 3);
ERROR: cannot resize array with shared data

julia> x = [1 2]; y = reshape(x, 2); push!(y, 3);

julia> x
1×2 Matrix{Int64}:
 1  2

julia> y
3-element Vector{Int64}:
 1
 2
 3

By this point in the thread it seems clear that an Array instance can change its underlying buffer. The question is whether it should do that for a reshaped vector, unsharing data with the original matrix (1st code example in OP), when the same is disallowed for an original vector sharing data with a reshaped matrix (2nd code example in OP).

I also searched array.c for the text “cannot resize array with shared data” in jl_array_grow_end, and while I still can’t read C, I’m satisfied knowing that there is metadata somewhere keeping track of arrays sharing data (though I’d appreciate an explanation of what the syntax a->flags.how = 3 means).

The docs only talk about “setting elements”, which means an assignment operation, if I understand things correctly.

(Unless you mean that the docs do not mention that resizing the returned array unshares it. Indeed, the docs could be improved by adding this information.)

My understanding of this design decision is as follows:

  • x is informed (via flags.how) that y uses the data that x owns, so x is not allowed to be resized, as that could corrupt y, which reuses x’s data.
  • However, reallocating y does not affect x. If you allocate new memory for y, you can be sure that it will not corrupt x.

(Of course, the design could also have disallowed reallocating y, but I assume that the developer of this functionality did not want to introduce such a restriction, as it was not needed.)
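
A rough sketch for checking both bullets on your own Julia version. The helpers same_buffer and try_push! are made-up names, and comparing pointer values is only a stand-in for "shares data" that suits dense Arrays. The outcome of each push! is reported rather than assumed, since the internals have changed between releases.

```julia
# Hypothetical helper: do two dense Arrays start at the same address?
same_buffer(a::Array, b::Array) = pointer(a) == pointer(b)

# Hypothetical helper: push!, returning the error message instead of throwing.
function try_push!(v::Vector, item)
    try
        push!(v, item)
        return "ok"
    catch err
        return sprint(showerror, err)
    end
end

# Case 1: a reshaped Vector y made from an original Matrix x.
x = zeros(1, 2)
y = reshape(x, 2)
shared_before = same_buffer(x, y)
status_y = try_push!(y, 7.0)
println("reshaped vector: shared before = ", shared_before,
        ", push! -> ", status_y,
        ", shared after = ", same_buffer(x, y))

# Case 2: an original Vector a that was reshaped into a Matrix b.
a = zeros(2)
b = reshape(a, 1, 2)
status_a = try_push!(a, 7.0)
println("original vector: push! -> ", status_a,
        ", shared after = ", same_buffer(a, b))
```

On the version used in this thread, case 1 should succeed and unshare, and case 2 should report the “cannot resize array with shared data” error.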

This does seem like an intentional design choice, because reshaping Arrays does not use the wrapper ReshapedArray. However, consider these counterexamples to the two points of your understanding:

  • The shared data error is not thrown if you made views or forced a ReshapedArray, even with the much more disruptive insert!. The flags.how protection might be limited to Arrays.
julia> x = zeros(2); y = reshape(x, 1, 2)
1×2 Matrix{Float64}:
 0.0  0.0

julia> push!(x, 7.0)
ERROR: cannot resize array with shared data

julia> x = zeros(2); y = view(x, 1:1)
1-element view(::Vector{Float64}, 1:1) with eltype Float64:
 0.0

julia> push!(x, 7.0); insert!(x, 1, 1.0); y # y is corrupted!
1-element view(::Vector{Float64}, 1:1) with eltype Float64:
 1.0

julia> x = zeros(2); y = Base.ReshapedArray(x, (1,2), ())
1×2 reshape(::Vector{Float64}, 1, 2) with eltype Float64:
 0.0  0.0

julia> push!(x, 7.0); insert!(x, 1, 1.0); y # y is corrupted!
1×2 reshape(::Vector{Float64}, 1, 2) with eltype Float64:
 1.0  0.0
  • When you reshape an Array to the same shape, you just reuse the instance, the ultimate sharing. Of course, push! does not reallocate. It could be a hassle to check whether reshape made a separate instance or not to keep the variables’ behavior consistent during mutations.
julia> x = zeros(2); y = reshape(x, 2);

julia> x === y
true

julia> push!(y, 7.0); x === y # x seems corrupted if you didn't check
true

Yes, because when you make a view, the parent is not notified that a view was created.

Indeed, I forgot to mention this case: https://github.com/JuliaLang/julia/blob/master/base/reshapedarray.jl#L48 (I had checked it, but somehow it slipped my mind). In the case you mention, the ccall does not even happen.
This should probably also be documented.
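
Until that is documented, one way to avoid guessing is to check what reshape actually returned before mutating. The helper below is a hypothetical sketch (reshape_relation is not a Base function), and the pointer comparison is only a rough test for shared data that assumes non-empty dense Arrays.

```julia
# Hypothetical helper: classify what reshape returned for an Array.
function reshape_relation(x::Array, dims::Int...)
    y = reshape(x, dims...)
    if y === x
        return :same_instance    # mutating y is mutating x
    elseif y isa Array && pointer(y) == pointer(x)
        return :shared_buffer    # separate Array object over the same data
    else
        return :other            # e.g. a wrapper type
    end
end

println(reshape_relation(zeros(2), 2))      # same shape
println(reshape_relation(zeros(2), 1, 2))   # different shape
```

The results are printed rather than stated here because the same-shape case depends on how reshape is implemented in the Julia version at hand.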