Meaning of copy for arrays

Is there a clear description of the “contract” that copy(::AbstractArray) should implement? The docs are not very precise. Consider the following examples:

julia> copy(1:6) # just returns the same immutable type
1:6

julia> copy([1,2]) # makes a copy in memory
2-element Array{Int64,1}:
 1
 2

julia> copy([1,2]') # makes a copy in memory as an Adjoint
1×2 Adjoint{Int64,Array{Int64,1}}:
 1  2

julia> copy([1 2; 3 4]') # makes a copy in memory as a Matrix
2×2 Array{Int64,2}:
 1  3
 2  4

julia> copy((1:2)') # makes a copy in memory as an Adjoint
1×2 Adjoint{Int64,Array{Int64,1}}:
 1  2

I would guess there’s the following contract:

  1. If array A is immutable (setindex! errors) then C = copy(A) need only satisfy C == A.
  2. If array A is mutable (setindex! changes the entries), then C = copy(A) must satisfy C == A, C must be mutable, and changing C must not change any entry of A.
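Here is a minimal sketch of how the two-case contract above could be checked. It assumes numeric element types and probes mutability at a single index. The helpers `is_writable_at` and `satisfies_copy_contract` are hypothetical, not part of Base. `is_writable_at` writes back the value already stored, so probing never changes `A`.

```julia
using LinearAlgebra

# Hypothetical helper: does setindex! succeed at index i?
# It rewrites the value already stored there, so A's contents are unchanged.
function is_writable_at(A, i)
    try
        A[i] = A[i]
        return true
    catch
        return false
    end
end

# Sketch of the two-case contract for C = copy(A), probed at a single index i.
function satisfies_copy_contract(A::AbstractArray{<:Number}; i = first(eachindex(A)))
    C = copy(A)
    C == A || return false                # required in both cases
    is_writable_at(A, i) || return true   # case 1: immutable A, C == A suffices
    is_writable_at(C, i) || return false  # case 2: C must be mutable as well
    before = collect(A)
    C[i] = C[i] + one(eltype(C))          # change an entry of the copy...
    return A == before                    # ...and A must not change
end

for A in (1:6, [1, 2], [1 2; 3 4]', [1, 2]')
    println(typeof(A), " => ", satisfies_copy_contract(A))
end
```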

We also have the requirement C'b == A'b and transpose(C)*b == transpose(A)*b for a vector b, which is why copying the adjoint of a vector has to return an Adjoint, even though copying the adjoint of a matrix does not.
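The following small illustration shows why the vector case is special, using `*` with the row vector itself. A 1×2 `Matrix` compares equal to the `Adjoint`, yet multiplying it by a vector gives a 1-element `Vector` instead of a scalar. This assumes only the `LinearAlgebra` standard library.

```julia
using LinearAlgebra

x = [1, 2]
b = [3, 4]
A = x'                      # Adjoint row vector: A * b is a scalar
C = copy(A)

@assert C isa Adjoint       # copying kept the row-vector wrapper
@assert C * b == A * b == 11

M = Matrix(A)               # a 1×2 Matrix holding the same entries
@assert M == A              # equal as arrays...
@assert M * b == [11]       # ...but the product is a 1-element Vector, not 11
```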

If my interpretation of the contract is correct, then the last example could have returned an immutable Adjoint(1:2).
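Here is a sketch of that alternative under the proposed contract. `adjoint(1:2)` stands in for a hypothetical result of `copy((1:2)')` that allocates nothing. It equals the original, gives the same products, and is just as immutable.

```julia
using LinearAlgebra

A = (1:2)'              # lazy row vector wrapping an immutable range
C = adjoint(1:2)        # hypothetical alternative result for copy(A): no allocation

@assert C == A                          # case 1 of the contract
@assert C * [3, 4] == A * [3, 4] == 11
writable = try                          # C stays immutable, like A
    C[1] = 5
    true
catch
    false
end
@assert !writable
```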

Is this a correct interpretation? If so, I could make a PR to the docstring for copy.


I often wondered about this too.

Also, note that “mutability” is not something that callers can inspect, and it may not be uniform for the whole array: setindex! can error for some indices and work for others. This is not only a theoretical concern; cf. the various packages that provide a “concatenating view”.
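To make the point concrete, here is a toy concatenating view over a `Vector` followed by a range. `CatView` is hypothetical, written only for this example, and is not taken from any package. `setindex!` succeeds on the first part and errors on the second.

```julia
# Hypothetical concatenating view: the entries of `a`, followed by those of `b`.
struct CatView{T} <: AbstractVector{T}
    a::AbstractVector{T}
    b::AbstractVector{T}
end

Base.size(v::CatView) = (length(v.a) + length(v.b),)

function Base.getindex(v::CatView, i::Int)
    n = length(v.a)
    return i <= n ? v.a[i] : v.b[i - n]
end

# Forward writes to whichever parent holds index i.
function Base.setindex!(v::CatView, x, i::Int)
    n = length(v.a)
    if i <= n
        v.a[i] = x
    else
        v.b[i - n] = x      # errors when the parent is a range
    end
    return v
end

v = CatView([1, 2], 3:4)    # mutable Vector followed by an immutable range
v[1] = 10                   # works: forwards to the Vector
writable3 = try
    v[3] = 30               # errors: ranges do not support setindex!
    true
catch
    false
end
println(v, " ", writable3)
```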

I would propose the following contract: if b = copy(a), then no operation on b can change the contents of a, i.e., if c = collect(a), then c == a will continue to hold no matter what is done to b.
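The proposed invariant can be phrased as a small test. `copy_keeps_original` is a hypothetical helper. The `copyfn` keyword exists only so that an aliasing non-copy (`identity`) can be shown to fail.

```julia
# Hypothetical helper: apply `mutate!` to copyfn(a) and check that c == a still holds.
function copy_keeps_original(a, mutate!; copyfn = copy)
    c = collect(a)          # snapshot of a's contents
    b = copyfn(a)
    mutate!(b)              # do something to b only
    return c == a           # the proposed invariant
end

x = [1, 2, 3]
println(copy_keeps_original(x, b -> (b[1] = 99)))                     # → true
println(copy_keeps_original(view(x, 1:2), b -> (b[1] = 99)))          # → true (copy of a view is a new Array)
println(copy_keeps_original(x, b -> (b[1] = 99); copyfn = identity))  # → false (b aliases x)
```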


Doesn’t this hold only for deepcopy?


The difference between copy and deepcopy is only articulated for types, not for arrays. Unfortunately, copy has two different meanings, as seen in the example where copying the Adjoint of a matrix returns a Matrix.
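For arrays, the usual reading is that `copy` duplicates only the outer container, while `deepcopy` also duplicates the elements. Here is a minimal sketch with a vector of vectors:

```julia
A = [[1, 2], [3, 4]]

S = copy(A)        # new outer Vector, same inner Vectors
D = deepcopy(A)    # new outer Vector and new inner Vectors

@assert S[1] === A[1]      # shallow: the inner vector is shared
@assert D[1] !== A[1]      # deep: the inner vector was duplicated
D[1][1] = 100
@assert A[1][1] == 1       # mutating the deep copy's inner vector leaves A alone
```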


Thanks, I should clarify: the invariant is for first-level array operations like setindex!, push!, and pop!.
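Here is a sketch of the invariant restricted to such first-level operations, assuming a plain `Vector`. Note that `push!` and `pop!` change the length, which only the copy sees.

```julia
a = [1, 2, 3]
b = copy(a)
c = collect(a)          # snapshot, as in the proposed invariant

# First-level operations applied to the copy only.
for op! in (v -> (v[1] = 0), v -> push!(v, 4), v -> pop!(v), v -> empty!(v))
    op!(b)
    @assert a == c      # the original's contents never change
end
```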


I think no operation on b itself can change a itself; operations on their elements change the element, and that element may be present in more than one struct/array.
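Here is a minimal illustration of that distinction. The copy is a different array from the original, but the element they hold is one object that can also sit in an unrelated container (a tuple here). Mutating it is visible through all of them.

```julia
x = [1, 2]                 # one mutable element
A = [x]
B = copy(A)                # B is a distinct array...
t = (x, :tag)              # ...and x also lives in an unrelated tuple

@assert B !== A                     # operations on B do not touch A itself
@assert B[1] === x && t[1] === x    # but all three share the same element
push!(x, 3)                         # mutating the element is visible everywhere
@assert A[1] == B[1] == t[1] == [1, 2, 3]
```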

To me it seems natural that copy is just the shallow version of deepcopy. That is, changes to things that are clearly stored in the outermost object, and not inside some inner object, do not affect the copy; however, changing an inner object can affect the copy (as they both simply happen to share it).
Consequently, if you replace the value in a field or position, the copy is not affected; if you delete elements (a property of the outermost object, clearly), the copy is not affected; if you change a field of an element of the original array, well, the same element is stored in the copy, and so the change will be reflected, but it is not like the copy itself has changed.
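The three cases above, sketched with a vector of vectors:

```julia
A = [[1, 2], [3, 4], [5, 6]]
B = copy(A)

B[1] = [0]                # replace an entry of the copy
@assert A[1] == [1, 2]

pop!(B)                   # delete an entry: a property of the outer object
@assert length(A) == 3

B[2][1] = 30              # mutate an element that A and B share
@assert A[2] == [30, 4]   # the shared element changed; A itself did not
```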

I think talking about fields does not fit a definition of copy here, since the return type can be something completely different (with completely different fields).

Which case exactly are you thinking of? The copy may be of a different type that internally has different fields, but the copy often has the same public API as the original. The fields can be seen in a less technical and more semantic light, that is, as the properties/information of the object itself (but not of inner objects), which are copied even if the exact internal struct fields are not the same.
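The `Adjoint` example from the start of the thread fits this reading. The copy of a lazy `Adjoint` matrix is a plain `Matrix` (as shown in that example), with different internals but the same size, entries and indexing behavior, and it is independent of the original. Here is a minimal sketch:

```julia
using LinearAlgebra

A = [1 2; 3 4]'          # lazy Adjoint wrapper around a Matrix
C = copy(A)              # materialized into an ordinary array
println(typeof(A), " -> ", typeof(C))

# Different concrete types, same public array API:
@assert size(C) == size(A) && C == A
@assert C[1, 2] == A[1, 2] == 3
C[1, 2] = 0              # changing an entry of the copy...
@assert A[1, 2] == 3     # ...leaves the original (and its parent) alone
```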

Yes, you’ve identified exactly the crux of the issue. Another good motivating example is views; we now have copy(view(A, I...)) defined to be A[I...] by default. In fact, I think it makes sense to try to align indexing and copy.
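Here is a sketch of that default, assuming an ordinary `Matrix` parent. Copying the view and indexing the parent with the same indices give equal, freshly allocated arrays.

```julia
A = collect(reshape(1:12, 3, 4))   # 3×4 Matrix{Int}
inds = (2:3, [1, 4])               # some row and column indices

V = view(A, inds...)
C = copy(V)

@assert C == A[inds...]        # copy of a view agrees with indexing
@assert C isa Matrix{Int}      # and, like indexing, returns a fresh Array
C[1, 1] = -1
@assert A[2, 1] == 2           # the parent is untouched
```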


FWIW, having come back to Julia after a pause, I was surprised that copying a DataFrame seems to be a deep copy. I’d love to see the Julia community standardize the meaning of these generic functions and do more to help libraries coordinate around a consistent set of guarantees.


Just to be clear, it should only be “one level”, that is:

julia> A = [[1,2]]
1-element Array{Array{Int64,1},1}:
 [1, 2]

julia> copy(A)[1][1] = 4
4

julia> A
1-element Array{Array{Int64,1},1}:
 [4, 2]

So I think the entries of an array play the same role w.r.t. copy as fields do for other types.
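Base does not define `copy` for arbitrary structs, so the analogy below uses a hypothetical `shallowcopy` helper and a hypothetical `Holder` type. Mutating a shared inner object leaks through in both cases, while rebinding a field (or an array entry) of the copy does not.

```julia
# Hypothetical type, for illustration only: one mutable field holding a Vector.
mutable struct Holder
    data::Vector{Int}
end

# Hypothetical helper (not part of Base): a one-level copy that reuses each
# field value as-is, the way copy(::Array) reuses each entry as-is.
shallowcopy(x::T) where {T} = T((getfield(x, f) for f in fieldnames(T))...)

h = Holder([1, 2])
h2 = shallowcopy(h)
h2.data[1] = 4            # mutates the Vector both objects share
@assert h.data == [4, 2]
h2.data = [0]             # rebinding a field of the copy...
@assert h.data == [4, 2]  # ...leaves the original's field alone

A = [[1, 2]]
B = copy(A)
B[1] = [0]                # rebinding an entry of the copied array...
@assert A == [[1, 2]]     # ...likewise leaves the original alone
```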


The challenge in Julia is that it’s not always entirely clear what “one level” means. DataFrames is a great example. Is it a vector of vectors? Or a matrix of elements? Both? In fact, DataFrames' copy is never a deepcopy, but you can see this tension (and the connection to indexing!) directly in the source:

https://github.com/JuliaData/DataFrames.jl/blob/ddba103ca71e73fa258cd1ea833e0240b3dabc11/src/dataframe/dataframe.jl#L822-L836
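The tension can be reproduced with a toy column table. `ColTable`, `copy_outer` and `copy_cols` are hypothetical and are not DataFrames.jl's implementation. Under the “vector of vectors” reading, one level means copying only the outer vector. Under the “matrix of elements” reading, it means copying every column too.

```julia
# Hypothetical minimal column table (not DataFrames.jl): a vector of columns.
struct ColTable
    cols::Vector{Vector{Float64}}
end

# Reading 1: "vector of vectors", so one level = copy only the outer vector.
copy_outer(t::ColTable) = ColTable(copy(t.cols))

# Reading 2: "matrix of elements", so one level = copy each column too.
copy_cols(t::ColTable) = ColTable([copy(c) for c in t.cols])

t = ColTable([[1.0, 2.0], [3.0, 4.0]])

t1 = copy_outer(t)
t1.cols[1][1] = -1.0          # setindex! on a cell of the copy...
@assert t.cols[1][1] == -1.0  # ...leaks into the original

t2 = copy_cols(t)
t2.cols[2][1] = -3.0
@assert t.cols[2][1] == 3.0   # columns are independent under this reading
```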
