Meaning of copy for arrays

dlfivefifty · July 21, 2020, 8:36am

Is there a clear description of what “contract” copy(::AbstractArray) should implement? The docs are not very precise. We have the following examples:

julia> copy(1:6) # just returns the same immutable type
1:6

julia> copy([1,2]) # makes a copy in memory
2-element Array{Int64,1}:
 1
 2

julia> copy([1,2]') # makes a copy in memory as an Adjoint
1×2 Adjoint{Int64,Array{Int64,1}}:
 1  2

julia> copy([1 2; 3 4]') # makes a copy in memory as a Matrix
2×2 Array{Int64,2}:
 1  3
 2  4

julia> copy((1:2)') # makes a copy in memory as an Adjoint
1×2 Adjoint{Int64,Array{Int64,1}}:
 1  2

I would guess there’s the following contract:

If array A is immutable (setindex! errors), then C = copy(A) need only satisfy C == A.
If array A is mutable (setindex! changes the entries), then C = copy(A) must satisfy C == A, C must be mutable, and changing C must not change any entry of A.
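A minimal sketch of how these two rules could be checked on a single array. The helper name `satisfies_copy_contract` is hypothetical (nothing like it exists in Base), it is restricted to numeric element types so that a changed value is easy to construct, and it probes only one index, so it cannot detect arrays whose writability differs between indices.

```julia
# Hypothetical helper, not part of Base: checks the guessed contract on one numeric array.
function satisfies_copy_contract(A::AbstractArray{<:Number})
    C = copy(A)
    C == A || return false            # both rules require C == A
    isempty(A) && return true
    i = first(eachindex(A))
    # Probe writability by storing an entry back onto itself; A's contents do not change.
    writable = try
        A[i] = A[i]
        true
    catch
        false
    end
    writable || return true           # immutable case: C == A is all that is asked
    snapshot = collect(A)             # independent record of A's entries
    try
        C[i] = C[i] + one(eltype(C))  # must succeed, since C has to be mutable too
    catch
        return false
    end
    return collect(A) == snapshot     # writing to C must leave A untouched
end
```

Under these assumptions, all five examples above pass, e.g. `satisfies_copy_contract(1:6)` and `satisfies_copy_contract([1 2; 3 4]')` both return `true`.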

We also have the requirement C'b == A'b and transpose(C)*b == transpose(A)*b for a vector b, which is why copying the adjoint of a vector needs to return an adjoint, even though copying the adjoint of a matrix does not.
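A short illustration of why the wrapper matters for vectors, as a sketch rather than a statement of the intended rule: an adjoint vector times a vector is a scalar, while a 1×2 Matrix times a vector is a 1-element Vector. The variable names are chosen here for illustration only.

```julia
using LinearAlgebra  # only needed for the Adjoint type name

a = [1, 2]'          # 1×2 Adjoint wrapping a Vector
b = [3, 4]

row_times_vec = copy(a) * b      # Adjoint vector times vector: a scalar
mat_times_vec = Matrix(a) * b    # 1×2 Matrix times vector: a 1-element Vector

println(copy(a) isa Adjoint)     # the copy keeps the wrapper
println(row_times_vec)           # scalar 11
println(mat_times_vec)           # 1-element Vector containing 11
```

If copy dropped the Adjoint wrapper here, `copy(a) * b` would change from a number to an array, so code that is generic over `a` would behave differently after copying.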

If my interpretation of the contract is correct, then the last of the copy examples above, copy((1:2)'), could have returned an immutable Adjoint(1:2).

Is this a correct interpretation? I could make a PR to the doc string for copy if it is.

Tamas_Papp · July 21, 2020, 9:53am

I often wondered about this too.

Also, note that “mutability” is not something that callers can inspect, and it may not be uniform for the whole array: setindex! can error for some indices and work for others. This is not only a theoretical concern, cf. the various packages which provide a “concatenating view”.

I would propose the following contract: if b = copy(a), then no operation on b can change the contents of a, i.e., if c = collect(a) then c == a will continue to hold no matter what is done to b.
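One standard-library illustration of non-uniform writability is `Diagonal` from LinearAlgebra. It is picked here only as an example and is not one of the “concatenating view” packages referred to above.

```julia
using LinearAlgebra

D = Diagonal([1, 2])
D[1, 1] = 5                    # diagonal entries are writable

off_diagonal_failed = try
    D[1, 2] = 3                # a nonzero off-diagonal value cannot be stored
    false
catch
    true
end

println(D[1, 1])               # 5
println(off_diagonal_failed)   # true
```

So a single yes/no answer to “is this array mutable?” does not exist for `D`; the answer depends on the index (and here even on the value being written).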

oheil · July 21, 2020, 9:59am

Doesn’t this hold only for deepcopy?

dlfivefifty · July 21, 2020, 10:01am

The difference between copy and deepcopy is only articulated for types, not for arrays. Unfortunately copy has two different meanings, as seen in the example where the copy of an Adjoint returns a Matrix.

Tamas_Papp · July 21, 2020, 10:19am

Thanks, I should clarify: the invariance is for first-level array ops like setindex!, push!, pop!.
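As a sketch of that invariance for a plain Vector (variable names follow the proposal above: `a` is the original, `b` the copy, `c` the independent record):

```julia
a = [1, 2, 3]
c = collect(a)      # independent record of a's contents
b = copy(a)

# First-level operations on the copy
b[1] = 10
push!(b, 4)
pop!(b)
pop!(b)

println(c == a)     # true: a is unaffected by anything done to b
println(b)          # [10, 2]
```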

Henrique_Becker · July 21, 2020, 1:19pm

I think no operation on b itself can change a itself; operations on their elements change the element, and the same element may be present in more than one struct/array.

To me it seems natural that copy is just the shallow version of deepcopy. That is, changes to things that are clearly stored in the outermost object, and not inside some inner object, do not affect the copy; however, changing an inner object can affect the copy (as they both simply happen to share it).
Consequently, if you replace the value in a field or position, the copy is not affected; if you delete elements (a property of the outermost object, clearly), the copy is not affected; if you change a field of an element of the original array, well, the same element is stored in the copy, and so the change will be reflected, but it is not like the copy itself has changed.
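A small sketch of these three cases. The `Box` type is hypothetical and exists only to give the array mutable elements.

```julia
# Hypothetical element type, used only for illustration.
mutable struct Box
    value::Int
end

orig    = [Box(1), Box(2)]
shallow = copy(orig)        # new outer array, same Box objects
deep    = deepcopy(orig)    # new outer array and new Box objects

orig[2].value = 20          # change a field of a shared element
println(shallow[2].value)   # 20: the shallow copy holds the same Box
println(deep[2].value)      # 2: the deep copy has its own Box

shallow[1] = Box(10)        # replace the value in a position of the copy
pop!(shallow)               # delete an element from the copy
println(orig[1].value)      # 1: the original slot is untouched
println(length(orig))       # 2: the original length is untouched
```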

dlfivefifty · July 21, 2020, 1:23pm

I think talking about fields in a definition of copy does not fit here, where the return type can be something completely different (with completely different fields).

Henrique_Becker · July 21, 2020, 1:31pm

Which case are you thinking of exactly? The copy may be a different type that internally has different fields; however, the copy often has the same public API as the original. The fields can be seen in a less technical and more semantic light, that is, as the properties/information of the object itself (but not of inner objects), which are copied even if the exact internal struct fields are not the same.

mbauman · July 21, 2020, 2:36pm

Yes, you’ve identified exactly the crux of the issue. Another good motivating example is views; we now have copy(view(A, I...)) defined to be A[I...] by default. In fact, I think it makes sense to try to align indexing and copy.
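A brief sketch of that alignment for a column view. The array and index choice are just an example.

```julia
A = [1 2; 3 4]
v = view(A, :, 1)       # SubArray sharing memory with A

c = copy(v)             # behaves like A[:, 1]
println(c == A[:, 1])   # true
println(c isa Vector)   # true: the copy is a plain Vector, not a SubArray

c[1] = 100              # writing to the copy ...
println(A[1, 1])        # 1: ... leaves the parent untouched

v[1] = 7                # writing to the view ...
println(A[1, 1])        # 7: ... does change the parent
```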

johnmyleswhite · July 21, 2020, 3:01pm

FWIW, having come back to Julia after a pause, I was surprised that copying a DataFrame seems to be a deep copy. I’d love to see the Julia community standardize the meaning of these generic functions and do more to help libraries coordinate around a consistent set of guarantees.

dlfivefifty · July 21, 2020, 3:28pm

Just to be clear it should only be “one level”, that is:

julia> A = [[1,2]]
1-element Array{Array{Int64,1},1}:
 [1, 2]

julia> copy(A)[1][1] = 4
4

julia> A
1-element Array{Array{Int64,1},1}:
 [4, 2]

So I think the entries of an array play the same role w.r.t. copy as fields do for other types.

mbauman · July 21, 2020, 5:15pm

The challenge in Julia is that it’s not always entirely clear what “one level” means. DataFrames is a great example. Is it a vector of vectors? Or a matrix of elements? Both? In fact, DataFrames copy is never a deepcopy, but you can see this tension (and the connection to indexing!) directly in the source:

https://github.com/JuliaData/DataFrames.jl/blob/ddba103ca71e73fa258cd1ea833e0240b3dabc11/src/dataframe/dataframe.jl#L822-L836
