How exactly is deepcopy defined?

Observe the following:

julia> a = [8,7,6]
3-element Array{Int64,1}:
 8
 7
 6
julia> b = [a,a]
2-element Array{Array{Int64,1},1}:
 [8, 7, 6]
 [8, 7, 6]
julia> c=deepcopy(b)
2-element Array{Array{Int64,1},1}:
 [8, 7, 6]
 [8, 7, 6]
julia> c[1][1] = 9
9
julia> c
2-element Array{Array{Int64,1},1}:
 [9, 7, 6]
 [9, 7, 6]

So the point of this example is: The output c of deepcopy still contains two arrays that are actually the same object. This seems to contradict the documentation, which says that the copying continues recursively until the leaves are reached. If the above example is the intended behavior of deepcopy, could someone give a precise definition of its functionality?

1 Like

Whether the documentation is confusing depends on what you think “copy” means; in this case deepcopy deliberately goes to some effort to do what you observed. In the source code the ObjectIdDict is precisely to make sure that references are preserved as references; the source is not very long and reasonably straightforward, so if you have specific questions I’d urge you to check it first. If you think the docstring could be further clarified, a PR (or at least an edited text posted here) would be great!

It would be weird if the deepcopied object didn’t behave as the original one. I’m not sure that could even be called a copy.

2 Likes

On the docs:

"""
    deepcopy(x)
Create a deep copy of `x`: everything is copied recursively, resulting in a fully
independent object. For example, deep-copying an array produces a new array whose elements
are deep copies of the original elements.
"""
  • Defining the deepcopy function with “Create a deep copy…” seems bad (Is there a proper def in the docs?)
  • On “Everything is copied rec…”, the world everything doesn’t tell me anything.

Maybe someone can help with my proposal (I hope mine is at least a start):

"""
Instantiate a variable of the same type of x and proceed recursively through the fields of x. 
The leaves values are copied to new instances and all the references inside the output variable
point to these new instances.
"""

I think that this is pretty clear: “everything” in this context is “all elements in a container”.

Regarding your proposed docstring: not all containers have fields, eg consider arrays.

Perhaps the documentation could be clearer, but the behaviour makes sense (and has precedent in other languages). Without it, deep-copying structures with cycles (eg. doubly-linked lists) would never complete. And an array of 100 views into another array X would create 100 copies of X.

2 Likes

Right, I forgot the main part.

Is the word “container” defined in the docs? I am not able to understand your explanation of everything.
Maybe it depends on the leaf type itself. The implementation of deepcopy_internal of the leaftype is called and that’s where “everything” is define.

Wait, all types have a number of fields, sometimes this number is zero. Additionally elements of arrays are copied because of the deepcopy_internal method for Arrays (Is there something also for AbstractArrays?)

There could be a statement that suggest that for “containers” (to be defined) there is a common practice of propagating through the elements the recursive action.

Thanks for the correction

For reference, i found also the definition of deepcopy in DataStructures.jl for sorted containers.

"""
deepcopy(sc)
This returns a copy of sc in which the data is deep-copied, i.e., the keys and values are replicated if they are mutable types. A semitoken for the original sc is a valid semitoken for the copy because this operation preserves the relative positions of the data in memory. Time O(maxn), where maxn denotes the maximum size that sc has attained in the past.
"""

How about

"""
    deepcopy(x)
Create a deep copy of `x`: everything is copied recursively, where every node is visited only once, 
resulting in a fully independent object of the same internal structure as the input object. For example, 
deep-copying an array produces a new array whose elements are deep copies of the original elements, 
and deep-copying an array of views of the same data array only copies the data array once.
"""

How about just add the OP’s code as an example with an enlightening explanation. A good example is better than thousand words :wink:

3 Likes

How about adding to the description

"""This is roughly equivalent to serializing, and then unserializing an object. 
Deepcopy respects object identity (===).
 (python analog: deepcopy = lambda x: pickle.loads(pickle.dumps(x));
there can be no generic C/C++ analog) """

Maybe one should also add something about failure modes for types that are bad for deepcopying because they hold external references (e.g. pointers, os resources, filehandles)? Maybe one should describe how finalizers and object references in closures of finalizers are handled?

I cannot say, since I honestly don’t know how deepcopy handles these cases.

You can always read the source:

julia> io = open("/tmp/test", "w")
IOStream(<file /tmp/test>)

julia> @edit deepcopy(io)

Examples/concepts from another language may not be a good idea: it is better to explain concepts than rely on the reader’s knowledge of Python.

There’s a relatively simple explanation to why this is the case, but it hasn’t been explicitly stated.

Your object a is a vector containing 3 integers.

b is a vector containing two references to a. That’s the same reference to a, twice.

deepcopy produces an exact clone of an object (presumably) by recursively calling deepcopy on each sub-element of an object.

In this case, you deepcopy(b), which produces a vector containing the same data a b does, which is two references to a.

If deepcopy were to change references into copies of the underlying object, that would be surprising and hard for the programmer to understand.

In this case I think your confusion has come from the way in which Julia has printed data to your terminal window. It looks like b is a 2d matrix, but it is not.