# How exactly is deepcopy defined?

Observe the following:

``````julia> a = [8,7,6]
3-element Array{Int64,1}:
8
7
6
julia> b = [a,a]
2-element Array{Array{Int64,1},1}:
[8, 7, 6]
[8, 7, 6]
julia> c=deepcopy(b)
2-element Array{Array{Int64,1},1}:
[8, 7, 6]
[8, 7, 6]
julia> c[1][1] = 9
9
julia> c
2-element Array{Array{Int64,1},1}:
[9, 7, 6]
[9, 7, 6]
``````

So the point of this example is: The output `c` of `deepcopy` still contains two arrays that are actually the same object. This seems to contradict the documentation, which says that the copying continues recursively until the leaves are reached. If the above example is the intended behavior of `deepcopy`, could someone give a precise definition of its functionality?

Whether the documentation is confusing depends on what you think â€ścopyâ€ť means; in this case `deepcopy` deliberately goes to some effort to do what you observed. In the source code the `ObjectIdDict` is precisely to make sure that references are preserved as references; the source is not very long and reasonably straightforward, so if you have specific questions Iâ€™d urge you to check it first. If you think the docstring could be further clarified, a PR (or at least an edited text posted here) would be great!

It would be weird if the deepcopied object didnâ€™t behave as the original one. Iâ€™m not sure that could even be called a copy.

2 Likes

On the docs:

``````"""
deepcopy(x)
Create a deep copy of `x`: everything is copied recursively, resulting in a fully
independent object. For example, deep-copying an array produces a new array whose elements
are deep copies of the original elements.
"""
``````
• Defining the deepcopy function with â€śCreate a deep copyâ€¦â€ť seems bad (Is there a proper def in the docs?)
• On â€śEverything is copied recâ€¦â€ť, the world everything doesnâ€™t tell me anything.

Maybe someone can help with my proposal (I hope mine is at least a start):

``````"""
Instantiate a variable of the same type of x and proceed recursively through the fields of x.
The leaves values are copied to new instances and all the references inside the output variable
point to these new instances.
"""
``````

I think that this is pretty clear: â€śeverythingâ€ť in this context is â€śall elements in a containerâ€ť.

Regarding your proposed docstring: not all containers have fields, eg consider arrays.

Perhaps the documentation could be clearer, but the behaviour makes sense (and has precedent in other languages). Without it, deep-copying structures with cycles (eg. doubly-linked lists) would never complete. And an array of 100 views into another array X would create 100 copies of X.

1 Like

Right, I forgot the main part.

Is the word â€ścontainerâ€ť defined in the docs? I am not able to understand your explanation of everything.
Maybe it depends on the leaf type itself. The implementation of deepcopy_internal of the leaftype is called and thatâ€™s where â€śeverythingâ€ť is define.

Wait, all types have a number of fields, sometimes this number is zero. Additionally elements of arrays are copied because of the `deepcopy_internal method` for `Array`s (Is there something also for `AbstractArray`s?)

There could be a statement that suggest that for â€ścontainersâ€ť (to be defined) there is a common practice of propagating through the elements the recursive action.

Thanks for the correction

For reference, i found also the definition of deepcopy in `DataStructures.jl` for sorted containers.

``````"""
deepcopy(sc)
This returns a copy of sc in which the data is deep-copied, i.e., the keys and values are replicated if they are mutable types. A semitoken for the original sc is a valid semitoken for the copy because this operation preserves the relative positions of the data in memory. Time O(maxn), where maxn denotes the maximum size that sc has attained in the past.
"""
``````

``````"""
deepcopy(x)
Create a deep copy of `x`: everything is copied recursively, where every node is visited only once,
resulting in a fully independent object of the same internal structure as the input object. For example,
deep-copying an array produces a new array whose elements are deep copies of the original elements,
and deep-copying an array of views of the same data array only copies the data array once.
"""
``````

How about just add the OPâ€™s code as an example with an enlightening explanation. A good example is better than thousand words

2 Likes

``````"""This is roughly equivalent to serializing, and then unserializing an object.
Deepcopy respects object identity (===).
(python analog: deepcopy = lambda x: pickle.loads(pickle.dumps(x));
there can be no generic C/C++ analog) """
``````

Maybe one should also add something about failure modes for types that are bad for deepcopying because they hold external references (e.g. pointers, os resources, filehandles)? Maybe one should describe how finalizers and object references in closures of finalizers are handled?

I cannot say, since I honestly donâ€™t know how deepcopy handles these cases.

You can always read the source:

``````julia> io = open("/tmp/test", "w")
IOStream(<file /tmp/test>)

julia> @edit deepcopy(io)
``````

Examples/concepts from another language may not be a good idea: it is better to explain concepts than rely on the readerâ€™s knowledge of Python.