How is `deepcopy` so clever regarding aliasing and self-reference?

I find this behaviour very fortunate, but also black magic:

a = [5]
u = [a, a, a]
v = deepcopy(u)
v[1][1] = 8
v #  [[8], [8], [8]]  # My naive 'deepcopy' would have produced [[8], [5], [5]] instead.

I expected deepcopy to be naive and produce three independent copies of a within v . But it seems much smarter and somehow figures that all three values in u are aliases of each other and then it reproduces the same aliasing relations within v . This is very fortunate, but how does it do so?

The doc says:

Calling deepcopy on an object should generally have the same effect as serializing and then deserializing it.

But again, my naive u -> serialize -> file -> deserialize -> v roundtrip would have produced three independent values within v, and the result would have been [[8], [5], [5]] again.

Is it specified which kind of “serialization” is clever enough for specifying deepcopy?

2 Likes

I looked at it sometime ago, if I remember correctly the object is explored in a recursive fashion and an IdDict is used to keep track of which objects have already been seen so that to be able to preserve object identities.

Indeed I learnt it by incorrectly assuming that it worked as you thought as well :smiley: : see Possible performance improvements for `deepcopy`?

1 Like

This IdDict trick is also how Functors.jl works, to handle elements appearing more than once. For your serialisation, you could consider doing something like this, to keep only one copy of a before proceeding:

julia> a = [5];

julia> u = [a, a, a];

julia> fmapstructure(identity, u, prune=nothing)
3-element Vector{Union{Nothing, Vector{Int64}}}:
 [5]
 nothing
 nothing
1 Like

Serialization.serialize is! It’s worth noting this is how Julia communicates between workers, so it’s quite valuable to have this behavior.

julia> a = [5];

julia> u = [a, a, a];

julia> using Serialization

julia> serialize("test.jls", u)

julia> v = deserialize("test.jls")
3-element Vector{Vector{Int64}}:
 [5]
 [5]
 [5]

julia> v[1][1]=8
8

julia> v
3-element Vector{Vector{Int64}}:
 [8]
 [8]
 [8]
3 Likes

Thank you @Tortar @mcabbott @mbauman!

IIUC some underlying data model like the following is what makes both deepcopy and serialize clever:

u:
  sub_values: (≈ "IdDict")
     <1>: 5
     <2>: [<1>]
  type: Vector{...}
  value: [<2>, <2>, <2>]

Black magic is gone, and only bliss remains ^ ^

1 Like