When should I define my own `deepcopy` and `copy` functions for types, if ever?

Just wanted to understand if I should define deepcopy and copy for my own types, if at all.

Looking at the code suggests to me deepcopy is automatic, and therefore there is generally no need to define it for user defined types.

I wonder if perhaps this changes if the user defined type manages some resource, such as memory, a file handle or network socket, or if it interfaces with external C code which manages such a resource?

The way the deepcopy system is made, there is one entry function to the machinery, and you definitely should not touch / extend it:

function deepcopy(@nospecialize x)
    isbitstype(typeof(x)) && return x
    return deepcopy_internal(x, IdDict())::typeof(x)
end

Instead, you need to extend Base.deepcopy_internal. A helpful starting point is

julia> methods(Base.deepcopy_internal)
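
As a rough sketch of what such an extension can look like: the `Node` type below is hypothetical and made up for illustration. The pattern of looking up and filling the `IdDict` follows what the methods in Base appear to do, so that aliasing and cycles survive the copy.

```julia
# Hypothetical self-referential type, for illustration only.
mutable struct Node
    value::Vector{Int}
    next::Union{Node,Nothing}
end

function Base.deepcopy_internal(n::Node, stackdict::IdDict)
    # Already copied earlier in this deepcopy call? Reuse that copy.
    haskey(stackdict, n) && return stackdict[n]::Node
    c = Node(Base.deepcopy_internal(n.value, stackdict), nothing)
    # Register the copy before recursing, so cycles terminate.
    stackdict[n] = c
    if n.next !== nothing
        c.next = Base.deepcopy_internal(n.next, stackdict)
    end
    return c
end

a = Node([1, 2, 3], nothing)
a.next = a            # a cycle
b = deepcopy(a)       # dispatches to the method above
```

Note that the method is added to `Base.deepcopy_internal`, while callers still go through plain `deepcopy`.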

And yes, you absolutely must take care of external resources.

Unfortunately, Base is not very consistent with doing that right, e.g.

julia> f=Base.open("/tmp/foo", "w")
IOStream(<file /tmp/foo>)

julia> ff=deepcopy(f)
IOStream(<file /tmp/foo>)

julia> ff === f
false

julia> close(f)

julia> write(ff, "foo")
3

julia> clocorrupted double-linked list

[3013] signal 6 (-6): Aborted

(yes, the julia runtime exploded during tab-completion after typing clo)

I guess this is a question for the docs or github or the core committers here:

What is the appropriate behavior on deepcopy with externally managed resources that cannot be copied?

  1. Throw an exception in deepcopy_internal, so above behavior would be a bug
  2. deepcopy shouldn’t throw, so abort(), nuke the process from orbit
  3. Can’t meaningfully deepcopy the thing, just return itself (so f === ff should return true)
  4. Deepcopy of an IOStream makes no sense. This is a user error. Don’t overcomplicate your code with error-checking, later crash / memory corruption is totally fine, this is not a language/runtime bug.
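
For a user-defined type, options (1) and (3) could be sketched roughly as follows. Both wrapper types are hypothetical, and which behavior is appropriate is exactly the open question above. The types are declared `mutable` on purpose: for an `isbitstype` value, `deepcopy` returns the value before `deepcopy_internal` is ever called.

```julia
# Hypothetical wrappers around an externally managed handle.
mutable struct ExclusiveHandle
    ptr::Ptr{Cvoid}
end

mutable struct SharedHandle
    ptr::Ptr{Cvoid}
end

# Option (1): refuse to deep-copy and fail loudly.
function Base.deepcopy_internal(h::ExclusiveHandle, ::IdDict)
    throw(ArgumentError("ExclusiveHandle wraps an external resource and cannot be deep-copied"))
end

# Option (3): the copy is the very same object.
Base.deepcopy_internal(h::SharedHandle, ::IdDict) = h

s = SharedHandle(C_NULL)
same = deepcopy(s) === s             # the handle itself is returned
nested = deepcopy([s])[1] === s      # also when it sits inside a container
```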

PS. This is not the official stance of the community, but the deepcopy system is known buggy enough that I would recommend never using it, i.e. I am resigned to (4)

In other languages, it is possible to deepcopy (aka clone, or copy construct) objects such as streams which wrap files.

However, typically it would be up to the user (at least in many cases) to define what this means.

It makes sense to move a handle to a file, it does not make a huge amount of sense to clone one.

Memory however is quite different. Generally speaking cloning a pointer to some allocated memory is fine. Just allocate the same quantity of memory and copy the data.

How this may work in Julia as it relates to copy and deepcopy - I do not know. I can only share my experience with other languages.

My advice - generally, don’t touch (or use, for that matter - see here) deepcopy.

For copy, I’d generally only define it when there’s some actual reason to do so to ensure you have separate (shallow) instances. It’s used relatively rarely, as there is no such thing as a copy constructor like C++ has. Any “copy” you’d create would (generally) have to go through a constructor provided by the struct in the first place.

No, not really. The problem is that “deepcopy / serialize this arbitrary object” fundamentally cannot work reliably. It’s just a bad API in view of objects that represent foreign objects.

Like, what are you supposed to do if the object represents an open pipe to a sub-process?

There are various ways languages deal with that:

  1. Objects can override/customize serialization/deepcopy behavior
  2. In some languages, deepcopy fails-by-default. Say Java Cloneable / Serializable interfaces.
  3. In other languages like Julia, the obvious default implementation applies unless otherwise specified.

Option (2) leads to many objects being unnecessarily un-copyable. Option (3) leads to the occasional crash when something un-copyable is copied.

The end result is that generic language-supplied serialization and deep-copy of arbitrary objects is a bad idea. Don’t do it in production code, view it as a debug feature.

Deep-copy or serialization of non-arbitrary objects makes perfect sense, though!

However, the person who calls deepcopy is responsible for knowing what types go in there and reviewing the code of the deepcopy_internal methods that get called.
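
One way to start that review is to ask Julia which `deepcopy_internal` method a given type would hit. A small sketch, where `Plain` is a hypothetical type with no method of its own:

```julia
# Hypothetical type that does not define its own deepcopy_internal.
mutable struct Plain
    data::Vector{Int}
end

# `which` returns the Method that would be called for these argument types.
specialized = which(Base.deepcopy_internal, (BigInt, IdDict))
fallback = which(Base.deepcopy_internal, (Plain, IdDict))

println(specialized)   # the BigInt-specific method
println(fallback)      # the generic field-by-field fallback
```

If the second line shows the generic fallback for a type that wraps a pointer or handle, that is a hint nobody has thought about deepcopy for it.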

Please read my comment directly below this where I said

However, typically it would be up to the user (at least in many cases) to define what this means.

Why be excessively pedantic and deliberately divert the thread on a tangent, especially when all you have done is agree with me?

What would you do if you needed to make a copy of, say for the sake of example, a Dict?

The typical reason for taking copies is to de-alias, i.e. take snapshots to preserve contents in view of subsequent mutations. E.g.

julia> d=Dict(3=>4);dc = deepcopy(d); d[2]=5; d,dc
(Dict(2 => 5, 3 => 4), Dict(3 => 4))

Sorry, I didn’t want to be needlessly pedantic. But no, copying of fundamentally un-copyable objects doesn’t really work in other languages either – you might get better results than a runtime crash, though (e.g. a compiler error, an exception, or the identical object back).

edit: Also, I think I did answer your initial question without too much diversion?

  1. Yes, you must handle external resources
  2. No, you don’t handle them by extending deepcopy, you need to extend deepcopy_internal
  3. For the “how to extend for my custom type”, there are good examples to read in methods(Base.deepcopy_internal).
  4. If you don’t want to deal with deepcopy, then there’s lots of precedent for simply silently corrupting the runtime if a user of your type makes the mistake of deepcopying your type that doesn’t support it. That ain’t pretty, but it is not too out-of-line with expectations.

Read e.g.

julia> @less Base.deepcopy_internal(BigInt(1), IdDict())

and look at the called functions. What you see is that there is some wiring that needs to be done: BigInt wants to be allocated + handled by GMP/MPZ, so we can’t just malloc/memcpy. We also need to set up a finalizer for the copy.

This cannot be done generically by the language / runtime! Somebody had to sit down, look at the API of GMP, and then do the right thing.
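
The same kind of wiring for a simpler case, as a sketch only: `CBuffer` is a hypothetical type that owns memory obtained from `Libc.malloc`. This is not how BigInt is implemented; it only illustrates the "allocate, copy, register a finalizer" steps.

```julia
# Hypothetical type owning a malloc'd buffer (assumes len > 0).
mutable struct CBuffer
    ptr::Ptr{UInt8}
    len::Int
    function CBuffer(len::Integer)
        ptr = Ptr{UInt8}(Libc.malloc(len))
        ptr == C_NULL && throw(OutOfMemoryError())
        buf = new(ptr, len)
        # Each instance frees its own allocation when collected.
        finalizer(b -> Libc.free(b.ptr), buf)
        return buf
    end
end

function Base.deepcopy_internal(b::CBuffer, stackdict::IdDict)
    haskey(stackdict, b) && return stackdict[b]::CBuffer
    c = CBuffer(b.len)   # fresh allocation with its own finalizer
    # Keep both objects alive while working with their raw pointers.
    GC.@preserve b c unsafe_copyto!(c.ptr, b.ptr, b.len)
    stackdict[b] = c
    return c
end

a = CBuffer(4)
unsafe_store!(a.ptr, 0x2a, 1)
b = deepcopy(a)
```

Without such a method, the generic fallback would copy the `ptr` field as-is, so both objects would point at the same memory.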

It does work, provided you write sensible logic.

It is not good design in many cases. That does not mean it is impossible.

For example, in C++, if I want to clone a class which manages a file handle, I can do that if I want. There’s a number of things I could do. I wouldn’t necessarily advise actually doing them.

  • Make one copy read only
  • Invalidate the object being cloned from
  • Create a new file handle to a new file with the same contents as the old file and a sequentially increasing filename with a number appended

None of these are good design ideas.

To go back to the actual point I was making.

It makes sense to move a handle to a file, it does not make a huge amount of sense to clone one.

Memory however is quite different. Generally speaking cloning a pointer to some allocated memory is fine. Just allocate the same quantity of memory and copy the data.

How this may work in Julia as it relates to copy and deepcopy - I do not know. I can only share my experience with other languages.

Pointless tangent.

It’s worth noting that this behavior does not require deepcopy; copy alone is sufficient. It actually gets at the heart of the distinction: dicts are a mapping between keys and values and copy allows you to change that mapping without affecting the original.

Deepcopy would allow you to additionally make copies of the keys and values themselves.
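
A small example of that difference, using a Dict whose values are mutable arrays:

```julia
d = Dict(:a => [1, 2])
s = copy(d)        # new mapping, same value objects
dd = deepcopy(d)   # new mapping, new value objects

d[:b] = [3]        # change the mapping of the original
push!(d[:a], 99)   # mutate a value held by the original

# s does not see the new key, but shares the mutated array.
# dd sees neither change.
(haskey(s, :b), s[:a], dd[:a])
```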

As others have said, you almost always want to define and use copy, and not touch deepcopy. It’s just unfortunate that the latter is defined by default for all types while the former is not… but defining copy correctly requires you to understand the meaning of the type and how it uses its fields. Doing anything without understanding that is precisely how deepcopy goes awry.
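
For instance, a minimal sketch of defining `copy` for a made-up type (the `Inventory` name and its fields are hypothetical). The author decides that a copy means independent containers, and builds it through the ordinary constructor:

```julia
# Hypothetical type, for illustration only.
struct Inventory
    items::Dict{String,Int}
    tags::Vector{String}
end

# A copy gets its own Dict and Vector; the elements inside are shared,
# which is fine here because String and Int are immutable.
Base.copy(inv::Inventory) = Inventory(copy(inv.items), copy(inv.tags))

inv = Inventory(Dict("apple" => 3), ["fruit"])
c = copy(inv)
c.items["pear"] = 1
push!(c.tags, "new")
```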

As a moderator, I’ll step in here and implore folks here to not gripe about others staying on topic. We can split topics if things go too far astray; you can flag messages if you think a moderator’s intervention would be helpful.

Griping about off-topicness is itself off-topic and only serves to escalate matters and — ironically — it keeps that secondary topic alive and likely to continue.

Let’s keep this thread focused on deepcopy and copy, and not about meta commentary on what’s off-topic or not.
