I wonder if perhaps this changes if the user-defined type manages some resource, such as memory, a file handle or a network socket, or if it interfaces with external C code which manages such a resource?
The deepcopy machinery has a single entry point, and you definitely should not touch or extend it:
function deepcopy(@nospecialize x)
    isbitstype(typeof(x)) && return x
    return deepcopy_internal(x, IdDict())::typeof(x)
end
Instead, you need to extend Base.deepcopy_internal. A helpful starting point is
julia> methods(Base.deepcopy_internal)
And yes, you absolutely must take care about external resources.
Unfortunately, Base itself does not consistently get this right, e.g.
julia> f=Base.open("/tmp/foo", "w")
IOStream(<file /tmp/foo>)
julia> ff=deepcopy(f)
IOStream(<file /tmp/foo>)
julia> ff === f
false
julia> close(f)
julia> write(ff, "foo")
3
julia> clocorrupted double-linked list
[3013] signal 6 (-6): Aborted
(yes, the Julia runtime exploded during tab-completion after typing clo)
I guess this is a question for the docs, GitHub, or the core committers here:
What is the appropriate behavior on deepcopy with externally managed resources that cannot be copied?
1. Throw an exception in deepcopy_internal, so the behavior above would be a bug.
2. deepcopy shouldn’t throw, so abort() and nuke the process from orbit.
3. The thing can’t be meaningfully deep-copied, so just return the object itself (so f === ff should return true).
4. Deepcopy of an IOStream makes no sense. This is a user error. Don’t overcomplicate your code with error-checking; a later crash or memory corruption is totally fine, and this is not a language/runtime bug.
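To make options 1 and 3 concrete, here is a minimal sketch of how each could be written as a deepcopy_internal method. PipeHandle and SharedHandle are hypothetical types invented for this illustration, and the fd field is only a stand-in for a real external resource. This is not what Base does for IOStream.

```julia
# Hypothetical wrapper types; `fd` stands in for some external resource.
mutable struct PipeHandle
    fd::Int
end

mutable struct SharedHandle
    fd::Int
end

# Option 1: refuse to copy and fail loudly.
function Base.deepcopy_internal(x::PipeHandle, ::IdDict)
    throw(ArgumentError("PipeHandle wraps an external resource and cannot be deep-copied"))
end

# Option 3: the "copy" is the original object.
Base.deepcopy_internal(x::SharedHandle, ::IdDict) = x

s = SharedHandle(3)
same_object = deepcopy(s) === s                # option 3: identical object back
same_in_array = deepcopy([s, s])[1] === s      # also holds inside containers

threw = try
    deepcopy(PipeHandle(4))
    false
catch err
    err isa ArgumentError                      # option 1: deepcopy throws
end
```

Both types are declared mutable on purpose: for an isbits type, deepcopy returns the object directly and never reaches deepcopy_internal.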
PS. This is not the official stance of the community, but the deepcopy system is known to be buggy enough that I would recommend never using it, i.e. I am resigned to (4).
In other languages, it is possible to deepcopy (aka clone, or copy construct) objects such as streams which wrap files.
However, typically it would be up to the user (at least in many cases) to define what this means.
It makes sense to move a handle to a file, it does not make a huge amount of sense to clone one.
Memory however is quite different. Generally speaking cloning a pointer to some allocated memory is fine. Just allocate the same quantity of memory and copy the data.
How this may work in Julia as it relates to copy and deepcopy - I do not know. I can only share my experience with other languages.
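For what it is worth, the "allocate the same quantity of memory and copy the data" approach can be sketched in Julia roughly as follows. CBuffer is a hypothetical type made up for this example; it assumes the memory comes from Libc.malloc and is released by a finalizer. Without the custom method, the generic fallback would copy the pointer field bit-for-bit, leaving two objects pointing at the same allocation.

```julia
# Hypothetical type owning a block of C-allocated memory.
mutable struct CBuffer
    ptr::Ptr{UInt8}
    len::Int
    function CBuffer(len::Integer)
        len > 0 || throw(ArgumentError("len must be positive"))
        ptr = convert(Ptr{UInt8}, Libc.malloc(len))
        ptr == C_NULL && throw(OutOfMemoryError())
        buf = new(ptr, len)
        # Release the C memory when the Julia object is garbage collected.
        finalizer(b -> Libc.free(b.ptr), buf)
        return buf
    end
end

function Base.deepcopy_internal(x::CBuffer, stackdict::IdDict)
    # Reuse an existing copy so that aliasing is preserved.
    haskey(stackdict, x) && return stackdict[x]::CBuffer
    y = CBuffer(x.len)   # fresh allocation with its own finalizer
    GC.@preserve x y unsafe_copyto!(y.ptr, x.ptr, x.len)
    stackdict[x] = y
    return y
end

a = CBuffer(4)
unsafe_store!(a.ptr, 0x2a, 1)   # write one byte into the original
b = deepcopy(a)                 # separate allocation, same contents
c = deepcopy([a, a])            # both elements map to one shared copy
```

The stackdict lookup is what keeps the two elements of c pointing at a single copied buffer instead of two.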
My advice - generally, don’t touch (or use, for that matter - see here) deepcopy.
For copy, I’d generally only define it when there’s some actual reason to do so, to ensure you have separate (shallow) instances. It’s used relatively rarely, as there is no such thing as a copy constructor like C++ has. Any “copy” you’d create would (generally) have to go through a constructor provided by the struct in the first place.
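As a rough illustration of a copy that goes through the struct's own constructor, here is a minimal sketch. Roster is a hypothetical type invented for the example.

```julia
# Hypothetical container; `names` is mutable, `capacity` is plain data.
struct Roster
    names::Vector{String}
    capacity::Int
end

# Shallow copy: build a new Roster through the constructor, with its own
# `names` vector. The elements of that vector are not copied.
Base.copy(r::Roster) = Roster(copy(r.names), r.capacity)

r = Roster(["ada", "grace"], 10)
r2 = copy(r)
push!(r2.names, "edsger")   # only affects the copy
```

Which fields get copied and which are shared is a decision about what the type means, which is why no default definition exists for user types.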
No, not really. The problem is that “deepcopy / serialize this arbitrary object” fundamentally cannot work reliably. It’s just a bad API in view of objects that represent foreign objects.
Like, what are you supposed to do if the object represents an open pipe to a sub-process?
There are various ways languages deal with that:
1. Objects can override/customize serialization/deepcopy behavior.
2. In some languages, deepcopy fails by default. See, for example, Java’s Cloneable / Serializable interfaces.
3. In other languages, like Julia, the obvious default implementation applies unless otherwise specified.
Option (2) leads to many objects being unnecessarily un-copyable. Option (3) leads to the occasional crash when something un-copyable is copied.
The end result is that generic language-supplied serialization and deep-copy of arbitrary objects is a bad idea. Don’t do it in production code, view it as a debug feature.
Deep-copy or serialization of non-arbitrary objects makes perfect sense, though!
However, the person who calls deepcopy is responsible for knowing what types go in there and reviewing the code of the deepcopy_internal methods that get called.
Sorry, I didn’t want to be needlessly pedantic. But no, copying of fundamentally un-copyable objects doesn’t really work in other languages either – you might get better results than a runtime crash, though (e.g. a compiler error, an exception, or the identical object back).
edit: Also, I think I did answer your initial question without too much diversion?
Yes, you must handle external resources
No, you don’t handle them by extending deepcopy, you need to extend deepcopy_internal
For the “how to extend for my custom type”, there are good examples to read in methods(Base.deepcopy_internal).
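A minimal sketch of what such an extension might look like for a type that mixes an external resource with ordinary data. Session is a hypothetical type, and the IOBuffer is only a stand-in for a real handle; the assumption here is that the handle should be shared between the original and the copy while everything else is duplicated.

```julia
# Hypothetical type: `log` stands in for an external resource,
# `history` is ordinary Julia data owned by this session.
mutable struct Session
    log::IO
    history::Vector{String}
end

function Base.deepcopy_internal(x::Session, stackdict::IdDict)
    # If this object was already copied during this deepcopy call, reuse it.
    haskey(stackdict, x) && return stackdict[x]::Session
    # Share the handle, deep-copy everything else with the same stackdict.
    y = Session(x.log, Base.deepcopy_internal(x.history, stackdict))
    stackdict[x] = y
    return y
end

s = Session(IOBuffer(), ["hello"])
t = deepcopy(s)   # note: call deepcopy, never deepcopy_internal directly
```

For types whose fields can refer back to the object itself, the new object has to be registered in stackdict before recursing into the fields; that is not needed here because a Vector{String} cannot contain the Session.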
If you don’t want to deal with deepcopy, then there’s lots of precedent for simply silently corrupting the runtime if a user of your type makes the mistake of deepcopying your type that doesn’t support it. That ain’t pretty, but it is not too out-of-line with expectations.
and look at the called functions. What you see is that there is some wiring that needs to be done: BigInt wants to be allocated + handled by GMP/MPZ, so we can’t just malloc/memcpy. We also need to set up a finalizer for the copy.
This cannot be done generically by the language / runtime! Somebody had to sit down, look at the API of GMP, and then do the right thing.
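One way to find the BigInt method and check its effect from the REPL, as a small sketch (the printed method location depends on your Julia version):

```julia
# Look up the method that deepcopy dispatches to for BigInt.
m = which(Base.deepcopy_internal, Tuple{BigInt, IdDict})
println(m)

a = big(2)^100
b = deepcopy(a)
same_value = (b == a)        # the copy holds the same number
distinct_object = (b !== a)  # but it is a separate BigInt object
```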
It is not good design in many cases. That does not mean it is impossible.
For example, in C++, if I want to clone a class which manages a file handle, I can do that if I want. There are a number of things I could do. I wouldn’t necessarily advise actually doing them.
Make one copy read only
Invalidate the object being cloned from
Create a new file handle to a new file with the same contents as the old file and a sequentially increasing filename with a number appended
None of these are good design ideas.
To go back to the actual point I was making.
It makes sense to move a handle to a file, it does not make a huge amount of sense to clone one.
Memory however is quite different. Generally speaking cloning a pointer to some allocated memory is fine. Just allocate the same quantity of memory and copy the data.
How this may work in Julia as it relates to copy and deepcopy - I do not know. I can only share my experience with other languages.
It’s worth noting that this behavior does not require deepcopy; copy alone is sufficient. It actually gets at the heart of the distinction: dicts are a mapping between keys and values and copy allows you to change that mapping without affecting the original.
Deepcopy would allow you to additionally make copies of the keys and values themselves.
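A short sketch of that distinction with a Dict whose values are mutable arrays:

```julia
d = Dict("a" => [1, 2])

c = copy(d)                           # new mapping, same value objects
c["b"] = [3]                          # changes only the copy's mapping
original_has_b = haskey(d, "b")       # the original has no "b" entry
shares_values = c["a"] === d["a"]     # the value array itself is shared

dc = deepcopy(d)                      # new mapping and new value objects
push!(dc["a"], 99)                    # does not touch d["a"]
```

After this runs, d still maps "a" to [1, 2], while a push! through c["a"] would have been visible in d.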
As others have said, you almost always want to define and use copy, and not touch deepcopy. It’s just unfortunate that the latter is defined by default for all types while the former is not… but defining copy correctly requires you to understand the meaning of the type and how it uses its fields. Doing anything without understanding that is precisely how deepcopy goes awry.
As a moderator, I’ll step in here and implore folks here to not gripe about others staying on topic. We can split topics if things go too far astray; you can flag messages if you think a moderator’s intervention would be helpful.
Griping about off-topicness is itself off-topic and only serves to escalate matters and — ironically — it keeps that secondary topic alive and likely to continue.
Let’s keep this thread focused on deepcopy and copy, and not about meta commentary on what’s off-topic or not.