I have an abstract type, call it Abe, subtypes of which may or may not be mutable. I want a function that I can call on Abes that will make a (deepish) copy of the ones that are mutable (or in some cases wrap objects that are mutable) and do nothing for the immutable ones. The goal is to be able to do things like do_thing!!(copy(a::Abe)) when I want to make sure a isn’t modified by some operation do_thing!! that might make use of the mutability of some of the subtypes.
Which copy function should I use for this? According to Jeff, deepcopy is dangerous and you should avoid it like a semi-serious contagious disease. However, also according to Jeff, one shouldn’t define methods of copy for immutable types. The docstring also explicitly says copy is shallow, and I would like to violate that for some of my types that e.g. wrap an Array. However, defining my very own mycopy function seems quite silly, when really what I want to implement is just the natural meaning of copying for my types. What do?
I’m not sure that this is the most idiomatic (though I never fully understood the harm of a no-op copy for inapplicable types, so I can’t say that I’m avoiding the true issue with my following suggestion), but my workaround has been to define my own maybecopy function that will call copy for applicable types but not others. It’s not that silly. You can support this with a trait.
# Set the default to true or false, as convenient.
# An advantage of defaulting to `true` is that it will try to copy a type and
# if it throws a MethodError you can add it to the list of exceptions.
cancopy(::Any) = true
# list of exceptions
cancopy(::MyUncopyableType) = false
# copy or don't, depending on `cancopy`
maybecopy(x) = cancopy(x) ? copy(x) : x
Getting less idiomatic: instead of the trait you could use hasmethod or applicable, but I recall a few types that define copy methods just to throw errors which then still require special-casing. Further, introspection of method tables is highly discouraged in most situations.
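To illustrate why the introspection route still needs special-casing, here is a small sketch. `Handle` is a hypothetical type invented for this example (it is not from any package); its `copy` method exists only to throw, which is the situation described above.

```julia
# `Handle` is a hypothetical type made up for this illustration.
struct Handle
    id::Int
end

# A copy method that exists only to refuse copying.
Base.copy(::Handle) = error("Handle cannot be copied")

h = Handle(1)

# Introspection only sees that a method exists, not that it throws.
hasmethod(copy, Tuple{Handle})   # true
applicable(copy, h)              # true

# The trait states the intent explicitly instead.
cancopy(::Any) = true
cancopy(::Handle) = false
maybecopy(x) = cancopy(x) ? copy(x) : x

maybecopy(h) === h    # true: returned as-is, copy is never called
v = [1, 2, 3]
maybecopy(v) !== v    # true: a fresh array
```

With `hasmethod` or `applicable` alone, `Handle` would look copyable and the error would only surface at the call to `copy`.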
As for deepcopy, it is allegedly dangerous and is appallingly slow. I think most other schemes you could concoct will ultimately be better than it.
I recently defined a copy method for an XML.jl Node. I thought I was doing the right thing to avoid deepcopy.
Nodes in XML.jl are immutable but they may contain some mutable elements, one of which may be a set of nested vectors of child Nodes. To copy a Node, it is also necessary to copy all its children. So my copy is not shallow.
I am working with XML to manipulate Excel .xlsx files. I have a library of Nodes in a Dict to provide a set of common Excel features and these each includes nested child Nodes. To add one to an xlsx file, I copy the Node from the library and then insert it into the internal XML structure of the xlsx file.
I can call the function I use to make a duplicate of the Node in the library anything I want, but functionally, I am making a copy. Is this to be frowned upon? And if so, what should I do instead? Just rename the function?
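For readers following along, the kind of recursive duplication described here might look roughly like the sketch below. `TreeNode` and `duplicate` are hypothetical names made up for the illustration; this is not XML.jl's actual `Node` type or API, just an immutable struct with mutable fields of the same general shape.

```julia
# Hypothetical stand-in for an XML-like node; NOT XML.jl's Node.
# The struct is immutable, but two of its fields are mutable containers.
struct TreeNode
    tag::String
    attributes::Dict{String,String}
    children::Vector{TreeNode}
end

# Recursive duplicate: new Dict, new Vector, and every child duplicated too.
# Strings are immutable, so they can be shared safely.
duplicate(n::TreeNode) =
    TreeNode(n.tag, copy(n.attributes), TreeNode[duplicate(c) for c in n.children])

# A small "library" of template nodes, as in the use case above.
library = Dict(
    "bold" => TreeNode("font", Dict("b" => "1"),
                       [TreeNode("color", Dict("rgb" => "FF0000"), TreeNode[])]),
)

node = duplicate(library["bold"])
node.attributes["b"] = "0"                       # edit the duplicate
node.children[1].attributes["rgb"] = "00FF00"    # edit a nested child

library["bold"].attributes["b"]                  # still "1"
library["bold"].children[1].attributes["rgb"]    # still "FF0000"
```

Giving the function its own name, as here, sidesteps the question of whether it honors the documented meaning of `copy`.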
I read that thread as saying that copy is fine for immutable types that have a mutation API (and the result need not have the same type as the input), while immutable types without a mutation API have no reason to fall back to an identity-like result, because you can’t mutate them to begin with. Since copy and deepcopy really only show up in code that mutates, such a fallback would only delay an error. Exceptions were listed across the links, so we can actually make an example:
julia> setindex!(copy([1,2,3]), 10, 1) # mutable type, copies to new array
3-element Vector{Int64}:
10
2
3
julia> setindex!(copy(view([1,2,3], 1:2:3)), 10, 1) # immutable type with mutation API, copies to Array
2-element Vector{Int64}:
10
3
julia> setindex!(copy(1:3), 10, 1) # immutable type, identity-like copy
ERROR: CanonicalIndexError: setindex! not defined for UnitRange{Int64}
copy is strictly shallow, and you shouldn’t defy the API. Why not an Abe constructor that takes instances of Abe subtypes? No need to make it shallow or deep, just instantiate as you like.
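A minimal sketch of the constructor idea, under assumptions: `FrozenAbe`, `BufferAbe`, and `double!!` are hypothetical names invented here to stand in for the original poster's subtypes and `do_thing!!`.

```julia
abstract type Abe end

# Hypothetical subtype with nothing mutable inside.
struct FrozenAbe <: Abe
    value::Int
end

# Hypothetical immutable wrapper around a mutable Vector.
struct BufferAbe <: Abe
    data::Vector{Float64}
end

# Constructor methods on the abstract type: each subtype decides
# what "give me an independent instance" means for it.
Abe(a::FrozenAbe) = a                          # nothing to protect
Abe(a::BufferAbe) = BufferAbe(copy(a.data))    # fresh buffer

# Hypothetical stand-in for do_thing!!: mutates when it can.
double!!(a::FrozenAbe) = FrozenAbe(2 * a.value)
function double!!(a::BufferAbe)
    map!(x -> 2x, a.data, a.data)   # in-place update of the wrapped Vector
    return a
end

b = BufferAbe([1.0, 2.0])
doubled = double!!(Abe(b))   # operates on an independent instance
b.data                       # still [1.0, 2.0]
doubled.data                 # [2.0, 4.0]

f = FrozenAbe(3)
Abe(f) === f                 # true: returned unchanged
```

Because these are constructor methods rather than methods of `Base.copy`, they make no promise about shallow or deep semantics.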
Why not specialize do_thing!! or its internals instead? How is it that your code could work for both mutables and immutables?
From the discussions you linked, the motivation for not having a no-op copy for immutables is that if you are asking for a copy, your intention is probably to modify the result. There is no other use case I am aware of.
I don’t think I understand the difference between instantiating a new thing that is identical to an old thing versus copying the old thing. Not implying there isn’t a difference - just that I don’t know what it is.
You’re strictly implementing methods annotated with your defined types, so there’s no piracy going on. The issue is the API.
Base.copy is strictly shallow and Base.deepcopy is strictly deep. You want “deepish”, which is a good indication you’re dealing with a different function entirely. If it’s the name you’re attached to, you can even make an AbeModule.copy, though it’d be annoying to disambiguate that from Base.copy if you or another user ever need it.
Base.copy-ing doesn’t make an identical instance (semantically that would be the SAME instance, ===), nor does it guarantee an output instance of the same type as the input (see the earlier example where a view gets copied into a Vector). It just guarantees that the outermost layer of semantic elements is identical. If you want a guarantee of T in, T out, then you need to make your own API, whether it’s a new function or a repurposed constructor.
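A short illustration of those guarantees, using only Base functions on plain nested arrays:

```julia
outer = [[1, 2], [3, 4]]

# Shallow copy: a new outer container holding the same inner arrays.
shallow = copy(outer)
shallow !== outer          # true: different outer array
shallow[1] === outer[1]    # true: inner arrays are shared

push!(shallow, [5, 6])     # changes only the copy's outer layer
length(outer)              # 2

shallow[1][1] = 100        # reaches through to the shared inner array
outer[1][1]                # 100

# Deep copy: everything reachable is duplicated.
deep = deepcopy(outer)
deep[1] !== outer[1]       # true
deep == outer              # true: equal values, independent storage

# The output type can differ from the input type.
v = view([1, 2, 3], 1:2)
v isa SubArray             # true
copy(v) isa Vector{Int}    # true
```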
Can you explain why you want to copy an immutable object? In general it’s not useful to copy something unless you’re going to mutate the in- or out-argument, which you can’t do to an immutable object.
If it might be mutable (even via mutable fields), I would just define copy.
For example, both SubArray and LinearAlgebra.Adjoint are immutable types that can wrap a mutable Array object, and both define methods like setindex! that allow you to mutate the underlying array. Both of them define copy methods when they wrap a mutable array.
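A small sketch of that behavior. It only checks that writes through the wrappers reach the parent and that writes to the copies do not; it makes no claim about the concrete type `copy` returns.

```julia
using LinearAlgebra

A = [1 2; 3 4]

# Both wrappers are immutable structs, yet setindex! writes through to A.
row = view(A, 1, :)
row[1] = 10
A[1, 1]            # 10

At = A'
At[1, 2] = 30      # element (1, 2) of the adjoint is element (2, 1) of A
A[2, 1]            # 30

# Copies are detached from the parent array.
rowcopy = copy(row)
rowcopy[1] = -1
A[1, 1]            # still 10

Atcopy = copy(At)
Atcopy[1, 1] = -1
A[1, 1]            # still 10
```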
In my case, I may or may not want to make changes as I copy, depending on the user-supplied keyword arguments.
I don’t have any problem using a name other than Base.copy. I had thought Base.copy was a good choice but, thanks to this thread and especially to @Benny, I now know better.