I’ve been upgrading a few packages to v0.7, and one of the biggest sources of deprecations is the removal of the default constructor->convert fallback. For those unfamiliar, previously if you had:
struct T
...
end
Base.convert(::Type{T}, x::Int) = (construct a new T from an Int)
then you could call T(1) and, if there was no constructor matching (::Int), convert(T, 1) would automatically be called for you. That was convenient, but it also muddied the distinction between conversion and construction and led to awkward error messages. So I’m reasonably happy to see it go.
But my question is, when upgrading a library that used to rely on this pattern, we have two options:
1. Move with the language: where users were previously calling T(x), they should instead be calling convert(T, x), unless they actually meant to call a constructor.
2. Maintain the API: define the fallback constructor T(x::Int) = convert(T, x) ourselves.
(1) seems easy (no code changes required in the library), but somewhat unfriendly, as it’s pretty unclear to most users of a library whether the T(x) they’re calling is a “real” constructor or a fallback.
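For concreteness, here is a rough sketch of what option 2 might look like. The type `Celsius` and its field are hypothetical, made up for illustration; the inner constructor is there only so that no default constructor accepts an Int.

```julia
# Hypothetical type, not from any real package.
struct Celsius
    degrees::Float64
    # Defining an inner constructor suppresses the default ones,
    # so on its own this type has no method accepting an Int.
    Celsius(degrees::Float64) = new(degrees)
end

# The conversion method the library already had before v0.7.
Base.convert(::Type{Celsius}, x::Int) = Celsius(Float64(x))

# Option 2: spell out the fallback that the language used to provide.
Celsius(x::Int) = convert(Celsius, x)

c = Celsius(1)            # goes through the explicit fallback constructor
d = convert(Celsius, 1)   # option 1: callers convert explicitly
```

Without the `Celsius(x::Int)` line, `Celsius(1)` would throw a MethodError on v0.7 and later, which is the breakage users would see under option 1.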
Do you all have any recommendations? What’s the new thought process on when something should be a constructor or a convert method? And, likewise, what’s the thinking on when a user should be trying to construct vs. trying to convert?
I think the idea is that you should define the constructor manually. The difference between convert and constructors is that the former will avoid copying as much as possible, while the latter are guaranteed to return new objects which do not alias with the original object.
In many cases you can just define constructors in terms of convert or vice-versa, sometimes with a call to copy or deepcopy.
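A small sketch of that distinction, first with Base arrays and then with a made-up mutable type (`Tally` is hypothetical and exists only for this example):

```julia
v = [1.0, 2.0, 3.0]

# convert returns its argument unchanged when it already has the target type.
same = convert(Vector{Float64}, v)

# The constructor allocates a new array.
fresh = Vector{Float64}(v)

# Hypothetical mutable wrapper type.
mutable struct Tally
    counts::Vector{Int}
end

# Copy constructor: builds a new Tally that does not alias the original.
Tally(t::Tally) = Tally(copy(t.counts))

t = Tally([1, 2])
t_conv = convert(Tally, t)   # Base's generic convert(::Type{T}, x::T) returns t itself
t_new = Tally(t)             # new object with its own counts vector
push!(t_new.counts, 3)       # does not affect t
```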
That is how I have been moving software into v0.7. After a while, the “niceness” of this comes through more clearly. The new conceptual separation puts the declarative (what linguists find “imperative”) at our fingertips and lets us specialize in situations where the interest is to enact conversion of one manner of expression into another [or not].
Yes, you should make them constructors. In the case of Number types, convert definitions are not usually necessary since this method exists:
convert(::Type{T}, x::Number) where {T<:Number} = T(x)
Constructors are considered the lowest-level way to make an instance of something, so all new types should have them. Then convert definitions are inherited or added on top as needed.
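As a minimal sketch of this for a Number type (the `Percent` type below is hypothetical): only constructors are defined, and the generic convert method quoted above picks them up.

```julia
# Hypothetical Number subtype, for illustration only.
struct Percent <: Real
    value::Float64
    Percent(x::Float64) = new(x)
end

# A constructor from integers; no convert method is written anywhere.
Percent(x::Integer) = Percent(Float64(x))

# The generic convert for Number types falls through to Percent(5).
p = convert(Percent, 5)

# Implicit conversions, such as storing into a typed array, use it too.
ps = Percent[]
push!(ps, 7)
```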
Is there a convention for what T(x::T) should do for non-isbits types? Should it make a deep copy or a shallow one, or even just return x? I know there was some discussion on this subject at some point, but I missed the conclusion if there was one.
I suppose this would answer my question, but I wasn’t aware of such a guarantee.
It should definitely make a copy, else you couldn’t modify the result safely without checking the type of the input first.
As for whether it should be a shallow or a deep copy, I’m not completely sure. FWIW, the DataFrame constructor doesn’t make a copy of column vectors when passed a DataFrame, just as it doesn’t make a copy of vectors when called e.g. as DataFrame(x=some_vector). But maybe for other types it makes more sense to make a deep copy.
For mutable objects, a constructor should make a shallow copy. However, the definition of “shallow” depends on the semantics of the object. It should copy any structure that belongs to one instance. For example, if a DataFrame is considered a collection of columns, then the columns should not be copied, just the container holding them. But if a DataFrame is considered to be a collection of all the data inside it, then the columns should be copied as well. This also relates to the set of available operations. For example, if a DataFrame has a function that inserts a row by mutation, then the DataFrame “manages” the columns, and both copy and the constructor should copy them. As always, there might be borderline cases that require judgment calls.
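To make the “collection of columns” reading concrete, here is a sketch using a hypothetical `Table` type (this is not the DataFrames API): the constructor copies the container that belongs to the instance but shares the column vectors.

```julia
# Hypothetical stand-in for a table type; not the DataFrames API.
struct Table
    columns::Dict{Symbol,Vector{Float64}}
end

# Shallow copy constructor: new Dict, same column vectors.
Table(t::Table) = Table(copy(t.columns))

t1 = Table(Dict(:x => [1.0, 2.0]))
t2 = Table(t1)

# Adding a column to t2 leaves t1 untouched, since each owns its Dict.
t2.columns[:y] = [3.0, 4.0]

# The :x column is still shared, so element-level mutation is visible in both.
t2.columns[:x][1] = 10.0
```

Under the other reading, where the table owns all of its data, the constructor would instead copy each vector, for example with `Table(Dict(k => copy(v) for (k, v) in t.columns))`.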
What about immutable objects that contain something that is by convention immutable (such as my Str types, which use a String as a low-overhead buffer known to the GC), or mutable objects that are by convention immutable (such as BigFloat or BigInt, which caused a big discussion 2 years ago)?