Reason behind designing parametric types as invariant

parametric-types

#1

I was looking into Rust where types (mostly) have variance yet that does not seem to prevent Rust’s compiler from generating efficient machine code.

So my question is what is the reason behind designing parametric types in Julia as invariant?
What is the unique feature of Julia that prevents from replicating Rust’s seemingly good design choice on this particular issue?


#2

In a sense we do:

If we look at our table of variances, we see that &mut T is invariant over T . As it turns out, this completely fixes the issue!

Since all types in Julia behave like &mut T in Rust (or at least there’s no way of expressing anything else in the type system—there are immutable types and reference types but they are all treated uniformly), that fact translated to Julia would imply that all parametric types are invariant, which is precisely what we do. In order to do something as fine-grained as what Rust does, you need as fine-grained and complex a type system, which we don’t have and have deemed not to be a good trade off for the kind of applications we want Julia to excel at.

In another sense Julia already has covariant and contravariant parametric types: P{<:T} is covariant in T and P{>:T} is contravariant in T. In other words if you want to write co/contravariant parametric types you can, it’s just that the safe invariant form is the default. We could have changed things (I’ve considered it) and made P{T} mean what P{<:T} means now. But then we’d need a way to write what P{T} means now. Maybe P{=:T}? (Not great since =:T is already syntax but it works in unary position.)

So covariant by default is a possibility, but what should be the default? Another question that comes up now and then is if we couldn’t allow subtyping concrete types. And we potentially could, but if we did what most languages do and allowed both subtyping of concrete types and made covariant the default for parametric types then we’d really be hoist since Vector{Float64} would be forced to be a pointer array and we’d lose all hope of being a sane language for numerical computing. This is precisely the reason why primitives (double etc.) need to be special in Java and user-defined types are not as efficient as primitives. (See “value types” for a proposed way to try to remedy the situation in Java.)

But still, we could do one or the other and of the two I think that making parametric types covariant by default is the more reasonable but I’m still not sure that it’s a good idea. The main argument for it is that when someone writes

f(strs::Vector{AbstractString})

you often should have written

f(strs::Vector{<:AbstractString})

But then again, what if it was

f!(strs::Vector{AbstractString})

and f! tries to insert a string of a particular type into strs? Maybe that’s fine because Julia generally does automatic conversion for you in such situations which should actually work as desired. On the other hand, how hard is it to write

f!(strs::Vector{<:AbstractString})

if that’s what you want? Invariance is the simplest, least dangerous variance, so shouldn’t it be the default and easiest to express?


#3

Thank you for illuminating and detailed response. How about the proposal:

  1. add P{=:T} syntax to mean the current meaning of P{T}
  2. make P{T} a sugar to P{<:T}

This way we have nice consistent syntax P{<:T}, P{=:T}, P{>:T} to talk about covariance, invariance and contravariance of T respectively.
With this we can make P{T} to be a sugar for one of those 3 options. Which one? I argue that it must be P{<:T}. The immediate objection here is that it is not much harder to write P{<:T} with the current syntax and that is a fair point. However, imagine a typical MATLAB/Python/R user coming to Julia. It is fair to assume that he is not familiar with the notion of subtype variance. Once he learns about parametric types and type hierarchies he naturally conceives of the notion of subtype covariance (since it is an amalgamation of those two ideas) and tries to write something like f(v::Vector{Real}) while what he intended was f(v::Vector{<:Real}). And here we wreck the train for him by explicitly teaching that Vector{Real} contrary to his expectations is invariant (indeed there is even warning about it in the documentation). In doing so we introduce notion of subtype variance making the first Julia experience more complicated than is necessary.

This is exactly what happened to me on my first encounter with Julia. As a result of this confusion I asked the this question and you can see @dpsanders trying to explain type invariance to me there.

In summary, giving P{T} the meaning of P{<:T} has the following advantages:

  • follows principle of least surprise
  • makes the Julia’s learning curve less steep
  • when one writes P{T} with abstract T, he almost never means P{=:T} and means P{<:T} instead
  • compiler can generate efficient machine code for f(v::Vector{<:Real}), while the currently f(v::Vector{Real}) results in inefficient code.

#4

This makes the most sense in my opinion, I really like the <: syntax of Julia.


#5

In my experience most people just learn about the issue when they bump into it, and then internalize this knowledge and move on.

Since P{<:T} is the case used more often, there is some merit to the argument of making it the default. However, I happen to like the fact that <: makes things explicit, and I don’t mind the extra 2 characters for this. Also, what about

f(x::P{T}) where {T <: S}

should this be

f(x::P{=:T}) where {T <: S}

? This I would dislike.

Finally, it seems that you went from asking about an element of Julia’s design to proposing a fundamental change in less than 24 hours. While this in itself does not reflect on the substance of your proposal, note that Julia is a fairly mature language at this point, and making this kind of change would require very strong reasons with very good arguments. If you really think that this would be an improvement, the best way to start is by making a PR to Julia, which would allow people to assess the ramifications of this change on a fairly large codebase. Since this is a breaking change and could only happen in 2.0, you have plenty of time for this.


#6

This sounds dismissive, even condescending. Please, spare me your such comments. At least, the first part of your response was on subject and for that I thank you.


#7

This was not the intention, perhaps you misunderstood something.

All changes to Julia started out as PRs, and many major ones went through multiple iterations, abandonning earlier attempts in favor of other ones. Suggesting that you make a PR to evaluate your proposal should not be understood negatively.


#8

Even if one intended to make PR on the proposal, can’t he first discuss it on the platform that is dedicated for, well… this kind of discussions?

There are two possible cases here:

  • T is a concrete type (most likely scenario). In this case, there is no difference between P{=:T} and P{<:T}, so f(x::P{T}) where {T <: S} still would mean what it currently means.
  • T is an abstract type (unlikely scenario). In this case f(x::P{T}) where {T <: S} would be equivalent to f(x::P{<:T}) where {T <: S}, which, when T is abstract, is almost always what one wants. In those exceptionally rare cases when you actually want invariance of an abstract type T it would still be achievable with f(x::P{=:T}) where {T <: S}

#9

There is a difference in the case of f(::T, ::P{T}) where {T<:S} and f(::T, ::P{<:T}) where {T<:S}: in the former case T can be abstract whereas in the latter case T must be concrete (because of the “diagonal rule”).

I’m of two minds about this: on the one hand covariance seems a bit more intuitive to newcomers; but the current design is so simple and once you learn it, it’s very clear.


#10

This is a very disproportionate response to helpful feedback, IMO.


#11

In the case of function arguments, yes you basically always want Vector{<:Real} instead of Vector{=Real}. But that is a bit context-specific. To construct a heterogeneous vector, you would have to write Vector{=Any}(...), and Vector{Any}(...) wouldn’t work since the type is abstract and hence not constructible. OK, that makes me think that maybe type syntax itself should be context-specific and mean something different for function arguments. But that of course is also confusing, not obviously better than what we have now.

Except, of course, that this intuition is wrong in the case of mutable vectors, so how hard should we try to accommodate it?

It more likely results in a method error, since e.g. f([1,2]) or f(rand(2)) won’t call that method.


#12

In other words, the code to operate on a vector always corresponds to the representation of the vector and that doesn’t change no matter how variance works. If a vector is concretely typed (or even better stored inline) then the code is efficient; if it’s stored as a pointer array (or worse abstractly typed), then the code is inefficient. Variance has nothing to do with that, it only determines how you write the signature of a method that applies to various kinds of vectors.