Efficiency of abstract types vs `Any`

This is a subtle point I wanted to learn a bit more about. I want to know all reasons why I might wish to write:

# assume we have `abstract type MyType end`

MyType[x...]

Rather than

Any[x...]

Assuming we have:

abstract type MyType end

struct MyTypeImpl1 <: MyType; end
struct MyTypeImpl2 <: MyType; end

x = (MyTypeImpl1(), MyTypeImpl2())

The vector example is a bit arbitrary; I was also curious about in a field, like:

struct A
    x::Any
    # versus
    # x::MyType
end

Why would I prefer ::MyType over ::Any?

Here are the reasons I can think of:

  1. Static checking of a constraint. For example, all <:MyType structs might implement a certain interface, or store some data. We want to have a static type check to help enforce this.
  2. As a form of documentation. It makes it clear what data should go where.
  3. Potential for compiler inference, by restricting to only methods defined on ::MyType.
  4. If we had type parameters, the abstract type could participate in the type solver step.
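Reason 1 can be seen directly with the types from the question (a minimal sketch): a `MyType`-typed vector rejects foreign elements at insertion time, while `Any[]` accepts anything.

```julia
abstract type MyType end
struct MyTypeImpl1 <: MyType end
struct MyTypeImpl2 <: MyType end

v = MyType[MyTypeImpl1(), MyTypeImpl2()]
push!(v, MyTypeImpl2())   # fine: the element is <: MyType
# push!(v, "oops")        # MethodError: cannot `convert` String to MyType

w = Any[MyTypeImpl1(), MyTypeImpl2()]
push!(w, "oops")          # accepted silently: there is no constraint to enforce
```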

For a while I had thought there might be some sort of efficiency benefit here (in terms of allocations), but since MyType is abstract, I understand there is no advantage over Any in this respect. Correct me if I'm wrong, but it might even be slower sometimes, because the runtime has to do an extra type assertion in parts of the code.

Is there anything else I am missing about why you might subtype to an abstract type, and not to Any, in objects?

6 Likes

I’ll start by saying this is one of the areas where I think Julia will change and improve in some nearby versions. So I’ll answer both how things are now and how they may change.

Yes, the compiler will get this as a static constraint. If you look at type-unstable code, the typed code printout will say things like _A <: MyType, so abstract type inference knows the tightest bound it’s allowed to assume, carries that information forward, and checks for constraint violations.

This one is clear.

The answer is, kind of. There is a compiler optimization commonly referred to here as the world-splitting optimization: based on the current world age of the function, if some optimization can happen on the set of allowed types and the number of matching methods is sufficiently small (IIRC < 4), then the optimization is performed. A common case, for example, is assuming (a::Any == b::Any)::Bool in Base for every dispatch: just assume what comes out of == will be a Bool until a dispatch that returns something else is added, in which case that dispatch will invalidate any uninferred a or b cases, requiring recompilation in the world age that adds the “offending” == dispatch.
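A minimal sketch of this world splitting, with made-up names: when only a couple of methods match in the current world, inference unions their return types.

```julia
abstract type MyType end
struct Impl1 <: MyType end
struct Impl2 <: MyType end

f(::Impl1) = true
f(::Impl2) = false

# Only two methods of `f` match a `MyType` argument, and both return Bool,
# so inference can conclude g(::MyType)::Bool in the current world age.
g(x::MyType) = f(x)

Base.return_types(g, (MyType,))  # [Bool] under the default method limit
```

Defining a third method of `f` that returns something else would invalidate this inference result, which is exactly the fragility described above.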

Because of this world-age splitting, you can end up in cases where, for example, you have 2 concrete subtypes of an abstract type that both hardcode a field b::Bool, and therefore (a::MyType).b will infer as Bool. In that sense, you can get better type inference in cases with restrictions that world splitting can exploit, but

This is where it can get a bit funky. With f(a::Any, b::Any), basically the question is “should I world split or give up?” Giving up is easy: if more than 3 return types can come out of the possible dispatches of f, just give up. What I mean by that is, for f(a,b) = a + b, well, FloatX + FloatX :: FloatX alone is already too many return types, so you can’t world split. Done: validate that you can’t split, and make it f(a::Any, b::Any)::Any. So pushing forward Anys is actually pretty easy on the compiler.
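This “give up” path is easy to observe (a sketch; `plus` is a made-up name):

```julia
plus(a, b) = a + b

# `+` has hundreds of methods with many different return types, so inference
# cannot world split on (Any, Any) and simply widens to Any...
Base.return_types(plus, (Any, Any))  # [Any]

# ...while concrete argument types infer exactly, as usual.
Base.return_types(plus, (Int, Int))  # [Int64] on a 64-bit machine
```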

Pushing forward abstract types in the compiler can be much more difficult, since you need to validate what every concrete instantiation does. For f(a::AbstractFloat, b::AbstractFloat), inferring the output as ::AbstractFloat is a statement about all possible subtypes. If one subtype hits ::Any, cool: short-circuit, give up, you don’t need to check the others. But if one says ::AbstractFloat, then you need to check the next, and the next, and see that they all do this. Because of that, simply giving up and going to ::Any can be nicer on the compiler, meaning that using an abstract type can slow down type inference.

Potential Changes in the Future: Interfaces

Now my personal statement from all of this is that this will be helped a lot when interfaces are added to the language. Being able to state that for every dispatch of f, you must have f(a::T,b::T)::T would be able to guarantee such relations, and you would then be able to short circuit many difficult inference problems. Additionally, I think if such interfaces exist then the language could drop world splitting as the way to get these optimizations in some cases (something that I think has been discussed quite a bit), since right now world splitting does a lot of heavy lifting on uninferred code while being a bit fragile (and one of the main reasons for invalidations and long compile times). So, I don’t think the story here is complete by any means. Right now just using ::Any can decrease compile times, but in the future by placing interfaces ::MyType could be much better because if it can impose strong guarantees, then you could make type inference do a lot less work.

So for now: ::Any is likely to have much shorter compile times; ::MyType will sometimes get a runtime improvement if you only have a few cases and the compiler can prove in the current world age that the optimization is safe; and ::MyType is great for static checks and self-documenting code, so it should be preferred when we can use it. It’s just that certain compiler behaviors inhibit it in its current form.

16 Likes

Thanks Chris!

Regarding this:

This is really interesting. I’m trying to suss out the real-world impact of these things. Basically, what performance characteristics are attached to “nicer on the compiler” in this context? Are these things a one-time cost at compile time, or are they a permanent runtime penalty? I suppose it would be quite different case-to-case. But maybe you have a specific example of this with some clear performance difference before/after the ::Any? I think anything to quantify this would be immensely helpful.

Yes, a formal interfaces integration would be awesome. Is this now a “when” rather than an “if”? Is there a specific design that has been agreed on for this?

In those paragraphs, @ChrisRackauckas was discussing a specific narrow topic, the world-splitting optimization as part of type inference. So it’s basically about compiler latency. Of course, bad inference can lead to worse runtime performance, e.g., if it prevents some optimization.

I have to say though, the conclusion by @ChrisRackauckas does not really hold. It may go either way, because in the end it all depends on the compiler implementation, which is complicated and based on heuristics.

One effect in particular that counters the argument by @ChrisRackauckas is something I don’t completely understand, but I’ve read it said many times by the compiler developers, paraphrasing:

abstract interpretation is slow compared to concrete evaluation

I believe a specific example is how constant propagation, when not all arguments to a call are known at compile-time, is slower than constant folding, when all arguments to a call are known at compile time.

Some relevant experiments are when compiler developers try to reduce the world-splitting limit. I can’t remember if it’s three or four by default, but it’s configurable with experimental features in Base.Experimental, e.g., @max_methods or the more general @compiler_options. Anyway, the results turn out to be mixed, because when world-splitting is disallowed, inference turns out to do more abstract interpretation, and less concrete evaluation.
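One way to see this knob in action (a sketch; `Strict` and the function names are made up, and this relies on the experimental, internal `@max_methods`):

```julia
module Strict
    Base.Experimental.@max_methods 1  # give up as soon as more than 1 method matches
    abstract type T end
    struct A <: T end
    struct B <: T end
    f(::A) = 1.0
    f(::B) = 2.0
    g(x::T) = f(x)  # two matching methods: over the limit, so inference widens
end

Base.return_types(Strict.g, (Strict.T,))  # [Any]: no world splitting allowed
```

With the default limit (3), the same definitions in an ordinary module would infer `g(::T)::Float64`.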

Here’s an idea: you’re moving elements from one container to another. The first container’s element types are typed Any. Thus, moving to the second container should be faster if it also has Any-typed containers, rather than some more specific, but still not concrete, constraint on the element type. The reason is, as indicated in the OP already, that otherwise the element type needs to be checked at run time.
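A sketch of that point, with hypothetical types: filling a `Vector{MyType}` from an `Any`-typed source requires a per-element run-time check/convert, which a `Vector{Any}` destination skips.

```julia
abstract type MyType end
struct Impl <: MyType end

src = Any[Impl(), Impl(), Impl()]

dst_any = similar(src)              # Vector{Any}: plain element copies
copyto!(dst_any, src)

dst_abs = Vector{MyType}(undef, 3)  # each element must be checked against MyType
copyto!(dst_abs, src)
```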

Not sure how that “counters” it because that’s exactly what I’m talking about :sweat_smile: . There are effectively 3 modes with abstract interpretation: either it just keeps spitting out Any because nothing is inferred, it’s spitting out the real types in the first pass because it’s type-stable and easy to infer, or it is finding the right upper bound at every step. Type inference is a recursive algorithm, where the first two cases happen in just one pass, while the last case can require a fixed point iteration of the inference algorithm and checking many branches of unions on top of that.

No that’s union splitting.

I did say there are cases where it can be faster, so I don’t know how this is a counterargument. But in testing the compiler latency of ~40 packages in 2023, every single one got faster by removing the abstract types. So empirically it is very hard to find a case where the other direction actually works out. Can you point to some packages where it does? Because I can’t find them.

Again, it’s the classic “it depends”. Most of what I discussed was just compile time, i.e., the impact on the abstract inference algorithm. That runs each time you JIT compile, and the benefit of bailing out early is that you decrease the number of times type inference has to recurse/loop, so it finishes quicker.

However, given more information, type inference will work field by field. Here’s a case from my current REPL:

ERROR: return type
OrdinaryDiffEqNonlinearSolve.NLSolver{OrdinaryDiffEqNonlinearSolve.NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}, Nothing}, true, Vector{Float64}, Float64, Nothing, Float64, OrdinaryDiffEqNonlinearSolve.NLNewtonCache{Vector{Float64}, Float64, Float64, Vector{Float64}, Matrix{Float64}, Matrix{Float64}, Nothing, Nothing, LinearSolve.LinearCache{Matrix{Float64}, Vector{Float64}, Vector{Float64}, SciMLBase.NullParameters, LinearSolve.DefaultLinearSolver, LinearSolve.DefaultLinearSolverInit{LU{Float64, Matrix{Float64}, Vector{Int64}}, LinearAlgebra.QRCompactWY{Float64, Matrix{Float64}, Matrix{Float64}}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Tuple{LU{Float64, Matrix{Float64}, Vector{Int64}}, Vector{Int64}}, Tuple{LU{Float64, Matrix{Float64}, Vector{Int64}}, Vector{Int64}}, Nothing, Nothing, Nothing, SVD{Float64, Float64, Matrix{Float64}, Vector{Float64}}, Cholesky{Float64, Matrix{Float64}}, Cholesky{Float64, Matrix{Float64}}, Tuple{LU{Float64, Matrix{Float64}, Vector{Int32}}, Base.RefValue{Int32}}, Tuple{LU{Float64, Matrix{Float64}, Vector{Int64}}, Base.RefValue{Int64}}, QRPivoted{Float64, Matrix{Float64}, Vector{Float64}, Vector{Int64}}, Nothing, Nothing}, LinearSolve.InvPreconditioner{Diagonal{Float64, Vector{Float64}}}, Diagonal{Float64, Vector{Float64}}, Float64, Bool, LinearSolveAdjoint{Missing}}}, Float64} 
does not match inferred return type
OrdinaryDiffEqNonlinearSolve.NLSolver{OrdinaryDiffEqNonlinearSolve.NLNewton{Rational{Int64}, Rational{Int64}, Rational{Int64}, Nothing}, true, Vector{Float64}, _A, Nothing, Float64, OrdinaryDiffEqNonlinearSolve.NLNewtonCache{uType, tType, tType2, rateType, J, W, ufType, jcType, lsType}, Float64} where {_A, uType, tType, tType2, rateType, J, W, ufType, jcType, lsType}

The compiler treats inference field-by-field, so if it can infer partial information it will use that. You can see that it doesn’t just say “I don’t know anything”, but instead it says “this type parameter is unknown, but this one is Vector{Float64}”. So even though this code is type unstable, cache.u could be inferred, so the type inference issue may be masked and have no performance cost in the real use case because the rest could be discarded. This can then improve type inference time because you end up in the “everything inferred case”.
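A small, self-contained version of this field-by-field effect (hypothetical types and names): even when the overall return type widens, a field whose type does not depend on the unknown parameters stays concretely inferred.

```julia
struct Cache{T}
    u::Vector{Float64}  # concrete regardless of T
    extra::T
end

# Type-unstable constructor: the type of `extra` depends on a runtime value.
function make(n)
    n == 1 ? Cache(zeros(3), 1)   :
    n == 2 ? Cache(zeros(3), "s") :
    n == 3 ? Cache(zeros(3), 1.0) :
             Cache(zeros(3), :sym)
end

usesum(n) = sum(make(n).u)  # .u is Vector{Float64} whatever `extra` is

Base.return_types(usesum, (Int,))  # [Float64] despite the instability in `make`
```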

How this relates to abstract types: using an abstract type can get you there, as I mentioned above, because if every subtype hardcodes, for example, the type of .u to Float64 (and you have fewer than 4 such subtypes), the compiler can infer (via world splitting) that cache.u::Float64 always. Then you end up in the “there is an unstable call, but I understand the getfield’s type, so now I push forward a concrete type” case. That would of course improve run time. It can improve compile time if the code inferred from the helped field (.u here) ends up costing less inference time than the cost of doing the field-by-field inference plus the cost of having more abstract inference on the other elements.

Yes, more of a “when” and a “what” at this point. No specific design has been agreed on so far, but many elements of a design keep coming up repeatedly.

I’m pretty sure @max_methods is for world splitting, not union splitting.

help?> Base.Experimental.@max_methods
  │ Warning
  │
  │  The following bindings may be internal; they may change or be
  │  removed in future versions:
  │
  │    •  Base.Experimental
  │
  │    •  Base.Experimental.@max_methods

  Experimental.@max_methods n::Int

  Set the maximum number of potentially-matching methods considered when running
  inference for methods defined in the current module. This setting affects
  inference of calls with incomplete knowledge of the argument types.

  The benefit of this setting is to avoid excessive compilation and reduce
  invalidation risks in poorly-inferred cases. For example, when @max_methods 2
  is set and there are two potentially-matching methods returning different
  types inside a function body, then Julia will compile subsequent calls for
  both types so that the compiled function body accounts for both possibilities.
  Also the compiled code is vulnerable to invalidations that would happen when
  either of the two methods gets invalidated. This speculative compilation and
  these invalidations can be avoided by setting @max_methods 1 and allowing the
  compiled code to resort to runtime dispatch instead.

  Supported values are 1, 2, 3, 4, and default (currently equivalent to 3).

  ──────────────────────────────────────────────────────────────────────────────

  Experimental.@max_methods n::Int function fname end

  Set the maximum number of potentially-matching methods considered when running
  inference for the generic function fname. Overrides any module-level or global
  inference settings for max_methods. This setting is global for the entire
  generic function (or more precisely the MethodTable).
1 Like

Ehh it’s a little bit of both, so it’s probably better to just be exact with what it’s doing.

IIRC what this is actually doing is: it takes in the abstract type and keeps running inference on different concrete types up to the limit. For example, if you have f(a::AbstractFloat), it will first find f(a::Float16)::Float16, then f(a::Float32)::Float32, then f(a::Float64)::Float64. It will only do that if there are at most n choices, so yes, that’s a world-splitting optimization: with n=3 and no other AbstractFloat subtypes involved, it would infer f(a::AbstractFloat)::Union{Float16, Float32, Float64} and carry that union forward. And because that’s a union of 3, it would keep splitting downwards. If you made n=2, it would give up quicker, which could make it f(a::AbstractFloat)::Any or f(a::AbstractFloat)::AbstractFloat (depending on the code), and carry forward the abstract type. So :person_shrugging: with an abstract type it uses the current world to get a union and then relies on union splitting to keep pushing it forward, so I’d attribute it to union splitting, since with a lower n you can still get the abstract types but you don’t get the concretization. But it is a little bit of both.
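That concretization step can be reproduced (a sketch; `h` is a made-up name, and the result depends on which methods exist in the current world):

```julia
h(a::Float16) = a
h(a::Float32) = a
h(a::Float64) = a

# Three matching methods for an AbstractFloat argument, within the default
# limit, so inference concretizes the abstract type into a union:
Base.return_types(h, (AbstractFloat,))  # [Union{Float16, Float32, Float64}]
```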

Calling in @gbaraldi to disambiguate or add any more detail he can think of.

One very important but kinda trivial reason is:

You can have methods foo(x::Vector{MyType}); these would apply to MyType[...] and not to Any[...].
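For example, with the types from the original question (`describe` is a made-up function):

```julia
abstract type MyType end
struct MyTypeImpl1 <: MyType end

describe(v::Vector{MyType}) = "specialized"
describe(v::Vector)         = "generic"

describe(MyType[MyTypeImpl1()])  # "specialized"
describe(Any[MyTypeImpl1()])     # "generic": Vector{Any} is not Vector{MyType}
```

Because parametric types are invariant, `Vector{Any}` is not a subtype of `Vector{MyType}`, so only the `MyType[...]` array hits the specialized method.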

In other words: Non-erasure of parametric types is not just an optimization / performance / compiler thing, it is core part of julia semantics.

If you don’t need this dispatch, then I think most of the time, Vector{Any} should be better for performance.

1 Like

Yeah, maybe this is just semantics, but I’d say that yes there’s a union splitting component there, but that union splitting only occurs because world splitting happened first.

Specifically, I think this distinction is important because union splitting is typically understood as not being vulnerable to invalidation by non-local code changes, whereas world splitting is to me primarily distinguished by the fact that it relies on the state of the global method table, and is susceptible to invalidation by long-range changes to that table.

It’s my understanding that @max_methods is all about controlling whether code can be invalidated by these non-local changes, so I connect it squarely with world splitting.

1 Like

There have been quite a few related discussions on union splitting / world-age splitting in relation to performance; you might be interested in them:

8 posts were split to a new topic: Could introducing formal interfaces be nonbreaking?