Performance problems when dealing with deliberately type-unstable code. How to use type knowledge better?

The problem is that the same compiled loop body must work for all iterations, and the compiler doesn’t know that c values are always ConcreteA or ConcreteB, so the loop body has to work with get_value(::ConcreteC) which returns a boxed value.

It helps to write get_value(c)::Float64 to get a more efficient loop body, but this only annotates the call expression (you could equivalently write (get_value(c))::Float64); it doesn’t change how get_value itself is compiled.

So for the loop body, the compiler knows that get_value(c) will be a Float64, but that doesn’t say anything about c. It must still assume that c could be a ConcreteC, meaning that get_value could return a boxed value.

To fix this, you can use an array with a more precise type (the subtype solution proposed above, or a union type). Then it’s clear the loop only calls get_value(::ConcreteA) and get_value(::ConcreteB) so no unboxing is necessary.
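A minimal, self-contained sketch of that Union-element-type fix (the type and field names here only mirror the thread; they are not the exact definitions from the OP):

```julia
# Sketch: a small Union element type lets the compiler union-split the loop.
abstract type MyAbstract end

struct ConcreteA <: MyAbstract
    a::Float64
end

struct ConcreteB <: MyAbstract
    b::Float64
end

get_value(c::ConcreteA) = c.a
get_value(c::ConcreteB) = c.b

# Union{ConcreteA,ConcreteB} tells the compiler the only reachable methods are
# get_value(::ConcreteA) and get_value(::ConcreteB), both returning Float64.
cs = Union{ConcreteA,ConcreteB}[ConcreteA(1.0), ConcreteB(2.0), ConcreteA(3.0)]

function total(cs)
    x = 0.0
    for c in cs
        x += get_value(c)   # inferred as Float64 via union splitting
    end
    x
end

total(cs)  # → 6.0
```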

Another solution is to remove boxing from the compilation of get_value(::ConcreteC). This requires annotating the definition of get_value itself:

get_value(c::ConcreteC)::Float64 = c.c

This gives very fast code with zero allocation.

4 Likes

Why isn’t it possible to say “I know this expression can’t be proven to return only a Float64, but I personally guarantee that it will, and I’ll accept an error if that condition is not met”? Then c could perhaps be returned without a box as long as it’s really a Float64. I would even be fine with undefined behavior if the assumption isn’t met. I see this as a last-resort optimization, similar to how @inbounds allows segmentation faults when I pass an invalid index; there could be a @nobox macro.

1 Like

That’s exactly what get_value(c)::Float64 means. It will throw an exception if the value returned by get_value(c) is not a (possibly boxed) Float64. This doesn’t change the fact that get_value(::ConcreteC) returns a boxed value. Functions are specialized based on the input types, not based on annotations of result expressions in the calling code.
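For illustration, a tiny sketch of what such a typeassert actually does at runtime (hypothetical values, not from the thread):

```julia
# A return-type annotation on a call is just a typeassert on its result:
# it checks the value at runtime and throws TypeError on a mismatch.
val = 1.0
x = val::Float64          # fine: val really is a Float64

y = try
    val::Int              # fails: the assertion is checked, not assumed
catch err
    err
end

y isa TypeError  # → true
```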

2 Likes

True, but I never call get_value(::ConcreteC); the Vector contains only ConcreteA and ConcreteB. That’s kind of the point: I know this code will only ever return Float64s, but they are boxed because the calling code can’t infer all possible return values due to the untyped field of ConcreteC. That’s where a @nobox macro could potentially come in.

One scenario for this came up the other day, where somebody tried to write a ray tracer and had a lot of allocations due to unpredictable types (lights, materials, objects). But what you can predict is what the output types of your functions are going to be in all cases: maybe hits, or RGB colors, or something else. So the question is, can’t we elide the unnecessary boxing that the compiler uses to verify the output types of uninferrable calls if we give a manual guarantee?

Here’s a slightly different example that gets closer to the problem. There are too many structs for get_value to be inferred correctly, even though all the structs have a concretely typed field x::Float64. It seems impossible to fully capitalize on that.

abstract type Abstract end

# make a couple of concrete subtypes, here they all share the same field x::Float64, but
# there are too many of them so the compiler won't infer `get_value`
for x in 'A':'Z'
    name = Symbol("Concrete$x")
    @eval struct $name <: Abstract
        x::Float64
    end
end

get_value(a::Abstract) = a.x

cs = repeat([ConcreteA(1), ConcreteB(1), ConcreteC(1), ConcreteD(1), ConcreteE(1)], 10000)


function test(cs)
    x = 0.0
    for c in cs
        val = get_value(c)
        x += val
    end
    x
end

function test_2(cs)
    x = 0.0
    for c in cs
        val = get_value(c)::Float64
        x += val
    end
    x
end

This gives the following results, similar to the above example:

julia> @time test(cs)
  0.003093 seconds (100.00 k allocations: 1.526 MiB)
50000.0

julia> @time test_2(cs)
  0.001635 seconds (50.00 k allocations: 781.266 KiB)
50000.0
1 Like

OK, that’s a bit different. Note that even with two types ('A':'B') this code doesn’t infer. It does infer (with two types) when you define concrete methods get_value(a::$name) = a.x.
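A self-contained sketch of that difference, using hypothetical Concrete2* names so it doesn’t clash with the definitions above: adding one concrete method per subtype inside the @eval loop lets inference succeed.

```julia
abstract type Abstract2 end

# Define two concrete subtypes and, crucially, one concrete method each.
# With concrete signatures, get_value2 has a fully inferred return type.
for x in 'A':'B'
    name = Symbol("Concrete2$x")
    @eval begin
        struct $name <: Abstract2
            x::Float64
        end
        get_value2(a::$name) = a.x   # concrete method per subtype
    end
end

cs2 = Abstract2[Concrete2A(1.0), Concrete2B(2.0)]

function test_concrete(cs)
    x = 0.0
    for c in cs
        x += get_value2(c)
    end
    x
end

test_concrete(cs2)  # → 3.0
```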

I don’t see a solution that avoids union splitting. Something like @nonboxed Float64 ... sounds useful though a bit weird, and it could be problematic since whether a particular method returns a boxed value can change as the compiler evolves…

But isn’t union splitting only possible when a specific small set of possible types is known a priori? I don’t see how that helps with larger projects that are supposed to be extensible. For me, the raytracer problem shows that we’re somehow missing functionality to deal with such scenarios. How do other languages deal with this problem? In Swift I’d have defined an enum with all necessary types, but of course that’s not dynamically extensible either.

In this other thread an alternative in Java was tested:

But the conclusion (unexpected according to the author) was that it was slower than any of the Julia alternatives.

In that thread about the ray tracer, someone suggested that C++ has a better way to deal with this, but we didn’t see it actually implemented. I think this problem deserves some attention, at least so that a definitive answer and advice become available.

1 Like

I think this is one of the cases where Julia’s complexity is hidden behind a very intuitive usage design. Take the following with a grain of salt, since I have never written a compiler in my life and may be very far from the real reason why your idea can’t be implemented, but we can try to reason about it.

Assume that you have a function like

function f(x1, x2)
  y1 = g1(x1, x2)
  y2 = g2(x1, x2)
  z = h(y1, y2)
  return z
end

and you want to constrain its output with an annotation f::Float64 and want the compiler to take this information into account to avoid unnecessary boxing.

Now, since Julia supports multiple dispatch, there may be multiple methods of g1 and g2 depending on the types of their arguments, so you have to infer the argument types in order to choose the appropriate methods. When you move top-down (from the types of the arguments x1 and x2), this task is more or less straightforward: you recursively go inside the functions g1 and g2 until you hit some basic functions whose return types you can infer, and by combining this information you infer the return types of those functions. As a result you get the types of y1 and y2, so you know which method of h to choose, and finally you can emit the LLVM code (of course the procedure is way more complicated than that, but I think this is the main idea).

Now, what happens if you try to take into account the fact that the return type of f is Float64? You can infer that the type of z is Float64, because it is the return value, but what are the types of y1 and y2? You can’t just go inside the function and infer types from basic functions, because the cardinality of possible input types can become huge very quickly. For example, what if h(y1, y2) = y1 + y2? Then y1 and y2 can have any pair of types whose sum gives you a Float64: Float64 and Float64, or Float64 and Float32, or Float64 and Int, or Float64 and MyAwesomeType if you define a plus operation for those types.

At this point you have a huge number of possible types for y1 and y2, and for each combination the compiler would have to infer the possible types of x1 and x2 in order to choose the appropriate methods of g1 and g2. This procedure then has to be repeated for g1 and g2, and you can see how the task quickly explodes out of control. Very soon you just give up and say that everything is Any and box all variables, because there is nothing you can say about their types, so you have to choose the most general one.

So, I think backward type propagation breaks down very fast, and as such it isn’t very useful. If you can’t infer types more than one or two steps backwards, what’s the reason to implement this machinery at all? It’s better to have forward type propagation and let the user annotate the type of the result, which is what happens now in Julia.

Maybe there are ways to solve this problem (other than annotating everything, which would return us to the land of static compilation), but it doesn’t look like it is going to be simple, or at least it would be way more complicated than the current implementation of the compiler with its limitations.

1 Like

You can use the trick here to make it faster: Snooping on and fixing invalidations: @snoopr · SnoopCompile. Gives you about a factor of 2. The key is to remember that a.x is itself a function call.

Could you please explain that a bit?

I am unable to see how that is possible without stating that get_value(c::ConcreteC) also returns Float64, which may be false.

You just need to add a method

function Base.getproperty(a::Abstract, field::Symbol)
    field === :x && return getfield(a, :x)::Float64
    error("unknown field ", field)
end

and then you don’t need the type-assert on get_value. getfield is a builtin, so there’s only one such method.
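To make the trick concrete, here is a self-contained sketch with hypothetical Shape/Circle/Square names (not the types from the thread):

```julia
abstract type Shape end

struct Circle <: Shape
    x::Float64
end

struct Square <: Shape
    x::Float64
end

# Intercept field access for all Shapes. getfield is a builtin with a single
# method, and the typeassert pins the result of a.x to Float64.
function Base.getproperty(a::Shape, field::Symbol)
    field === :x && return getfield(a, :x)::Float64
    error("unknown field ", field)
end

# Now a.x infers as Float64 even for an abstractly typed argument,
# with no typeassert needed at the call site.
get_x(a::Shape) = a.x

get_x(Circle(2.0))  # → 2.0
```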

@jules, I know you said you don’t want to consider union-splitting, but do keep in mind you can accelerate your code for some known types and then have a slower path for any extensions. For example,

function test_fast(cs)
    x = 0.0
    for c in cs
        val = isa(c, ConcreteA) ? c.x :
              isa(c, ConcreteB) ? c.x :
              isa(c, ConcreteC) ? c.x :
              isa(c, ConcreteD) ? c.x :
              isa(c, ConcreteE) ? c.x : c.x
        x += val
    end
    x
end

doesn’t “lose” anything because that final c.x will handle any type outside the five listed. It is also by far the fastest option yet: if I increase the size of cs by another 100x, I get

julia> @time test(cs)
  0.129554 seconds (5.00 M allocations: 76.294 MiB, 7.95% gc time)
5.0e6

julia> @time test_fast(cs)
  0.017303 seconds (1 allocation: 16 bytes)
5.0e6

While that isa block might look useless (all options just call c.x), what you’ve done is provide multiple call sites, and the compiler can optimize each one of them differently.

3 Likes

Interesting, so maybe a feasible path to achieve this while keeping the code extensible would be to use a macro as in Macro to write function with many conditionals - #8 by Skoffer, and then add a mechanism that reloads all functions depending on the set of subtypes whenever a new subtype is added, effectively just overwriting the old versions. I think that would give you optimal speed and low allocations, and would allow others to add new subtypes after the fact.
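A hedged sketch of what such a manual regeneration step could look like, with hypothetical Node types and a registry of known subtypes (this is one reading of the idea, not an established pattern):

```julia
abstract type Node end
struct NodeA <: Node; x::Float64; end
struct NodeB <: Node; x::Float64; end

# Registry of subtypes the generated getter should fast-path.
const KNOWN_NODES = DataType[NodeA, NodeB]

function reinitialize()
    # Build an isa-chain over the registered types; the trailing branch
    # handles anything unregistered (a slow but correct fallback).
    body = :(getfield(c, :x))
    for T in reverse(KNOWN_NODES)
        body = :(c isa $T ? c.x : $body)
    end
    @eval split_value(c::Node) = $body   # overwrite the old definition
end

reinitialize()

# Later, someone extends the hierarchy and re-runs the generator.
struct NodeC <: Node; x::Float64; end
push!(KNOWN_NODES, NodeC)
reinitialize()

split_value(NodeC(5.0))  # → 5.0
```

As discussed below, redefinition like this raises world-age concerns, so triggering it manually (and returning to the top level before calling the new method) matters.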

There’s

though I confess I haven’t played with it myself. I worry about the fact that the @unionsplit macro doesn’t seem to allow you to pass the name of the variable that you want to do all the isa on. The solution by @Skoffer you linked looks better.

But I’d strongly urge you to think twice (or three times) before doing that “redefine your methods” trick automatically. You’ll get nasty world age issues, etc. If you can just pick a set of types and live with it, that would be far better.

1 Like

Not automatically, I thought about just calling some function like reinitialize or so after adding a new type. This is just me playing around with ray tracing and trying to squeeze out performance as a learning exercise. Sometimes one can hit quite complex issues that way which reveal interesting properties of the language and come in handy later.

If you trigger redefinitions manually, that’s a bit better, because that puts the user in control of the updates. By returning to the REPL after the update and before running any code, you also give the world-age a chance to update, so you shouldn’t get mysterious errors.

Thank you @tim.holy !

The problem was that in the OP different fields were used in the different get_value methods, and in that case I still cannot see how replacing get_value with getproperty can help.

Not sure if it’s relevant, but when splitting manually, like Tim did, it is possible to inject the splitting rules from the call site:

function sum_concretes_withgetter(cs; getfn = get_value)
    x = 0.0
    for c in cs
        val = getfn(c)
        x += val
    end
    x
end

function concrete_getter(c)
    if c isa ConcreteA
        return get_value(c)
    elseif c isa ConcreteB
        return get_value(c)
    end
    error("Got unknown type in concrete_getter")
end

julia> @btime sum_concretes_withgetter(concretes_ab; getfn = concrete_getter)
  34.400 μs (1 allocation: 16 bytes)

This way it may also be possible to dynamically generate the getter just before the call, if the types are not known statically at the call site.
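A hedged sketch of that idea, building the getter at runtime from the types actually present in the data (hypothetical Part* names; note the closure chain still performs dynamic isa checks against captured types, so it would likely be slower than a statically written chain like concrete_getter):

```julia
struct PartA; x::Float64; end
struct PartB; x::Float64; end

get_value(p) = p.x

# Fold a runtime list of types into a chain of isa-branching closures.
# The resulting closure is passed through the getfn function barrier,
# so the summing loop specializes on the closure's type.
function make_getter(types)
    getter = c -> error("unknown type ", typeof(c))
    for T in reverse(collect(types))
        prev = getter
        getter = c -> c isa T ? get_value(c) : prev(c)
    end
    getter
end

function sum_with(cs; getfn = get_value)
    x = 0.0
    for c in cs
        x += getfn(c)
    end
    x
end

cs = Any[PartA(1.0), PartB(2.0)]
sum_with(cs; getfn = make_getter(unique(typeof.(cs))))  # → 3.0
```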

1 Like

Right, it wouldn’t; I was responding to the updated example, where the field names are consistent. You could make getproperty access different fields for different structs, but at a certain point that gets more confusing than helpful.

The SnoopCompile docs I linked to address a fairly common situation in which subtypes of an abstract type are expected to have some fields in common so that they can provide a common interface. In Julia 1.6 we’ve improved the inferrability of code quite a bit by using that trick for several IO, LibGit2, and other abstract types.

1 Like

With the @unionsplit macro from the other thread I could eliminate all the allocations stemming from passing different Materials and Shapes to my raytracing functions. Pretty nice

2 Likes