But isn’t union splitting only possible when a specific small set of possible types is known a priori? I don’t see how that helps with larger projects that are supposed to be extensible. For me, the raytracer problem shows that we’re somehow missing functionality to deal with such scenarios. How do other languages deal with this problem? In Swift I’d have defined an enum with all necessary types, but of course that’s not dynamically extensible either.
In this other thread an alternative in Java was tested, but the conclusion (unexpected, according to the author) was that it was slower than any of the Julia alternatives.
In that thread about the ray tracer someone suggested that C++ has a better way to deal with this, but we never saw it actually implemented. I think this is a problem that deserves some attention, at least so that a definitive answer and some advice become available.
I think this is one of the cases where Julia's complexity is hidden behind a very intuitive usage design. Take what follows with a grain of salt, since I have never written a compiler in my life and may be very far from the real reason why your idea can't be implemented, but we can try to reason about it.
Assume that you have a function like
function f(x1, x2)
y1 = g1(x1, x2)
y2 = g2(x1, x2)
z = h(y1, y2)
return z
end
and you want to constrain its output with an annotation like f(x1, x2)::Float64, and you want the compiler to take this information into account and avoid unnecessary boxing.
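For reference, Julia does accept such an annotation directly on the function signature, but it acts as a convert (or assertion) on the returned value rather than as information that inference propagates backwards. A minimal sketch (the g1, g2, h bodies here are stand-ins I made up, not from the original post):

```julia
# Stand-in definitions just to make the sketch runnable.
g1(x1, x2) = x1 + x2
g2(x1, x2) = x1 * x2
h(y1, y2) = y1 / y2

# The `::Float64` converts whatever `h` returns to Float64 (or
# throws); it does not drive inference backwards into g1/g2.
function f(x1, x2)::Float64
    y1 = g1(x1, x2)
    y2 = g2(x1, x2)
    z = h(y1, y2)
    return z
end

f(1, 2)  # returns 1.5::Float64 even for Int inputs
```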
Now, since Julia supports multiple dispatch, you may have multiple versions of g1 and g2 depending on the types of their arguments, so you have to infer the argument types in order to choose the appropriate methods. When you move top-down (from the types of the arguments x1 and x2) this task is more or less straightforward: you recursively descend into g1 and g2 until you hit some basic functions whose return types you can infer, and by combining this information you can infer the return types of those functions. As a result you get the types of y1 and y2, so you know which method of h to choose, and finally you can emit the LLVM code (of course the procedure is far more complicated than that, but I think that is the main idea).
Now, what happens if you try to take into account the fact that the return type of f is Float64? You can infer that the type of z is Float64, because it is the return value, but what are the types of y1 and y2? You can't just go inside the function and infer types from the basic functions, because the cardinality of possible input types explodes very quickly. For example, what if h(y1, y2) = y1 + y2? Then y1 and y2 can have any types whose sum gives you a Float64. It could be Float64 and Float64, or Float64 and Float32, or Float64 and Int, or Float64 and MyAwesomeType if you define the + operation for those types.
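The method table of + gives a feel for that cardinality; a bare Julia session already carries a couple of hundred methods of +, before any packages add their own:

```julia
# Every one of these methods is a candidate when reasoning
# backwards from `y1 + y2 :: Float64`.
n = length(methods(+))
println(n)  # well over 100 in base Julia
```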
At this point you have a huge number of possible types for y1 and y2, and for each combination the compiler would have to infer the possible types of x1 and x2 in order to choose the appropriate methods of g1 and g2. This procedure would then have to be repeated inside g1 and g2, and you can see how the task quickly explodes out of control. Very soon you would just give up, say that everything is Any, and box all variables, because there is nothing you can say about their types, so you have to choose the most general one.
So I think backward type propagation breaks down very fast, and as such it has little value. If you can't infer types more than one or two steps backwards, what's the point of implementing this machinery at all? It's better to have forward type propagation and let the user annotate the type of the result, which is what happens in Julia now.
Maybe there are ways to solve this problem (other than annotating everything, which would return us to the land of static compilation), but it doesn't look like it would be simple, or at least it would be far more complicated than the current implementation of the compiler with its limitations.
You can use the trick here to make it faster: Snooping on and fixing invalidations: @snoopr · SnoopCompile. Gives you about a factor of 2. The key is to remember that a.x is itself a function call.
Could you please explain that a bit?
I am unable to see how that is possible without stating that get_value(c::ConcreteC) also returns Float64, which may be false.
You just need to add a method
function Base.getproperty(a::Abstract, field::Symbol)
field === :x && return getfield(a, :x)::Float64
error("unknown field ", field)
end
and then you don’t need the type-assert on get_value. getfield is a builtin, so there’s only one such method.
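Spelled out on a toy hierarchy (the type and field names here are illustrative, not from the original code), the trick looks like this:

```julia
abstract type Abstract end

struct ConcreteA <: Abstract
    x::Float64
end

struct ConcreteB <: Abstract
    x::Float64
end

# One getproperty method for the whole hierarchy; the type-assert
# tells the compiler that `a.x` is always a Float64, whatever the
# concrete subtype turns out to be.
function Base.getproperty(a::Abstract, field::Symbol)
    field === :x && return getfield(a, :x)::Float64
    error("unknown field ", field)
end

vals = Abstract[ConcreteA(1.0), ConcreteB(2.0)]
sum(v.x for v in vals)  # 3.0, with `v.x` inferred as Float64
```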
@jules, I know you said you don’t want to consider union-splitting, but do keep in mind you can accelerate your code for some known types and then have a slower path for any extensions. For example,
function test_fast(cs)
x = 0.0
for c in cs
val = isa(c, ConcreteA) ? c.x :
isa(c, ConcreteB) ? c.x :
isa(c, ConcreteC) ? c.x :
isa(c, ConcreteD) ? c.x :
isa(c, ConcreteE) ? c.x : c.x
x += val
end
x
end
doesn’t “lose” anything because that final c.x will handle any type outside the five listed. It is also by far the fastest option yet: if I increase the size of cs by another 100x, I get
julia> @time test(cs)
0.129554 seconds (5.00 M allocations: 76.294 MiB, 7.95% gc time)
5.0e6
julia> @time test_fast(cs)
0.017303 seconds (1 allocation: 16 bytes)
5.0e6
While that isa block might look useless (all options just call c.x), what you’ve done is provide multiple call sites, and the compiler can optimize each one of them differently.
Interesting, so maybe a feasible path to achieve this while keeping code extensible would be to use a macro as in Macro to write function with many conditionals - #8 by Skoffer, and then add a mechanism to reload all functions that depend on the subtypes output whenever a new subtype is added, effectively just overwriting the old version. I think that would give you optimal speed and low allocations, and would allow others to add new subtypes after the fact.
There’s https://github.com/jlapeyre/ManualDispatch.jl though I confess I haven’t played with it myself. I worry about the fact that the @unionsplit macro doesn’t seem to allow you to pass the name of the variable that you want to do all the isa checks on. The solution by @Skoffer you linked looks better.
But I’d strongly urge you to think twice (or three times) before doing that “redefine your methods” trick automatically. You’ll get nasty world age issues, etc. If you can just pick a set of types and live with it, that would be far better.
Not automatically; I thought about just calling some function like reinitialize after adding a new type. This is just me playing around with ray tracing and trying to squeeze out performance as a learning exercise. Sometimes one can hit quite complex issues that way which reveal interesting properties of the language and come in handy later.
If you trigger redefinitions manually, that’s a bit better, because that puts the user in control of the updates. By returning to the REPL after the update and before running any code, you also give the world-age a chance to update, so you shouldn’t get mysterious errors.
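A minimal sketch of that manual reinitialize idea (all names hypothetical): regenerate a union-split getter from the current subtypes list via @eval, then call it from a fresh top-level expression so the world age has updated.

```julia
using InteractiveUtils: subtypes

abstract type AbstractMaterial end
struct Matte  <: AbstractMaterial; x::Float64; end
struct Glossy <: AbstractMaterial; x::Float64; end

# Rebuild `get_x` as one chain of isa branches over whatever
# subtypes of AbstractMaterial currently exist.
function reinitialize()
    branches = :(error("unknown material type"))
    for T in subtypes(AbstractMaterial)
        branches = :(m isa $T ? getfield(m, :x) : $branches)
    end
    @eval get_x(m::AbstractMaterial) = $branches
end

reinitialize()
# Called from a later top-level expression, so no world-age error.
get_x(Matte(1.5))  # 1.5
```

After defining a new subtype, calling reinitialize() again overwrites get_x with a chain that includes it.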
Thank you @tim.holy !
The problem was that in the OP different fields were used in the different get_value methods, and in that case I still cannot see how replacing get_value with getproperty can help.
Not sure if relevant, but when splitting manually, like Tim did, it is possible to inject the splitting rules from the call site:
function sum_concretes_withgetter(cs; getfn = get_value)
x = 0.0
for c in cs
val = getfn(c)
x += val
end
x
end
function concrete_getter(c)
if c isa ConcreteA
return get_value(c)
elseif c isa ConcreteB
return get_value(c)
end
error("Got unknown type in concrete_getter")
end
julia> @btime sum_concretes_withgetter(concretes_ab; getfn = concrete_getter)
34.400 μs (1 allocation: 16 bytes)
This way it may also be possible to dynamically generate the getter just before the call, if the types are not known statically at the call site.
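One way to sketch that dynamic generation (names hypothetical, not from the thread): fold a tuple of known types into nested isa branches, producing a getter closure just before the call.

```julia
abstract type AbstractVal end
struct ValA <: AbstractVal; v::Float64; end
struct ValB <: AbstractVal; v::Float64; end

get_value(c::AbstractVal) = getfield(c, :v)

# Build a union-splitting getter for a tuple of known types by
# folding them into nested `isa` branches; the empty tuple is the
# "unknown type" fallback.
make_getter(::Tuple{}) = c -> error("unknown type")
make_getter(ts::Tuple) =
    let T = ts[1], rest = make_getter(Base.tail(ts))
        c -> c isa T ? get_value(c) : rest(c)
    end

function sum_with(cs, getfn)
    x = 0.0
    for c in cs
        x += getfn(c)
    end
    x
end

getter = make_getter((ValA, ValB))
sum_with(AbstractVal[ValA(1.0), ValB(2.0)], getter)  # 3.0
```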
Right, it wouldn’t; I was responding to the updated example, where the field names are consistent. You could make getproperty access different fields for different structs, but at a certain point that gets more confusing than helpful.
The SnoopCompile docs I linked to address a fairly common situation in which subtypes of an abstract type are expected to have some fields in common so that they can provide a common interface. In Julia 1.6 we’ve improved the inferrability of code quite a bit by using that trick for several IO, LibGit2, and other abstract types.
With the @unionsplit macro from the other thread I could eliminate all the allocations stemming from passing different Materials and Shapes to my raytracing functions. Pretty nice.