I have a problem that crops up again and again, which relates to the use of different concrete types in situations where I can’t possibly establish type stability. Here’s a contrived example: There are three different subtypes, and I have a vector of two of them, which is parameterized with the abstract type. Therefore, the compiler doesn’t know the return type of the function get_value. But I as the programmer do know that it will always be Float64, because I have set up the logic that way. How can I capitalize on that knowledge?
In the second example I’ve tried to supply the compiler with the knowledge that there should only be Float64s coming out of that particular get_value call. The result is that the allocations are halved, but the computation time is doubled.
Please don’t suggest changes to this example that get away from the question, I purposefully set it up not to use union-splitting, etc., because that’s the situation I face most often. What can I do in a situation with unknown input types but known output types?
abstract type Abstract end
struct ConcreteA <: Abstract
a::Float64
end
struct ConcreteB <: Abstract
b::Float64
end
struct ConcreteC <: Abstract
c
end
# get_value(::Abstract) is inferred Any because of ConcreteC
get_value(c::ConcreteA) = c.a
get_value(c::ConcreteB) = c.b
get_value(c::ConcreteC) = c.c
# make a vector of only ConcreteA and B
concretes_ab = [rand() < 0.5 ? ConcreteA(rand()) : ConcreteB(rand()) for _ in 1:10000]
# a function that sums the values, without asserting Float64
function sum_concretes_1(cs)
x = 0.0
for c in cs
val = get_value(c)
x += val
end
x
end
# and with Float64
function sum_concretes_2(cs)
x = 0.0
for c in cs
val::Float64 = get_value(c)
x += val
end
x
end
What explains the time / allocation pattern here? Why is specifying Float64 not beneficial? Can I somehow avoid allocations altogether even while keeping c’s type unknown to the compiler?
Therefore, the compiler doesn’t know the return type of the function get_value . But I as the programmer do know that it will always be Float64 , because I have set up the logic that way. How can I capitalize on that knowledge?
You definitely can but this requires slight changes to precisely let Julia know of your knowledge. For instance:
abstract type Abstract end
abstract type SubAbstract <: Abstract end
struct ConcreteA <: SubAbstract
a::Float64
end
struct ConcreteB <: SubAbstract
b::Float64
end
struct ConcreteC <: Abstract
c
end
# get_value(::Abstract) is inferred Any because of ConcreteC
get_value(c::ConcreteA) = c.a
get_value(c::ConcreteB) = c.b
get_value(c::ConcreteC) = c.c
# make a vector of only ConcreteA and B
concretes_ab = [rand() < 0.5 ? ConcreteA(rand()) : ConcreteB(rand()) for _ in 1:10000]
# a function that sums the values, without asserting Float64
function _sum_(cs)
x = 0.0
for c in cs
val = get_value(c)
x += val
end
x
end
Now, concretes_ab is a Array{SubAbstract,1} and Julia is capable of figuring out that get_value on SubAbstract returns a Float64.
Essentially adding Float64 helps for the + function (this is why you have halved the allocations) but the compiler still needs to figure out which get_value method to call.
In case of sum_concretes_2get_value returns value of the type Any and after that it is converted to Float64. In sum_concretes_3get_value you hint compiler that return value of the function has type Float64.
It works only for small number of subtypes though. On 1.7-DEV
abstract type Abstract end
abstract type SubAbstract <: Abstract end
struct ConcreteA1 <: SubAbstract
a::Float64
end
struct ConcreteA2 <: SubAbstract
b::Float64
end
struct ConcreteA3 <: SubAbstract
b::Float64
end
struct ConcreteA4 <: SubAbstract
b::Float64
end
struct ConcreteA5 <: SubAbstract
b::Float64
end
struct ConcreteC <: Abstract
c
end
# get_value(::Abstract) is inferred Any because of ConcreteC
get_value(c::ConcreteA1) = c.a
get_value(c::ConcreteA2) = c.b
get_value(c::ConcreteA3) = c.b
get_value(c::ConcreteA4) = c.b
get_value(c::ConcreteA5) = c.b
get_value(c::ConcreteC) = c.c
# make a vector of only ConcreteA and B
concretes_ab = [rand() < 0.5 ? ConcreteA1(rand()) : ConcreteA2(rand()) for _ in 1:10000]
# a function that sums the values, without asserting Float64
function _sum_(cs)
x = 0.0
for c in cs
val = get_value(c)
x += val
end
x
end
using BenchmarkTools
@btime _sum_($concretes_ab)
# 382.936 μs (20000 allocations: 312.50 KiB)
When number of methods reaches some critical number compiler gives up and wrap everything in Any.
I don’t think I have much to say here, but declaring the vector like this turns out to solve the problem of allocations in this case and is also 5 times faster than the SubAbstract option. I don’t know if that falls into the invalid options:
# make a vector of only ConcreteA and B
concretes_ab2 = Union{ConcreteA,ConcreteB}[rand() < 0.5 ? ConcreteA(rand()) : ConcreteB(rand()) for _ in 1:10000]
Edit: I guess this will also not work for vectors with more subtypes, as mentioned above, because the compiler is probably doing splitting and it will give up.
Interesting with the ::Float64 assertion on the right hand side vs left hand side, I didn’t know that it made a difference for performance. Maybe that’s the best one can do in these situations but I still would like to understand better why.
To the other suggestions with union splitting / subtyping differently / parameterizing more tightly: I’m aware of these things but as mentioned already, often these approaches are not feasible due to a large number of types, or because you would need multiple different subtype hierarchies for different applications. I’m trying to understand how to make the most out of the type unstable situation assuming that I can’t reduce the degree of type instability (even though in the example you technically could)
Also, if I understand correctly, the allocations come from boxing the return values. But can’t I go in and say, don’t box, these return values are always going to be floats (or you can error)? If that’s not technically possible, why not?
Do you think that could be converted into a macro to be added as a flag to the for loop (as @threads), that splits the loop on all subtypes of the collection being iterated? I think that would be very useful. I know that one should avoid dispatching on these mixed collections, but that is quite a common problem and the simplicity of the code it allows is sometimes important. Also, it is not a marginal gain. In the example here, with 5 subtypes, the speedup can go from 2 times to 20 times depending on the type-specificity of the fields of the concrete types involved.
It could have a parameter for defining the maximum number of subtypes, such that it does not get used inappropriately for abstract types with too many subtypes.
I was imagining something like
@splitloop AbstractType for c in cs
...
end
or
@splitloop Union{C1,C2,C3,C4} for c in cs
...
end
that splits the loop in
for c in cs
if c isa C1
....
elseif c isa C2
....
etc.
end
What I still don’t understand is the need for boxing at all if I instruct the compiler that there will always be a Float64 at this location. Can the behavior of checking if the value is a Float64 not be done without allocations? Or couldn’t the box be reused in every loop, effectively needing only 1 allocation?
The problem is that the same compiled loop body must work for all iterations, and the compiler doesn’t know that c values are always ConcreteA or ConcreteB, so the loop body has to work with get_value(::ConcreteC) which returns a boxed value.
It helps to write get_value(c)::Float64 to get a more efficient loop body, but this is just annotating an expression, you could also write it (get_value(c))::Float64, it doesn’t change how get_value is compiled.
So for the loop body, the compiler knows that get_value(c) will be a Float64, but that doesn’t say anything about c. It must still assume that c could be a ConcreteC, meaning that get_value could return a boxed value.
To fix this, you can use an array with a more precise type (the subtype solution proposed above, or a union type). Then it’s clear the loop only calls get_value(::ConcreteA) and get_value(::ConcreteB) so no unboxing is necessary.
Another solution is to remove boxing from the compilation of get_value(::ConcreteC). This requires annotating the definition of get_value itself:
Why isn’t it possible to say “I know this expression can’t be proven to return only a Float64, but I personally guarantee that it will and I’ll accept an error if that condition is not met”. Then c could maybe be returned without box as long as it’s really a Float64. I guess I would even be fine with undefined behavior in case the assumption wasn’t met, I see this as a last-resort optimization that could help, like @inbounds allows segmentation faults when I pass an invalid index, there could be a @nobox macro.
That’s exactly what get_value(c)::Float64 means. It will throw an exception if the value returned by get_value(c) is not a (possibly boxed) Float64. This doesn’t change the fact that get_value(::ConcreteC) returns a boxed value. Functions are specialized based on the input types, not based on annotations of result expressions in the calling code.
True, but I never call get_value(::ConcreteC), in the Vector there is only A and B. That’s kind of the point, I know in this code there will only be Float64s returned, but they are boxed because the calling code can’t infer all possible return values due to the untyped field of ConcreteC. So that’s where a @nobox macro could potentially come in.
One scenario for this came up the other day where somebody tried to write a ray tracer and had a lot of allocations due to unpredictable types (lights, materials, objects). But what you can predict is what the output types of your functions are going to be in all cases. Maybe hits, or rgb colors, or something else. And the question is, can’t we elide the unnecessary boxing that the compiler uses to verify the output types of uninferrable calls if we give a manual guarantee
Here’s a slightly different example that gets closer to the problem. There are too many structs to infer get_value correctly, even though all structs have a concretely typed field x::Float64. It’s impossible to completely capitalize on that it seems.
abstract type Abstract end
# make a couple of concrete subtypes, here they all share the same field x::Float64, but
# there are too many of them so the compiler won't infer `get_value`
for x in 'A':'Z'
name = Symbol("Concrete$x")
@eval struct $name <: Abstract
x::Float64
end
end
get_value(a::Abstract) = a.x
cs = repeat([ConcreteA(1), ConcreteB(1), ConcreteC(1), ConcreteD(1), ConcreteE(1)], 10000)
function test(cs)
x = 0.0
for c in cs
val = get_value(c)
x += val
end
x
end
function test_2(cs)
x = 0.0
for c in cs
val = get_value(c)::Float64
x += val
end
x
end
This gives the following results, similar to the above example:
OK that’s a bit different. Note that even with two types ('A':'B') this code doesn’t infer. It does infer (with two types) when defining concrete methods get_value(a::$name) = a.x.
I don’t see a solution that avoids union splitting. Something like @nonboxed Float64 ... sounds useful though a bit weird, and it could be problematic since whether a particular method returns a boxed value can change as the compiler evolves…