How to simplify type parameters and still be type stable?

fverdugo · June 18, 2019, 6:46am

In my code I need to use type parameters to keep my structs type stable. E.g.,

struct Foo{A,B}
  a::A  
  b::B
end

So far, so good.

Now, imagine that the types A and B are also structs that in turn have type parameters. These type parameters can be structs that have type parameters, and so on. E.g.,

a = Foo(3,4.0)
b = 2.0
c = Foo(a,b)
foo = Foo(a,c)
typeof(foo) == Foo{Foo{Int64,Float64},Foo{Foo{Int64,Float64},Float64}}

Even in this toy example, typeof(foo) becomes quite complicated (at least to my eyes). In my real code, I end up with extremely complex types, even though I have only about 4 or 5 levels of nested structs and the structs have only 2 or 3 fields each.

My question is: will I run into a (compilation, runtime) bottleneck by recording so much info in the type name?

The problem is that I see only one alternative: do not use type parameters, which will lead to type-unstable code…

Any help will be highly appreciated!

cortner · June 18, 2019, 6:57am

Thank you for asking that question. I’ve often wondered the same.

quinnj · June 18, 2019, 7:13am

That’s quite the parameterized type!

My guess is that you will in fact run into compilation issues. There’s definitely such a thing as “too much type information” in Julia. Particularly when considering code meant to run in a production setting, you actively want to avoid as much dynamic compilation at runtime as possible (since compiling “stops the world”, leaving an application unresponsive).

Leaving a field type “untyped” is perfectly acceptable; oftentimes, there are ways to define “getters” that can extract a specific member of a deeply nested type which can then be passed to an operational function to work on that specific member. Consider:

struct Foo
    a
    b
end

get_foo_a_c(f::Foo) = f.a.c

function do_cool_things_on_c(c::CType{A, B}) where {A, B}
    # this function will be compiled for each unique CType
end

do_cool_things_on_c(get_foo_a_c(foo))

In this contrived example, we leave the initial Foo fields untyped, then define a “getter” function get_foo_a_c which pulls this c member out of foo. While this extraction function is indeed type unstable, we then pass c to the do_cool_things_on_c function, which will be compiled fresh for this specific CType. So we pay a small cost in type instability during extraction and still get the benefits of type-stable compiled code in do_cool_things_on_c, which is probably doing far more work than the extraction of c from Foo costs.
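A minimal runnable sketch of the same pattern, in case it helps. CType, Inner, LooseFoo (renamed so it does not clash with the parametric Foo earlier in the thread), their fields, and the loop body are all hypothetical stand-ins, since the example above leaves them undefined; the kernel also assumes A and B are numeric types.

```julia
# Hypothetical stand-ins: the example above does not define these types.
struct CType{A,B}
    x::A
    y::B
end

struct Inner{C}
    c::C
end

# Untyped fields: both are stored as `Any`.
struct LooseFoo
    a
    b
end

# Type-unstable getter: the compiler cannot know what `f.a.c` will be.
get_foo_a_c(f::LooseFoo) = f.a.c

# Function barrier: Julia compiles a specialized method instance
# for each concrete CType{A,B} that reaches this call.
function do_cool_things_on_c(c::CType{A,B}) where {A,B}
    total = zero(promote_type(A, B))  # assumes A and B are numeric
    for _ in 1:1000
        total += c.x * c.y
    end
    return total
end

foo = LooseFoo(Inner(CType(2, 0.5)), nothing)

# One dynamic dispatch here, then the loop runs in type-stable code.
result = do_cool_things_on_c(get_foo_a_c(foo))
println(result)
```

Running @code_warntype on get_foo_a_c(foo) and on do_cool_things_on_c(get_foo_a_c(foo)) should make the difference visible: the getter is inferred as Any, while the kernel body is fully inferred.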

This idea involves using a “function barrier” as outlined in the manual, which allows fresh compilation of a function for the runtime types of the values passed as arguments. The same idea is used in CSV.jl, which involves an inherently type-unstable process: various parsing and file options are passed as arguments to CSV.read, objects with type parameters are generated at runtime (type unstable), and then all these parsing options are passed to an explicit parsing function that does the vast majority of the CSV file parsing work with type stability. So we pay a small type instability cost in order to have type-stable code later.

At the end of the day, you should really just benchmark things and take note of where things tend to get bogged down. It might be the case that compilation isn’t much of an issue, in which case don’t worry about it. But if you do find a certain case where compilation takes a long time, that’s when you can start to consider different ways to structure your data to find a better balance: a little type instability in a few places while keeping the main kernel functions type stable.
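As a rough sketch of what such a check could look like, the snippet below times the first call of a function on a nested type (which includes compilation) against a second call (which does not). The sum_leaves function is a hypothetical kernel invented for illustration; the struct and values are the ones from the original question.

```julia
struct Foo{A,B}
    a::A
    b::B
end

# Hypothetical kernel: recursively adds up every numeric leaf.
sum_leaves(x::Number) = float(x)
sum_leaves(f::Foo) = sum_leaves(f.a) + sum_leaves(f.b)

a = Foo(3, 4.0)
c = Foo(a, 2.0)
foo = Foo(a, c)

# The first call compiles sum_leaves for typeof(foo) and its nested types.
first_call = @elapsed sum_leaves(foo)

# The second call reuses the compiled code, so it measures run time only.
second_call = @elapsed sum_leaves(foo)

println("first call:  ", first_call, " s (includes compilation)")
println("second call: ", second_call, " s")
```

If the gap between the two numbers grows uncomfortably as the nesting deepens, that is the signal to start loosening some type parameters. For careful steady-state timings, the BenchmarkTools.jl package is the usual choice over a single @elapsed.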

Hope that helps.

Tamas_Papp · June 18, 2019, 7:14am

I don’t quite see where you are hardcoding anything in your example, the constructor takes care of figuring the type out.

Using (parametric) concrete types adapted to your code, you trade off compilation time for runtime. This is frequently the sensible choice. If your code uses a lot of versions of a parametric type, you may gain from being less specific. But always benchmark.
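One possible reading of “being less specific”, sketched with hypothetical names (Solver, OptsA, OptsB are not from the thread): keep a type parameter for the field used in hot code, and leave a rarely touched field as Any so that many variants share a single concrete type.

```julia
# Hypothetical example: only the performance-critical field is parametric.
struct Solver{T<:AbstractFloat}
    tol::T        # concrete, so hot loops that read it stay type stable
    options::Any  # deliberately loose: does not show up in the type
end

struct OptsA
    verbose::Bool
end

struct OptsB{S}
    label::S
end

s1 = Solver(1e-8, OptsA(true))
s2 = Solver(1e-8, OptsB("run"))

# Both values have the same concrete type, so code taking a Solver
# is compiled once rather than once per options type.
println(typeof(s1))
println(typeof(s1) === typeof(s2))
```

The cost is that any access to options is type unstable, so this only pays off when that field stays out of inner loops or is passed through a function barrier.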

fverdugo · June 18, 2019, 9:50am

Thank you very much for your detailed answer!

In some places in the code, I am already using type-unstable structs + function barriers as you mentioned. I need to check more carefully whether some of the type parameters can be removed without affecting performance very much.

fverdugo · June 18, 2019, 9:52am

Thanks for the answer. The link to being less specific is useful.