Combinatorial Type Explosion: Nested Parametric Types

Let us say I have a bunch of functions that I want to specialize based on the stream type: different density methods for liquid streams and gas streams, etc. I create an abstract type “AbstractSubstance” and define the stream as follows:

mutable struct ProcessStream{T, S<:AbstractSubstance} <: AbstractConnector{T}
    temperature :: T
    pressure  :: T
    components :: Vector{T}
end
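For concreteness, here is a minimal sketch of how the S parameter would drive dispatch. The substance types and the density formula are just toys, not part of the original model:

```julia
# Toy sketch: the substance parameter S carries no data; it only tags
# the stream so that methods can specialize on it.
abstract type AbstractSubstance end
struct IdealGas <: AbstractSubstance end

abstract type AbstractConnector{T} end

mutable struct ProcessStream{T, S<:AbstractSubstance} <: AbstractConnector{T}
    temperature::T
    pressure::T
    components::Vector{T}
end

# Method specialized on the substance tag; toy molar ideal-gas density.
get_density(s::ProcessStream{T, IdealGas}) where {T} =
    s.pressure / (8.314 * s.temperature)

s = ProcessStream{Float64, IdealGas}(300.0, 101_325.0, [1.0])
get_density(s)   # ≈ 40.6 (mol/m^3 for the toy formula)
```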

Here, T can be many different kinds of numbers (including Dual numbers, because I’m trying to optimize this with ForwardDiff). The thing is, if I want to connect four of these streams to a heat exchanger, I now have many type parameters:

mutable struct HeatExchanger{T,S1,S2,S3,S4} <: AbstractHeatExchanger{T}
    LMTD::T
    heating_inlet::ProcessStream{T,S1}
    heating_outlet::ProcessStream{T,S2}
    cooling_inlet::ProcessStream{T,S3}
    cooling_outlet::ProcessStream{T,S4}
    specs::HeatExchangerSpecs
end

This is a bit unwieldy when I’m trying to build models, and it’s even worse when the heat exchanger is part of a distillation unit. In fact, I see this as an example of a combinatorial explosion of types: every parametric type needs all the parameters of all the parametric types below it, recursively. A clever workaround would be to make the substance type a field:

mutable struct ProcessStream{T} <: AbstractConnector{T}
    substance :: DataType
    temperature :: T
    pressure  :: T
    components :: Vector{T}
end

I could then dispatch functions based on substance like

get_density(s::ProcessStream) = get_density(s.substance, s)

function get_density(::Type{T}, s::ProcessStream) where {T <: IdealGas}
    # ... substance-specific implementation
end

which results in a much simpler HeatExchanger object

mutable struct HeatExchanger{T} <: AbstractHeatExchanger{T}
    LMTD::T
    heating_inlet::ProcessStream{T}
    heating_outlet::ProcessStream{T}
    cooling_inlet::ProcessStream{T}
    cooling_outlet::ProcessStream{T}
    specs::HeatExchangerSpecs
end

Thus the number of parameters stays constant at 1 instead of exploding combinatorially. I have a feeling that the first method is the correct one, but the second method is so much more convenient. How much of a performance hit do you think I’d take if I needed this inside an optimization?

This seems like a totally reasonable approach, but you’re right that it can have some performance impact. The only way to be sure is to measure, but a good rule of thumb is that the relative impact depends on how much work you are doing inside get_density. Basically, calling get_density(s) will take an amount of time equal to the sum of:

  1. The actual time required in get_density(T, s)
  2. Some constant amount of time required to look up the right get_density method at run-time.

I’d guess that (2) should be on the order of 100 ns, so if (1) takes 1 ms, then you’re not even going to notice (2), and your method will work great. If (1) takes 10 ns, then the additional overhead from (2) is going to destroy your efficiency.

If you can figure out which case applies in your example (maybe by trying both approaches in a simple model), then it should be easier to figure out if this design is a good choice.
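To make the comparison concrete, here is a sketch of the two designs side by side, with made-up substance types and a toy density formula. Timing each call (for example with BenchmarkTools.jl’s @btime) would expose the run-time lookup cost:

```julia
abstract type AbstractSubstance end
struct IdealGas <: AbstractSubstance end

# Design 1: substance as a type parameter -- dispatch resolved at compile time.
struct StreamP{S<:AbstractSubstance}
    pressure::Float64
end
density(s::StreamP{IdealGas}) = 1.2 * s.pressure

# Design 2: substance as a DataType field -- method looked up at run time.
struct StreamF
    substance::DataType
    pressure::Float64
end
density(s::StreamF) = density(s.substance, s)       # dynamic dispatch here
density(::Type{IdealGas}, s::StreamF) = 1.2 * s.pressure

a = StreamP{IdealGas}(2.0)
b = StreamF(IdealGas, 2.0)
density(a) == density(b)   # true: same answer either way

# To measure the overhead (assuming BenchmarkTools.jl is installed):
#   @btime density($a)   # static dispatch
#   @btime density($b)   # pays the extra run-time method lookup
```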

By the way, the approach you’ve outlined here is similar to what’s used in RigidBodyDynamics.jl's Joint type, which contains a joint_type field holding its concrete type. That lets us do things like:

function joint_transform(joint::Joint, q::AbstractVector)
    return joint_transform(joint.joint_type, q)
end

and dispatch to a specialized joint_transform, like your get_density. RigidBodyDynamics.jl avoids the type-instability issue by using TypeSortedCollections.jl (github.com/tkoolen/TypeSortedCollections.jl), which provides type-stable operations on type-heterogeneous collections, to hold the collection of differently-typed joints.

Even this is not clear-cut. If you have a large problem that takes a long time to compute, in which calls to get_density make up only a tiny fraction of the work, then you can get away with the less efficient approach just fine. Only if the call to this function makes up a significant fraction of your total computation is it worth worrying about. This is something you can figure out on a prototype using the profiler.

https://docs.julialang.org/en/v1/manual/profile/

I see; it’s as I feared. One of the reasons I’m using this as an example is that, a lot of the time, we can just use simplified methods to get densities instead of going into phase diagrams and equations of state. Thus finding the right “get_density” method would be cheaper than a rigorous calculation, but it would definitely be more expensive than the shortcut methods. For basic usage, I probably wouldn’t be using methods like “get_density” very much, but heavier use of functions like this could very well be on the horizon. I might have to just bite the bullet.

Nevertheless, I realized that if I just wrote the structs as

mutable struct HeatExchanger{T,S1,S2,S3,S4} <: AbstractHeatExchanger{T}
    LMTD::T
    heating_inlet::S1
    heating_outlet::S2
    cooling_inlet::S3
    cooling_outlet::S4
    specs::HeatExchangerSpecs
end

At least I would only need a set of parameters defined one level down, although this still means there would be a lot of recompiling for a single small change.

The other problem is type conversion. The “T” parameter is the principal data type (usually Int32, Float64, or a dual number). Would there be some easy way to, say, take a HeatExchanger{Float64, …} and obtain a similar HeatExchanger{Int32, …} where only the first parameter has changed?

const HeatExchangerFields = fieldnames(HeatExchanger)

function retype(x::HeatExchanger, ::Type{T}; converter = T) where {T}
    params = (T, typeof(x).parameters[2:end]...)
    vals = Any[getfield(x, s) for s in HeatExchangerFields]
    vals[1] = converter(vals[1])  # convert the T-typed field (LMTD)
    return HeatExchanger{params...}(vals...)
end

To go from an Int to a Float64: retype(x, Float64).
To go from a Float64 to an Int:
retype(x, Int; converter = x -> trunc(Int, x)) (or x -> round(Int, x)),
because Int(1.5) throws an InexactError. If you prefer that to happen,
as a way to catch problem data, just leave out the converter.
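Here is a self-contained toy version of the same retype idea; the Box struct is hypothetical, standing in for HeatExchanger with a single type parameter:

```julia
mutable struct Box{T}
    value::T
    label::String
end

# Rebuild the struct with a new first type parameter, converting the
# T-typed field with `converter` (which defaults to the type itself).
function retype(x::Box, ::Type{T}; converter = T) where {T}
    return Box{T}(converter(x.value), x.label)
end

retype(Box{Int}(1, "a"), Float64).value                                  # 1.0
retype(Box{Float64}(1.5, "b"), Int; converter = v -> trunc(Int, v)).value # 1
```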

So the thing is, once the all-encompassing model is built, the “substance” field in the streams shouldn’t change, but it’s in a mutable struct because all of the numerical fields change. I don’t think constant propagation is an option.

However, is there any way we could help that lookup along? For example, a generated get_density function with a big honking if statement?

function get_density(s::ProcessStream)
    substance = s.substance
    if substance <: IdealGas
        return get_density(IdealGas, s)
    elseif substance <: IdealLiquid
        return get_density(IdealLiquid, s)
    ...
    else
        return get_density(Substance, s)
    end
end

I’d have to walk through the “Substance” tree top-down and use it to build the “if” statement from the bottom up. I mean, unless someone does something stupid like define an assumed density for every single stream, I don’t think we’d get more than 20 different stream types.

You’re looking at a case of the “Big Switch” antipattern. An often-suggested alternative is to use a map (a.k.a. a Dict): you could make a dictionary mapping substance types to get_density methods. Because your ifs are testing <:, you’ll likely need to query the Dict with the substance’s supertype(s).
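A sketch of that dispatch table, including the supertype walk mentioned above (all substance types and density formulas here are toys):

```julia
abstract type Substance end
struct IdealGas    <: Substance end
struct IdealLiquid <: Substance end

# Map substance type => density method. A fallback is registered under
# the abstract supertype Substance.
const density_table = Dict{DataType, Function}(
    IdealGas  => (p -> 1.2p),
    Substance => (p -> 0.0),   # fallback
)

# Walk up the type tree until a registered key is found, mimicking the
# `<:` tests in the if/elseif version.
function lookup(table, T::DataType)
    while !haskey(table, T) && T !== Any
        T = supertype(T)
    end
    return table[T]
end

lookup(density_table, IdealGas)(2.0)      # 2.4
lookup(density_table, IdealLiquid)(5.0)   # 0.0, via the Substance fallback
```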

That sounds like building your own method table. If you’re going to do that, why not just use dynamic dispatch?

In my opinion, the original approach is the best approach:

You might as well leverage parametric types as much as you can, since they are one of the strengths of the language.

Yes, I was focusing on the last message instead of looking at the bigger picture. My bad.

I could see how this would be cleaner, but would it be faster than the if statements, or even than multiple dispatch? I don’t think I’d actually have to use supertypes here, so the above “if statement” would just be a chain of equality checks. However, this would be a dictionary of abstract types:

densityMethodDict = Dict{DataType,Function}()
densityMethodDict[IdealGas] = s->get_density(IdealGas, s)

Julia dictionaries tend to be slow, and I’d imagine that the performance of using a dictionary with abstract types would be worse.

From this point though, I think if I used metaprogramming, I could write a generated function along the lines of

function get_density(s::ProcessStream)
    substance = s.substance
    if substance == IdealGas
        return get_density(IdealGas, s)
    elseif substance == IdealLiquid
        return get_density(IdealLiquid, s)
    ...
    else
        return get_density(Substance, s)
    end
end

I’m pretty sure this can be done. I’ve generated block expressions before but it may take me a while to code the “if statement” generator.
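For reference, a sketch of such a generator, assuming the list of substance types is known when the function is defined (all names here are toys): it folds the type list into nested Expr(:if, ...) nodes from the bottom up and splices the result into a function body with @eval.

```julia
abstract type Substance end
struct IdealGas    <: Substance end
struct IdealLiquid <: Substance end

get_density(::Type{IdealGas},    x) = 1.2 * x
get_density(::Type{IdealLiquid}, x) = 1000.0 * x
get_density(::Type{Substance},   x) = 0.0        # fallback

const SUBSTANCES = [IdealGas, IdealLiquid]

# Build `if substance == T1 ... elseif ... else fallback end`
# from the bottom (the `else` branch) up.
function build_dispatch_expr(types)
    ex = :(get_density(Substance, x))   # innermost else: the fallback
    for T in reverse(types)
        ex = Expr(:if, :(substance == $T), :(get_density($T, x)), ex)
    end
    return ex
end

@eval function fast_get_density(substance::DataType, x)
    $(build_dispatch_expr(SUBSTANCES))
end

fast_get_density(IdealGas, 2.0)   # 2.4
```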

Nevertheless, that comes into the realm of “premature optimization”. Right now, all I need to know is that such an optimization is possible; at the design level, I just want to know how much going this route (with types as fields) will paint me into a corner performance-wise. The reason this “if statement” might be faster is that every type argument going into each value-dispatched get_density call would be a known constant at compile time, allowing the compiler to inline the calls. Am I correct in saying this?

There is one example here: Macro to write function with many conditionals - #8 by Skoffer

That macro is a very instructive example, imo

Yes, that was the original motivation of that macro by the way.

NIIIIIICE! Seriously, if this works out, I will have stumbled onto a really neat design pattern (that someone else has probably already thought of). This kind of design pattern should probably go under “Performance Tips → The dangers of abusing multiple dispatch”. The reason is that additional parameterization of a low-level type (like ProcessStream{T,S}) balloons the complexity of the parametric types that contain it. Any little change in a low-level type would cause everything above it to recompile, making precompilation nigh impossible. This could cause latency issues if you’re trying to use this in a solution that analyzes real-time data.

By setting aside a “dispatch field”, I can get multiple-dispatch benefits, type-stability, AND be able to have some form of precompilation. I guess the tradeoff is that the “manual dispatch” functions are lengthy generated if-statements that are only slightly slower at runtime, but I can definitely live with that.

The function table is unlikely to be faster, but it has a couple of possible advantages: it can be modified dynamically, and you can look up functions by keys other than types.