Storing collections of heterogeneous data

I’m writing some Julia code in which I need to loop over a collection of observations. The observations are not all of the same type; each kind of observation is defined by its own structure. I then need to perform a number of observation-specific calculations (these calculations all return the same type). However, the loop over the observations is not type stable because we are indexing into a heterogeneous collection. Below is some example code illustrating the basic idea.

I’m wondering if others have encountered similar issues and found a better way to organise collections of heterogeneous data that can be looped over in a type-stable way.


# Some observation structures
abstract type Observation end
struct O1 <: Observation
    # O1 Observation fields
end
struct O2 <: Observation
    # O2 Observation fields
end
struct O3 <: Observation
    # O3 Observation fields
end

# Main loop over observations
function observation_loop(d::Vector{<:Observation})
    for oᵢ in d
        # Perform some observation-specific calculation
        r = observation_calculation(oᵢ)
        println(r)
    end

    return nothing
end

# Observation-specific calculations
function observation_calculation(O::O1)
    return "O1 Result"
end
function observation_calculation(O::O2)
    return "O2 Result"
end
function observation_calculation(O::O3)
    return "O3 Result"
end

# Some collection of observations
d = [O2(), O2(), O3(), O3(), O1(), O1(), O1()]

@code_warntype observation_loop(d)

The output from @code_warntype for this example is:


MethodInstance for observation_loop(::Vector{Observation})
  from observation_loop(d::Vector{<:Observation}) in Main at Untitled-1:10
Arguments
  #self#::Core.Const(observation_loop)
  d::Vector{Observation}
Locals
  @_3::Union{Nothing, Tuple{Observation, Int64}}
  oᵢ::Observation
  r::String
Body::Nothing
1 ─ %1  = d::Vector{Observation}
│         (@_3 = Base.iterate(%1))
│   %3  = (@_3 === nothing)::Bool
│   %4  = Base.not_int(%3)::Bool
└──       goto #4 if not %4
2 ┄ %6  = @_3::Tuple{Observation, Int64}
│         (oᵢ = Core.getfield(%6, 1))
│   %8  = Core.getfield(%6, 2)::Int64
│         (r = Main.observation_calculation(oᵢ))
│         Main.println(r)
│         (@_3 = Base.iterate(%1, %8))
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return Main.nothing

If performance is an issue, you could try sorting the observations by type beforehand, although I am not sure that alone would help.

I once encountered a similar problem, but constant propagation optimized the code so it was as fast as it could get.

Would you be able to give more details on the precise contents of an Observation? Maybe your various structs can be coaxed into a common layout.
Otherwise, function barriers can also be helpful, but I’m not sure whether observation_calculation already works as one in this case (we’re going to need typing experts).
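
For concreteness, a minimal sketch of the function-barrier pattern applied to the original example (process_one and observation_loop_barrier are made-up names; the structs and observation_calculation methods are the ones posted above). The dynamic dispatch still happens once per element at the call to process_one, but everything inside the helper is compiled for the concrete observation type:

# Function barrier: the loop stays dynamically typed, the helper does not
function process_one(oᵢ::Observation)
    r = observation_calculation(oᵢ)  # concrete dispatch inside the barrier
    println(r)
    return nothing
end

function observation_loop_barrier(d::Vector{<:Observation})
    for oᵢ in d
        process_one(oᵢ)  # runtime dispatch here; the helper body is type stable
    end
    return nothing
end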

SumTypes.jl could help: you’d pack your observation structs into one sum type, which you should be able to store contiguously.
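
Not the SumTypes.jl API itself (see its README for the actual macros), but here is a hand-rolled analogue of the same idea for this example: store the observations in a vector whose element type is a small Union of the concrete structs, which the compiler can union-split in the loop. AnyObservation and observation_loop_union are made-up names:

# A small Union of concrete types lets the compiler union-split the dispatch
const AnyObservation = Union{O1, O2, O3}

d_union = AnyObservation[O2(), O2(), O3(), O3(), O1(), O1(), O1()]

function observation_loop_union(d::Vector{AnyObservation})
    for oᵢ in d
        r = observation_calculation(oᵢ)  # compiled as a branch per member type
        println(r)
    end
    return nothing
end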

Thanks for the link on function barriers.

In short, I’m trying to write a framework for performing linearised inversions that I can easily extend by adding new types of observations to the inversion. So we are just solving Ax = b for some vector of parameters x, given some vector of observation residuals b and their sensitivities A to the parameters.

The Observation structures contain the information relevant for evaluating the residuals. However, that information can vary a lot between different types of observations, so finding a common structure is challenging, though maybe worth thinking more about. Additionally, I want to dispatch on the observation type. This makes including new observations straightforward: I just need to add a new method for computing b and a row of A for the new observation type.
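
To illustrate the extension pattern I have in mind (purely hypothetical names: O4, residual and sensitivity_row are not part of the example code above), adding a new observation type would just mean defining a struct and two methods:

# Hypothetical new observation type
struct O4 <: Observation
    observed::Float64
    predicted::Float64
end

# Its contribution to the residual vector b
residual(o::O4) = o.observed - o.predicted

# Its row of the sensitivity matrix A (placeholder values, nparams assumed known)
sensitivity_row(o::O4, nparams::Int) = zeros(nparams)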

This is an interesting idea that I wasn’t familiar with. Thanks for the suggestion.

If I’m understanding correctly, Virtual.jl should work. SumTypes.jl or other similar packages should also perform similarly.

Alternatively, if you sort your vector and run separate for loops for the different observation types, that might help with vectorization.
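
A rough sketch of that grouping idea (observation_loop_grouped and process_group are made-up names): split the heterogeneous vector into per-type vectors with concrete element types and run one tight loop per group. Note this processes observations grouped by type rather than in their original order:

# One type-stable inner loop per concrete observation type
process_group(group::Vector{T}) where {T<:Observation} =
    foreach(o -> println(observation_calculation(o)), group)

function observation_loop_grouped(d::Vector{<:Observation})
    for T in (O1, O2, O3)
        group = T[o for o in d if o isa T]  # concretely typed Vector{T}
        process_group(group)                # specialized for each T
    end
    return nothing
end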

Thanks! The Virtual.jl package is exactly what I was looking for. The example on the linked GitHub page gets to the heart of my problem: looping over containers with abstract element types.