Storing collections of heterogeneous data

I’m writing some Julia code in which I need to loop over a collection of observations. The observations are not all of the same type; each kind of observation is defined by its own structure. I then need to perform a number of observation-specific calculations (these calculations all return the same type). However, the loop over the observations is not type stable because we are indexing into a heterogeneous collection. Below is some example code illustrating the basic idea.

I’m wondering if others have encountered similar issues and found a better way to organise collections of heterogeneous data that can be looped over in a type-stable way.


# Some observation structures
abstract type Observation end
struct O1 <: Observation
    # O1 Observation fields
end
struct O2 <: Observation
    # O2 Observation fields
end
struct O3 <: Observation
    # O3 Observation fields
end

# Main loop over observations
function observation_loop(d::Vector{<:Observation})
    for oᵢ in d
        # Perform some observation-specific calculation
        r = observation_calculation(oᵢ)
        println(r)
    end

    return nothing
end

# Observation-specific calculations
function observation_calculation(O::O1)
    return "O1 Result"
end
function observation_calculation(O::O2)
    return "O2 Result"
end
function observation_calculation(O::O3)
    return "O3 Result"
end

# Some collection of observations
d = [O2(), O2(), O3(), O3(), O1(), O1(), O1()]

@code_warntype observation_loop(d)

The output from @code_warntype for this example is:


MethodInstance for observation_loop(::Vector{Observation})
  from observation_loop(d::Vector{<:Observation}) in Main at Untitled-1:10
Arguments
  #self#::Core.Const(observation_loop)
  d::Vector{Observation}
Locals
  @_3::Union{Nothing, Tuple{Observation, Int64}}
  oᵢ::Observation
  r::String
Body::Nothing
1 ─ %1  = d::Vector{Observation}
│         (@_3 = Base.iterate(%1))
│   %3  = (@_3 === nothing)::Bool
│   %4  = Base.not_int(%3)::Bool
└──       goto #4 if not %4
2 ┄ %6  = @_3::Tuple{Observation, Int64}
│         (oᵢ = Core.getfield(%6, 1))
│   %8  = Core.getfield(%6, 2)::Int64
│         (r = Main.observation_calculation(oᵢ))
│         Main.println(r)
│         (@_3 = Base.iterate(%1, %8))
│   %12 = (@_3 === nothing)::Bool
│   %13 = Base.not_int(%12)::Bool
└──       goto #4 if not %13
3 ─       goto #2
4 ┄       return Main.nothing

If performance is an issue, you could try sorting the observations by type beforehand, although I am not sure that alone would help.

I once encountered a similar problem, but constant propagation optimized the code so it was as fast as it could get.

Would you be able to give more details on the precise contents of an Observation? Maybe your various structs can be coaxed into a common layout.
Otherwise, function barriers can also be helpful, but I’m not sure whether observation_calculation already works as one in this case (we’re going to need typing experts).
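
For concreteness, a minimal sketch of the function-barrier pattern applied to the original example (process_one and observation_loop_barrier are made-up names; the structs and observation_calculation methods are the ones posted above). The dynamic dispatch still happens once per element at the call to process_one, but everything inside the helper is compiled for the concrete observation type:

# Function barrier: the loop stays dynamically typed, the helper does not
function process_one(oᵢ::Observation)
    r = observation_calculation(oᵢ)  # concrete dispatch inside the barrier
    println(r)
    return nothing
end

function observation_loop_barrier(d::Vector{<:Observation})
    for oᵢ in d
        process_one(oᵢ)  # runtime dispatch here; the helper body is type stable
    end
    return nothing
end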

SumTypes.jl could help: you’d pack your observation structs into one sum type, which you should be able to store contiguously.
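
Not the SumTypes.jl API itself (see its README for the actual macros), but here is a hand-rolled analogue of the same idea for this example: store the observations in a vector whose element type is a small Union of the concrete structs, which the compiler can union-split in the loop. AnyObservation and observation_loop_union are made-up names:

# A small Union of concrete types lets the compiler union-split the dispatch
const AnyObservation = Union{O1, O2, O3}

d_union = AnyObservation[O2(), O2(), O3(), O3(), O1(), O1(), O1()]

function observation_loop_union(d::Vector{AnyObservation})
    for oᵢ in d
        r = observation_calculation(oᵢ)  # compiled as a branch per member type
        println(r)
    end
    return nothing
end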

Thanks for the link on function barriers.

In short, I’m trying to write a framework for performing linearised inversions that I can easily extend by adding new types of observations to the inversion. So we are just solving Ax = b for some vector of parameters x, given some vector of observation residuals b and their sensitivities A to the parameters.

The Observation structures contain the information relevant for evaluating the residuals. However, that information can vary a lot between different types of observations, so finding a common structure is challenging, though maybe worth thinking more about. Additionally, I want to dispatch on the observation type. This makes including new observations straightforward: I just need to add a new method for computing b and a row of A for the new observation type.
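
To illustrate the extension pattern I have in mind (purely hypothetical names: O4, residual and sensitivity_row are not part of the example code above), adding a new observation type would just mean defining a struct and two methods:

# Hypothetical new observation type
struct O4 <: Observation
    observed::Float64
    predicted::Float64
end

# Its contribution to the residual vector b
residual(o::O4) = o.observed - o.predicted

# Its row of the sensitivity matrix A (placeholder values, nparams assumed known)
sensitivity_row(o::O4, nparams::Int) = zeros(nparams)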

This is an interesting idea that I wasn’t familiar with. Thanks for the suggestion.

If I’m understanding correctly, Virtual.jl should work. SumTypes.jl or other similar packages should also perform similarly.

Alternatively, if you sort your vector and run separate for loops for the different observation types, that might help with vectorization.
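
A rough sketch of that grouping idea (observation_loop_grouped and process_group are made-up names): split the heterogeneous vector into per-type vectors with concrete element types and run one tight loop per group. Note this processes observations grouped by type rather than in their original order:

# One type-stable inner loop per concrete observation type
process_group(group::Vector{T}) where {T<:Observation} =
    foreach(o -> println(observation_calculation(o)), group)

function observation_loop_grouped(d::Vector{<:Observation})
    for T in (O1, O2, O3)
        group = T[o for o in d if o isa T]  # concretely typed Vector{T}
        process_group(group)                # specialized for each T
    end
    return nothing
end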

Thanks! The Virtual.jl package is exactly what I was looking for. The example on the linked GitHub page gets to the heart of my problem: looping over containers with abstract element types.