Iterating structs of different types efficiently

I am writing code that will involve solving an ODE that is composed of heterogeneous components. This requires evaluating the right hand side by iterating over those heterogeneous components.

For this, I use a struct Device which is composed of a DynamicDevice type. This DynamicDevice type is an abstract type that could be TypeA, TypeB, etc. Then, I will use multiple dispatch to write a right hand side function evaluation for each component type.

If I create a generic Vector that holds a bunch of Devices, however, this leads to type instability and memory allocations. In my mind, the Devices should all look the same in memory; it is only the subfield Device.type whose type differs.

Is there a way/pattern to write this that avoids type instability while still storing a collection of heterogeneous structs?

The code below has an example to reproduce what I describe. I benchmark with BenchmarkTools and I have denoted each of these benchmarks with P1, P2, etc.

P1 does not lead to type instability since devicesA is a

5-element Vector{Device{TypeA}}

I guess the compiler sees that in this case all the elements are of the same type and thus optimizes the structure? In P3 however, devicesB is

5-element Vector{Device}:

and this leads to impaired performance/memory issues.

Also, why do P2 and P4 allocate memory?

using BenchmarkTools

N = 5

abstract type DynamicDevice end

struct TypeA <: DynamicDevice
    data::Float64
end

struct Device{D <: DynamicDevice}
    type::D
    ptr::Int
end

# pointers system
struct System
    device_type::Vector{Device}
end

# residual function
function f!(x, device_type::TypeA, ptr)
    x[ptr] = 1.0
end

function f!(x, devices)
    for i in 1:length(devices)
        device = devices[i]
        f!(x, device.type, device.ptr)
    end
end

function f!(x, system::System)
    f!(x, system.device_type)
end

x = zeros(Float64, N)

devicesA = [Device(TypeA(1.0), 1)  for _ in 1:N]
# P1
@btime f!(x, devicesA)
dev = TypeA(1.0)
# P2
@btime f!(x, dev, 1)

devicesB = Vector{Device}(undef, N)
for i in 1:N
    devicesB[i] = Device(TypeA(1.0), 1)
end
system = System(devicesB)
#P3
@btime f!(x, system)

dev = TypeA(1.0)
#P4
@btime f!(x, dev, 1)

The issue in your original MWE relates to how you declare global variables. You can find more details here.

Here is a version where P2 and P4 are not allocating:

using BenchmarkTools

N = 5

abstract type DynamicDevice end

struct TypeA <: DynamicDevice
    data::Float64
end

struct Device{D<:DynamicDevice}
    type::D
    ptr::Int
end

# pointers system
struct System
    device_type::Vector{Device}
end

# residual function
function f!(f, device_type::TypeA, ptr)
    f[ptr] = 1.0
end

function f!(x, devices)
    for i in eachindex(devices)
        device = devices[i]
        f!(x, device.type, device.ptr)
    end
end

function f!(x, system::System)
    f!(x, system.device_type)
end

const x = zeros(Float64, N)

devicesA = [Device(TypeA(1.0), 1) for _ in 1:N]
# P1
@btime f!(x, devicesA)
const dev = TypeA(1.0)
# P2
@btime f!(x, dev, 1)

const devicesB = Vector{Device}(undef, N)
for i in 1:N
    devicesB[i] = Device(TypeA(1.0), 1)
end
const system::System = System(devicesB)
#P3
@btime f!(x, system)

const dev2 = TypeA(1.0)
#P4
@btime f!(x, dev2, 1)

And for your main question: I’m not exactly sure what this package does, but it seems SumTypes.jl could solve your problem.


Thank you for your reply. Indeed, this was related to the global variables. Just by making x const

const x = zeros(Float64, N)

seems to address the issue of memory allocation in P2 and P4.

But I still do not understand why this happens. The section of the manual you linked refers to untyped global variables, but this one is typed?

@gdalle thank you for your reply!

Unfortunately, I am not sure SumTypes is what I am looking for. In my case, I might have hundreds of DynamicDevice types.

I know how to solve this problem in C (using function pointers, for example), or Python (OOP, methods). I wonder if there’s a “Julian” way to write this.
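Since you mention C function pointers: one commonly suggested Julian counterpart (not discussed in this thread so far; treat the sketch below as hedged, assuming FunctionWrappers.jl and its documented `FunctionWrapper{Ret,Tuple{Args...}}` constructor) is to store call wrappers with a fixed signature, so the vector’s eltype is concrete even though the wrapped behaviors differ:

```julia
# Hedged sketch, assuming FunctionWrappers.jl is installed. A FunctionWrapper
# pins the argument and return types, much like a C function-pointer signature.
using FunctionWrappers: FunctionWrapper

const RHS = FunctionWrapper{Nothing,Tuple{Vector{Float64},Int}}

# Two different right-hand-side behaviors (illustrative names).
rhs_a(x, ptr) = (x[ptr] = 1.0; nothing)
rhs_b(x, ptr) = (x[ptr] = 2.0; nothing)

# A concretely typed vector, even though it mixes different behaviors.
rhs_table = RHS[RHS(rhs_a), RHS(rhs_b)]

x = zeros(2)
for (ptr, rhs) in enumerate(rhs_table)
    rhs(x, ptr)    # one indirect call per element, no boxing
end
# x == [1.0, 2.0]
```

The price is one indirect call per element, much like a C function-pointer table.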

Your initial declaration looked like this: x = zeros(Float64, N). This is neither a constant nor a type-annotated global variable.

That means you could do this at a later time (still in global scope): x = "unstable, I am a String now".

Now, to achieve type stability in the current context (e.g., x = zeros(Float64, N)), you have two options:

  • use a constant, const x = zeros(Float64, N) - you will not gain much from also annotating it, because the compiler can infer the type and it is clear the type cannot change later (it is a constant)
  • however, there are use cases where a constant is not appropriate. Here we enter the territory of annotated global variables: you may change the variable’s contents but must keep the same type (note that annotating with an abstract type will not provide much performance gain).

Both of the above scenarios ensure type stability (in your case, you solved the type stability issue via the const route).

Is the above explanation making things more clear for you?


Thanks, I think I’m starting to see it. This link, in the section you sent me, also clarified things:

https://docs.julialang.org/en/v1/manual/variables-and-scoping/#man-typed-globals

So, in my case, I can do

x::Vector{Float64} = zeros(Float64, N)

to fix the memory issue.

Would this be equivalent to the case in C in which I can declare x to be a pointer to an array of Floats or x to be a void pointer?
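A small sketch of what the annotation buys you (the global y below is hypothetical, not from the thread): once annotated, the binding behaves roughly like a typed pointer declaration in C, whereas an unannotated global is closer to a void pointer that must be inspected at every use:

```julia
# Typed (annotated) globals require Julia 1.8 or newer.
y::Vector{Float64} = zeros(Float64, 3)

y = zeros(Float64, 5)            # OK: still a Vector{Float64}

err = try
    global y = "I am a String"   # rebinding to another type throws
    nothing
catch e
    e
end
# err is an exception: there is no conversion from String to Vector{Float64},
# so the compiler can rely on the declared type whenever y is read.
```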

If you want the Device to look the same in memory, then you could drop the parametric side of it. This will both reduce the allocations and improve performance in your particular scenario:

struct Device
    type::DynamicDevice
    ptr::Int
end

After dropping the parametric:

22.506 ns (0 allocations: 0 bytes)
2.461 ns (0 allocations: 0 bytes)
29.928 ns (1 allocation: 16 bytes) # P3 after
2.528 ns (0 allocations: 0 bytes)

Before:

16.146 ns (0 allocations: 0 bytes)
2.521 ns (0 allocations: 0 bytes)
245.622 ns (6 allocations: 96 bytes) # P3 before
2.522 ns (0 allocations: 0 bytes)

Since you are dispatching on concrete subtypes of DynamicDevice, having a parametric Device struct doesn’t seem necessary.

At this point we still have the issue of an abstract field, but it is still an improvement.


The parametric struct was definitely the primary cause of type instability. Making a new Device subtype for every DynamicDevice subtype means that methods have to be compiled separately for each Device subtype. So when you have a heterogeneous vector, you have to figure out at runtime which compiled method to use for each element’s type (think vtables in C++, except more complicated due to multiple dispatch).

Still, it’s not a perfect situation, because type::DynamicDevice is an abstractly annotated field, which requires a reference and a type check on every access; in other words, type instability.

Then it wouldn’t look the same to the compiler. You might mean that the bits themselves are laid out the same way, for example if TypeA, TypeB, and all other subtypes of DynamicDevice had the same fields. You can’t guarantee that in the language, and even if you could, the methods would still be compiled separately, so the runtime dispatch would still need to happen. If all the types have the exact same data::Float64 in them, maybe you don’t need more than one type to begin with.

Also, the problem with the global variable wasn’t that it was unannotated. That would be bad if it were being accessed inside methods, but here you’re passing it in as an argument, so dispatch on the instance’s type still works properly. The issue was that it was not interpolated into @btime:

julia> @btime f!(x, dev, 1)
  19.256 ns (1 allocation: 16 bytes)
1.0

julia> @btime f!($x, $dev, $1)
  3.399 ns (0 allocations: 0 bytes)
1.0

I’m oversimplifying, but think of @btime as putting the call into a benchmark function with a for loop. The first version has no arguments and acts much like accessing global variables. The second version has 3 arguments that get dispatched. Interpolation does go a bit deeper than just controlling the arguments to the benchmark function, and the docs put it better. But note in the examples how the only “arguments” that don’t get interpolated are literal integers and such.


Thank you very much, @algunion.

I am still concerned about the allocations and performance since this code would evaluate a right hand side inside a time loop (potentially iterating thousands of elements each time step, for a large quantity of time steps).

EDIT: Actually, I tried to do what you mention and it resulted in 0 allocations (!!!).

  2.432 μs (0 allocations: 0 bytes)
  6.583 ns (0 allocations: 0 bytes)
  2.410 μs (0 allocations: 0 bytes)
  6.583 ns (0 allocations: 0 bytes)

Why do you think yours is allocating?

Thank you very much, @Benny . This is very illuminating.

It seems that having a parametric struct was not a good idea in this case. I still struggle to understand when the compiler will run into type instability and when that will lead to memory/performance issues.

Type instability is when a variable or expression cannot be inferred to have a specific concrete type. Sometimes the compiler can tell when it’s only 3-4 concrete types, then optimize the runtime dispatch to a faster type-checking branch. But here you made a devicesB = Vector{Device}(undef, N), a vector whose element type is an abstract Device. It can hold an unbounded variety of concrete types, so you must check the concrete type of each element devicesB[i] at runtime.

Even if you know that all the concrete devices have the same structure, the abstract Device and DynamicDevice do not and cannot carry that information. As I said earlier, the way to do that would be one concrete type. If all the DynamicDevice subtypes are just structs wrapping a Float64 and are intended for the same methods, I suggest making a single concrete type, which makes all your type instability problems here go away. Of course, this might just be a MWE and your actual code may need separate concrete types with different methods, in which case you really do need runtime dispatch over heterogeneous types.
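To make the “one concrete type” suggestion concrete, here is a hedged sketch (SimpleDevice, kind, and the branch bodies are illustrative names and behaviors, not code from this thread): a tag field inside a single concrete struct keeps the vector’s eltype concrete, so the loop needs no runtime dispatch:

```julia
# Sketch: one concrete struct with a tag field instead of many subtypes.
struct SimpleDevice
    kind::Symbol      # tags behavior without introducing a new type
    data::Float64
    ptr::Int
end

function f!(x, devices::Vector{SimpleDevice})
    for d in devices              # concrete eltype: no runtime dispatch
        if d.kind == :a
            x[d.ptr] = d.data     # behavior for "TypeA"-like devices
        else
            x[d.ptr] = -d.data    # behavior for everything else
        end
    end
    return x
end

x = zeros(3)
devices = [SimpleDevice(:a, 1.0, 1), SimpleDevice(:b, 1.0, 2), SimpleDevice(:a, 2.0, 3)]
f!(x, devices)   # x == [1.0, -1.0, 2.0]
```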


I don’t know all that SumTypes does but the essence of its optimization (if I’m understanding correctly, @Mason ) is this:

@sum_type DynamicDevice begin
    Device1(a::Float64)
    Device2(b::Float64)
end

@cases device_instance begin
    Device1(x) => f(x)
    Device2 => println("got device2 type")
end

Is transformed to

struct DynamicType
    _type::UInt8
    a::Union{Float64,Nothing}
    b::Union{Float64,Nothing}
end

if device_instance._type == 1
    f(device.a)
elseif device_instance._type == 2
    println("got device2 type")
end

Basically, it is just a smart way of putting all the fields into a single struct, and then using a regular if check to dispatch, instead of the multiple dispatch machinery. This means iterating over a vector only has to compile a single function for all elements even if they are “different” types.

But you could do the same thing manually.

@enum DeviceType begin
    Device1
    Device2
end

struct DynamicDevice
    type::DeviceType
    a::Union{Float64, Nothing}
    b::Union{Int64, Nothing}
    # more fields if needed
end

function dispatch(d, args...)
    T = d.type
    if T == Device1
        #maybe device1 uses both fields
        f1(d.a, d.b, args...)
    elseif T == Device2
        f2(d.b, args...)
    end
end

PS, this is exactly why Julia should get first-class support for a compacting sum type. It stinks to have to do this unrolling manually, and we could do a smarter jump-table solution if the compiler understood what we were trying to do.


There are some discussions about that here in the forum. One possible solution is to use a macro to do the splitting:

(that thread is referenced in the other threads discussing the topic, so you can find the related discussions from there)


Maybe the MWE branched at some point, and we no longer use the same code.

The MWE on my end:
using BenchmarkTools

N = 5

abstract type DynamicDevice end

struct TypeA <: DynamicDevice
    data::Float64
end

struct Device
    type::DynamicDevice
    ptr::Int
end

# pointers system
struct System
    device_type::Vector{Device}
end

# residual function
function f!(f, device_type::TypeA, ptr)
    f[ptr] = 1.0
end

function f!(x, devices)
    for i in eachindex(devices)
        device = devices[i]
        f!(x, device.type, device.ptr)
    end
end

function f!(x, system::System)
    f!(x, system.device_type)
end

const x = zeros(Float64, N)

devicesA = [Device(TypeA(1.0), 1) for _ in 1:N]
# P1
@btime f!(x, devicesA)
const dev = TypeA(1.0)
# P2
@btime f!(x, dev, 1)

const devicesB = Vector{Device}(undef, N)
for i in 1:N
    devicesB[i] = Device(TypeA(1.0), 1)
end
const system::System = System(devicesB)
#P3
@btime f!(x, system)

const dev2 = TypeA(1.0)
#P4
@btime f!(x, dev2, 1)

We have a similar situation in PowerSimulationsDynamics.jl and have thought about reimplementing the code in this package GitHub - tkoolen/TypeSortedCollections.jl: Type-stable operations on type-heterogeneous collections
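The core idea behind TypeSortedCollections.jl can be sketched in a few lines (illustrative code, not the package’s actual implementation): keep one homogeneous Vector per concrete device type, so each inner loop is type-stable and dispatch happens once per vector rather than once per element:

```julia
# Sketch of the "type-sorted" pattern: group devices by concrete type.
abstract type DynamicDevice end
struct TypeA <: DynamicDevice
    data::Float64
    ptr::Int
end
struct TypeB <: DynamicDevice
    data::Float64
    ptr::Int
end

rhs!(x, d::TypeA) = (x[d.ptr] = 1.0)
rhs!(x, d::TypeB) = (x[d.ptr] = 2.0)

# One homogeneous Vector per concrete type, stored in a Tuple.
function f!(x, sorted::Tuple)
    for devices in sorted      # small tuple: the compiler can unroll this
        for d in devices       # concrete eltype: type-stable inner loop
            rhs!(x, d)
        end
    end
    return x
end

x = zeros(4)
sorted = ([TypeA(0.0, 1), TypeA(0.0, 2)], [TypeB(0.0, 3), TypeB(0.0, 4)])
f!(x, sorted)   # x == [1.0, 1.0, 2.0, 2.0]
```

With hundreds of device types, iterating the tuple itself becomes the dynamic part, but it runs once per type instead of once per device.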
