Julia style? Arrays/matrix vs structs

Suppose I want to write a simulation of a bunch of particles interacting. I could for example represent them in an array of say 3x1000 floats for position and another for velocity. Maybe a third array would have some identity type information, color or mass or chemical information… whatever.

Or, I could represent the whole thing as an array of structs, with each struct having say position, velocity, and type information in it.

Which would you consider to be a more standard representation in Julia? Or is there an alternative? What is a typical idiom?

1 Like

Recently I have been thinking about the same question and came to the conclusion that I want to work with point-cloud based representations. So data would be encoded in indexed point data, which would be an array of structs (specifically Grassmann.jl numbers), and then I will assign to each point a tensor field on which to hold the localized velocity/pressure/etc data. However, I am still refining my representation design.

See Struct of Arrays (SoA) vs Array of Structs (AoS) - #7 by kristoffer.carlsson

4 Likes

Think about if you want to dispatch on single instances of your particle, if you do, then a struct is a reasonable thing to have for dispatch purposes. For performance, you may want to think about how you will operate on the data. If you operate over all x-coordinates at the same time, a struct-of-arrays layout might be more performant than array-of-structs.

You may also want to consider FieldVector from StaticArrays.jl. This will give you a struct with fields that also acts as a vector. This type will not handle things like strings of metadata though.

4 Likes

Thanks for that pointer to previous topic, I had sort of assumed the array of numbers representation would be better than array of structs. it seems to me the best way would be a struct with position as a 3xN matrix and velocity as a 3xN matrix. then I can operate on points as a slice pos[:,15] for example. remembering column major order.

I’ll start with that and see how things go.

I would use a Vector of length 3 static arrays instead of a Matrix

julia> using StaticArrays

julia> a = [rand(SVector{3}), rand(SVector{3})]
2-element Array{SArray{Tuple{3},Float64,1,3},1}:
 [0.48097026314303837, 0.252024902663313, 0.4889780702896247]
 [0.09840549539709653, 0.9609762230547614, 0.018091135766256095]

julia> a[1]
3-element SArray{Tuple{3},Float64,1,3} with indices SOneTo(3):
 0.48097026314303837
 0.252024902663313
 0.4889780702896247

since that way, the length of the elements is statically known and you don’t need to worry about allocations when doing e.g. a[:, 15] (which allocates a brand new array).

8 Likes

Ok, so I guess I’d use MVector as I plan to mutate the values in place.

I am planning to iterate through the particles and group them together when they are close in space. I was thinking of selecting a random set of indexes as the “centers”, and using an array of Dequeues to push the indexes of particles into the dequeues based on which first-element is the nearest neighbor. I’d then iterate through the dequeues doing operations on each “clump” of particles.

Any thoughts on how that might alter the choices?

Maybe, or you just create new SVectors in place. Don’t underestimate how good the compiler is when dealing with small immutable data.

In GitHub - KristofferC/NearestNeighbors.jl: High performance nearest neighbor data structures and algorithms for Julia. (which does kinda the same thing you describe), the first thing I do when I get the data as a matrix is to transform it into a Vector of Static Arrays and that has worked well so far.

2 Likes

Depending on the details of your computation, it may be more efficient to just create a new SVector with the updated values instead of mutating (to avoid the overhead associated with heap-allocated objects). (ref.)

using StaticArrays

struct Point3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

mutable struct MPoint3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

function f1!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] = Point3D(2x, x*y, x*z)
    end
end

function f2!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] .= (2x, x*y, x*z)
    end
end
using BenchmarkTools

v1 = [Point3D(rand(3))  for i = 1:1000]
v2 = [MPoint3D(rand(3)) for i = 1:1000]

@btime f1!($v1) # 854.769 ns (0 allocations: 0 bytes)
@btime f2!($v2) # 60.164 μs (3000 allocations: 46.88 KiB)

Note that you timing is not really representative since the broadcasting doesn’t seem to be very efficient, cf:

julia> function f3!(v)
           for i in eachindex(v)
               x, y, z = v[i]
               vv = v[i]
               vv.x = 2x
               vv.y = x*y
               vv.z = x*z
           end
       end
f3! (generic function with 1 method)

julia> @btime f3!($v2)
  1.023 μs (0 allocations: 0 bytes)
1 Like

Good catch! It’s surprising that broadcasting imposes such an enormous overhead on a fixed-size array.

Perhaps worth an issue to StaticArrays.jl

Done: https://github.com/JuliaArrays/StaticArrays.jl/issues/699
Seems to only be an issue for mutable FieldVectors, not MVectors.

1 Like

Let’s make things a little bit more interesting by throwing in regular arrays
and increasing the size of the data structure.

using StaticArrays

struct Point3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

mutable struct MPoint3D{T} <: FieldVector{3, T}
    x::T
    y::T
    z::T
end

function f1!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] = Point3D(2x, x*y, x*z)
    end
end

function f2!(v)
    for i in eachindex(v)
        x, y, z = v[i]
        v[i] .= (2x, x*y, x*z)
    end
end

function f5!(v)
    for i in 1:size(v, 1)
        x, y, z = v[i, 1], v[i, 2], v[i, 3]
        v[i, 1], v[i, 2], v[i, 3] = 2x, x*y, x*z
    end
end

function f6!(v)
    for i in 1:size(v, 2)
        x, y, z = v[1, i], v[2, i], v[3, i]
        v[1, i], v[2, i], v[3, i] = 2x, x*y, x*z
    end
end

using BenchmarkTools
v1 = [@SVector(rand(3)) for i = 1:1000000]
v2 = [@MVector(rand(3)) for i = 1:1000000]
v3 = [Point3D(rand(3))  for i = 1:1000000]
v4 = [MPoint3D(rand(3)) for i = 1:1000000]
v5 = rand(1000000, 3)
v6 = rand(3, 1000000)

@btime f1!($v1)   # 1.693 ms (0 allocations: 0 bytes)  
@btime f2!($v2)  # 2.821 ms (0 allocations: 0 bytes)  

@btime f1!($v3)  # 1.728 ms (0 allocations: 0 bytes)     
@btime f2!($v4)  # 57.962 ms (3000000 allocations: 45.78 MiB)   

@btime f5!($v5)  # 1.518 ms (0 allocations: 0 bytes)  
@btime f6!($v6)    # 1.739 ms (0 allocations: 0 bytes)

HybridArrays.jl provides a more convenient and powerful interface for this exact pattern and should be generally as fast as arrays of SArrays where both can be used.

2 Likes

I’d recommend a struct of arrays for that use case. Being lazy and just using an Array (pretend column 1 is x, column 2 is y, and column 3 is z):

function f4!(A)
    @inbounds @simd for i in 1:size(A,1)
        x = A[i,1]; y = A[i,2]; z = A[i,3]
        A[i,1] = 2x; A[i,2] = x*y; A[i,3] = x*z
    end
end

Benchmarking:

julia> A = rand(1000,3);

julia> @btime f1!($v1)
   931.406 ns (0 allocations: 0 bytes)

julia> @btime f2!($v2)
   58.993 μs (3000 allocations: 46.88 KiB)

julia> @btime f3!($v2)
   1.489 μs (0 allocations: 0 bytes)

julia> @btime f4!($A)
   315.979 ns (0 allocations: 0 bytes)  

julia> versioninfo()
Julia Version 1.3.1-pre.0
Commit b42f4ab* (2019-11-26 17:58 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-6.0.1 (ORCJIT, haswell)
Environment:
  JULIA_NUM_THREADS = 24

Disappointing that StructArrays doesn’t seem to deliver an equal speedup:

v5 = StructArray{Point3D}((rand(1000), rand(1000), rand(1000)))
@btime f1!($v5) # 1.833 μs (0 allocations: 0 bytes)

**edit: with @inbounds @simd, it does:

function f5!(v)
    @inbounds @simd for i in eachindex(v)
        x, y, z = v[i]
        v[i] = Point3D(2x, x*y, x*z)
    end
end

v5 = StructArray{Point3D}((rand(1000), rand(1000), rand(1000)))
@btime f1!($v5) # 1.833 μs (0 allocations: 0 bytes)
@btime f5!($v5) # 242.279 ns (0 allocations: 0 bytes)
1 Like

The fact that f5!($v5) is faster than f6!($v6) is strange to me… shouldn’t accessing a column be more cache coherent than accessing a row? or is this just within the timing noise?

But it confirms for me that just using regular arrays and mutating in place is fast, basically as fast as any of the more specialized stuff.

Yes, that is intriguing. I don’t know what the answer is.
I have observed this in other cases as well: I hope you like brainteasers

Yeah, but when you want to start doing operations on extracted velocities is when StaticArrays shine. It’s important to measure something that is actually representative for your workload.

5 Likes