StructArrays Memory Layout

I’m trying to understand the memory layout of a StructArray I’m constructing in my code. Specifically, it looks to me like the memory layout is still an array of structs, rather than a struct of arrays. I’m very new to Julia, though, so perhaps I’m just misunderstanding the output. Any help clarifying this would be greatly appreciated.

A basic example is (apologies for the verbose type names, they’re relevant to my application):

using StructArrays

struct SpinEnvironment{T<:AbstractFloat}
    t1::T
    t2::T
end

struct Environments{T<:AbstractFloat}
    relaxationConstants::StructVector{SpinEnvironment{T}}
    function Environments{T}(e::Vector{SpinEnvironment{T}}) where
        T<:AbstractFloat
        new(StructVector{SpinEnvironment{T}}(
            reinterpret(reshape,T, e), dims=1))
    end
end

se = SpinEnvironment{Float64}(1,2)
env = Environments{Float64}([se,se,se])

The resulting StructArray seems right to me (although the view(reinterpret(...)) looks like a red flag regarding the underlying memory layout):

julia> env.relaxationConstants
3-element StructArray(view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 1, :), view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 2, :)) with eltype Main.SpinStates.SpinEnvironment{Float64}:
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)

However, when I use reinterpret to try to visualize the memory layout, I see

julia> reinterpret(Float64,env.relaxationConstants)
6-element reinterpret(Float64, StructArray(view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 1, :), view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 2, :))):
 1.0
 2.0
 1.0
 2.0
 1.0
 2.0

which isn’t what I expected. I figured I’d see all the 1.0 entries (i.e., all the t1 fields) first, then all the 2.0 entries (i.e., all the t2 fields), since that would be the layout for a struct of arrays. What is the underlying memory layout here?

Thanks!

Yeah, but you have an array of structs, right?

I don’t know how to check this at a lower level, but I’m pretty sure the memory layout is what you expected it to be (each field stored as a contiguous column vector in memory):

julia> using StructArrays, BenchmarkTools

julia> struct A
         x::Float64
         y::Float64
       end

julia> v = [ A(rand(),rand()) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> @btime sum(v.x for v in $v)
  10.029 μs (0 allocations: 0 bytes)
5043.211761518985

julia> @btime sum($v2.x)
  940.097 ns (0 allocations: 0 bytes)
5043.211761518993

julia> y = [ v.x for v in v ];

julia> @btime sum($y)
  947.000 ns (0 allocations: 0 bytes)
5043.211761518993


I think my application/problem might be clearer with this simple example, which better illustrates my issue with the memory layout. My goal is to multiply each element by a 2×2 matrix, so I do:

julia> using StructArrays, LinearAlgebra, BenchmarkTools

julia> struct S
         x::Float64
         y::Float64
       end

julia> v = [ S(1,2) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> A = reinterpret(reshape,Float64,v2);

julia> B = [[rand(), rand()] [rand(), rand()]]
2×2 Matrix{Float64}:
 0.203275  0.226629
 0.219343  0.670345

julia> C = similar(A);

julia> size(A)
(2, 10000)

julia> @btime mul!(C,B,A);
  159.262 μs (6 allocations: 336 bytes)

This doesn’t seem right, though, as there shouldn’t need to be any allocations. I can confirm that this is far less performant than just storing the data in an array with the desired efficient layout:

julia> D = rand(Float64, (10000,2));

julia> size(D)
(10000, 2)

julia> E = similar(D);

julia> @btime mul!(E,D,B);
  16.786 μs (0 allocations: 0 bytes)

or even with the inefficient layout

julia> F = rand(Float64, (2,10000));

julia> G = similar(F);

julia> @btime mul!(G,B,F);
  30.861 μs (0 allocations: 0 bytes)

So it seems the obvious difference in performance comes down to something I don’t understand about StructArray and reinterpret(reshape,Float64,v2). Any advice on how to make this behave like mul!(E,D,B)?

Thanks again!

I’ll just add that I can make the StructArray data perform a little better (and allocate less) if I transpose it:

julia> @btime mul!(E,A',B);
  63.075 μs (1 allocation: 32 bytes)

However, I still don’t know why this has to allocate anything.

because you’re not interpolating in the benchmark?

Apologies, I’m still new to Julia. Can you point me to a reference to help me understand what that means?

If the expression you want to benchmark depends on external variables, you should use $ to “interpolate” them into the benchmark expression to avoid the problems of benchmarking with globals. Essentially, any interpolated variable $x or expression $(...) is “pre-computed” before benchmarking begins.
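For example, a minimal sketch of the difference (the variable names here are just illustrative):

using BenchmarkTools

x = rand(1000)

@btime sum(x)    # `x` is a non-constant global, so the benchmark may also measure dispatch overhead and allocations
@btime sum($x)   # `$x` is interpolated (fixed before the benchmark runs), so only the `sum` call itself is measured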

Can you confirm I’ve used interpolation properly here? It doesn’t seem to change the overall result (although it reduces allocations in the transpose example, which is helpful). Also, I note that if I just do the operation directly on the original array of structs, I get no allocations and the timing makes sense to me. There’s still something I’m missing about the StructArray version.

StructArray version

julia> struct S
         x::Float64
         y::Float64
       end

julia> v = [ S(1,2) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> A = reinterpret(reshape,Float64,v2);

julia> B = [[rand(), rand()] [rand(), rand()]]
2×2 Matrix{Float64}:
 0.056375  0.280542
 0.201907  0.734197

julia> C = similar(A);

julia> size(A)
(2, 10000)

julia> @btime mul!($C,$B,$A);
  153.159 μs (6 allocations: 336 bytes)

Matrix with efficient memory layout (what I’d like to get from StructArray)

julia> D = rand(Float64, (10000,2));

julia> E = similar(D);

julia> @btime mul!($E,$D,$B);
  16.589 μs (0 allocations: 0 bytes)

Matrix with inefficient memory layout (what I’d expect from an array of structs)

julia> F = rand(Float64, (2,10000));

julia> G = similar(F);

julia> @btime mul!($G,$B,$F);
  30.702 μs (0 allocations: 0 bytes)

Confirmation that just using the actual array of structs behaves as I’d expect (no allocations, but less-efficient matrix multiplication due to layout)

julia> H = reinterpret(reshape, Float64, v);

julia> @btime mul!($C, $B, $H);
  30.727 μs (0 allocations: 0 bytes)

I’m still at a loss for what the StructArray version is allocating. Interestingly, when I operate on its transpose (which I guess should be the actual memory layout, given the dimensions), it no longer does any allocations, but it still is 4x slower than I’d expect from my examples above.

julia> @btime mul!($E, $(A'), $B);
  63.556 μs (0 allocations: 0 bytes)

I think StructArrays guarantees that each field of the struct will be stored in a contiguous vector, not that all the fields together will form one contiguous matrix in memory.
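One rough way to check that (a sketch with a throwaway struct, analogous to S above):

using StructArrays

struct Pt
    x::Float64
    y::Float64
end

v2 = StructArray([Pt(rand(), rand()) for _ in 1:10_000])

@show typeof(v2.x)                 # Vector{Float64}: each field is its own contiguous vector
@show pointer(v2.x) pointer(v2.y)  # two separate allocations, not one end-to-end 2×N block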

I may be wrong, but I think you should expect something of this kind:

julia> using LinearAlgebra, BenchmarkTools

julia> struct M{T} <: AbstractMatrix{T}
         x::Vector{T}
         y::Vector{T}
       end

julia> Base.getindex(m::M,i,j) = j == 1 ? m.x[i] : m.y[i]

julia> Base.getindex(m::M,i) = i <= length(m.x) ? m.x[i] : m.y[i-length(m.x)]

julia> Base.setindex!(m::M,val,i,j) = j == 1 ? (m.x[i] = val) : (m.y[i] = val)

julia> Base.length(m::M) = 2*length(m.x)

julia> Base.size(m::M) = (length(m.x),2)

julia> m = M{Float64}([ rand() for i in 1:10000 ],[ rand() for i in 1:10000 ]);

julia> B = rand(2,2); C = zeros(10000,2);  # assumed sizes, matching m (10000×2) and a 2×2 multiplier

julia> @btime mul!($C,$m,$B);
  88.309 μs (6 allocations: 336 bytes)

# versus

julia> A = rand(Float64,10000,2);

julia> @btime mul!($C,$A,$B);
  17.449 μs (0 allocations: 0 bytes)

Interesting! So does this imply I need to roll my own StructArray-ish struct if I want to have the fields packed “end-to-end” as well?

As far as I know, yes. But maybe someone has a better idea. 🙂

You may consider declaring your struct a StaticArrays.FieldVector; then it will behave both like a struct and like a vector, and a vector of those will have the memory layout of a Matrix.

https://juliaarrays.github.io/StaticArrays.jl/stable/pages/api/#FieldVector-1

Edit: thinking about it a bit more, subtyping FieldVector won’t change the memory layout; it’ll be the same as a vector of the original struct definition.
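For example, a quick sketch (assuming StaticArrays is available; the struct name is just illustrative):

using StaticArrays

struct Spin <: FieldVector{2, Float64}   # acts like a struct and like a length-2 vector
    t1::Float64
    t2::Float64
end

v = [Spin(1.0, 2.0) for _ in 1:5]

# Still an array-of-structs layout: the fields are interleaved, so reinterpreting
# yields a 2×5 matrix, exactly as with the plain struct definition.
@show reinterpret(reshape, Float64, v)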

So, if I understand correctly, the problem is that you need the same data to act both as a StructArray{T} and as a Matrix{Float64}. You are correct that the approach

julia> struct S
         x::Float64
         y::Float64
       end

julia> StructArray(S(rand(),rand()) for i in 1:10000);

will guarantee that each column is stored contiguously, but not that they are next to each other (they actually can’t be in general, because you can push! to each column).
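For instance, a small sketch of why:

using StructArrays

struct P            # same two-field layout as S above
    x::Float64
    y::Float64
end

s = StructArray(P(rand(), rand()) for i in 1:10)

push!(s, P(0.0, 0.0))       # grows s.x and s.y independently; either vector may be
                            # reallocated on its own, so the two field vectors cannot be
                            # guaranteed to sit end-to-end in one block of memory
length(s.x), length(s.y)    # both are now 11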

What you could do is the following

julia> struct S
         x::Float64
         y::Float64
       end

julia> data = rand(100, 2); # data with SoA layout

julia> s = StructArray{S}(data, dims=2) # this is just a view and does not allocate memory
100-element StructArray(view(::Matrix{Float64}, :, 1), view(::Matrix{Float64}, :, 2)) with eltype S:
 S(0.6656012326162104, 0.3785828010062877)
 S(0.7640633036875235, 0.7384631357534945)
 ⋮
 S(0.5863124494004175, 0.927251229499223)
 S(0.22170197402130576, 0.8404744601671492)

This is similar to your first attempt, but you need to pass a matrix with the correct memory layout (each row is one struct, each column one field) for it to be turned into a StructArray effectively. In your first example, each field vector of the StructArray was instead a non-contiguous view (a row of the original 2×N matrix).
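Concretely, a sketch comparing the two constructions (only the matrix orientation and the dims argument differ):

using StructArrays

struct S
    x::Float64
    y::Float64
end

m_rows = rand(2, 100)    # 2×N: fields along dim 1, like reinterpret(reshape, ...) of an array of structs
m_cols = rand(100, 2)    # N×2: fields along dim 2, one contiguous column per field

a = StructArray{S}(m_rows, dims=1)
b = StructArray{S}(m_cols, dims=2)

@show strides(a.x)   # (2,): each field is a strided row view, not contiguous
@show strides(b.x)   # (1,): each field is a contiguous column view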

This is very helpful! While I can’t do reinterpret(reshape,Float64,s) to recover the dense matrix given just the StructArray, I can at least keep the original dense matrix in the order that’s most helpful to me for my matrix multiplications (i.e., receive an AoS, then use transpose!(X, reinterpret(reshape,...)) to copy the memory into the SoA shape I want). I can then use a StructArray as a view on the SoA to access individual elements with a clear interface.
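
For what it’s worth, here is a sketch of that pipeline (names are purely illustrative):

using StructArrays, LinearAlgebra

struct S
    x::Float64
    y::Float64
end

aos = [S(rand(), rand()) for _ in 1:10_000]   # incoming array-of-structs

# Copy into a dense N×2 matrix: struct-of-arrays layout, one contiguous column per field
soa = Matrix{Float64}(undef, length(aos), 2)
transpose!(soa, reinterpret(reshape, Float64, aos))

# StructArray view over the columns, for struct-like element access
s = StructArray{S}(soa, dims=2)

B = rand(2, 2)
C = similar(soa)
mul!(C, soa, B)   # allocation-free multiply on the dense SoA matrix
s[1].x            # field access through the StructArray view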