StructArrays Memory Layout

mdtisdall · July 22, 2021, 1:23pm

I’m trying to understand the memory layout of a StructArray I’m constructing in my code. Specifically, it looks to me like the memory layout is still an array of structs, rather than a struct of arrays. I’m very new to Julia, though, so perhaps I’m just misunderstanding the output. Any help clarifying this would be greatly appreciated.

A basic example is (apologies for the verbose type names, they’re relevant to my application):

struct SpinEnvironment{T<:AbstractFloat}
    t1::T
    t2::T
end

struct Environments{T<:AbstractFloat}
    relaxationConstants::StructVector{SpinEnvironment{T}}
    function Environments{T}(e::Vector{SpinEnvironment{T}}) where
        T<:AbstractFloat
        new(StructVector{SpinEnvironment{T}}(
            reinterpret(reshape,T, e), dims=1))
    end
end

se = SpinEnvironment{Float64}(1,2)
env = Environments{Float64}([se,se,se])

The resulting StructArray seems right to me (although the view(reinterpret(...)) seems like a red flag regarding the underlying memory layout)

julia> env.relaxationConstants
3-element StructArray(view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 1, :), view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 2, :)) with eltype Main.SpinStates.SpinEnvironment{Float64}:
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)
 Main.SpinStates.SpinEnvironment{Float64}(1.0, 2.0)

However, when I use reinterpret to try to visualize the memory layout, I see

julia> reinterpret(Float64,env.relaxationConstants)
6-element reinterpret(Float64, StructArray(view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 1, :), view(reinterpret(reshape, Float64, ::Vector{Main.SpinStates.SpinEnvironment{Float64}}), 2, :))):
 1.0
 2.0
 1.0
 2.0
 1.0
 2.0

which isn’t what I’d expect. I’d figured I’d see all the 1.0 entries (i.e., all the t1 fields) first, then all the 2.0 entries (i.e., all the t2 fields), since that would be the layout for a struct of arrays. What is the underlying memory layout here?

Thanks!

jling · July 22, 2021, 1:32pm

yeah but you have an array of struts right?

lmiq · July 22, 2021, 1:49pm

I don’t know how to check this in a lower level, but I’m pretty sure that the memory layout it what you expected to be (each field as column-vector contiguous in memory):

julia> struct A
         x::Float64
         y::Float64
       end

julia> v = [ A(rand(),rand()) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> @btime sum(v.x for v in $v)
  10.029 μs (0 allocations: 0 bytes)
5043.211761518985

julia> @btime sum($v2.x)
  940.097 ns (0 allocations: 0 bytes)
5043.211761518993

julia> y = [ v.x for v in v ];

julia> @btime sum($y)
  947.000 ns (0 allocations: 0 bytes)
5043.211761518993

mdtisdall · July 22, 2021, 3:28pm

I think my application/problem might be clearer with this simple example, which I think better illustrates my issue with memory layout. My goal is to multiply each element by a 2x2 matrix, so I do.

julia> struct S
         x::Float64
         y::Float64
       end

julia> v = [ S(1,2) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> A = reinterpret(reshape,Float64,v2);

julia> B = [[rand(), rand()] [rand(), rand()]]
2×2 Matrix{Float64}:
 0.203275  0.226629
 0.219343  0.670345

julia> C = similar(A);

julia> size(A)
(2, 10000)

julia> @btime mul!(C,B,A);
  159.262 μs (6 allocations: 336 bytes)

This doesn’t seem right, though, as there shouldn’t need to be any allocations. I can confirm that this is way less performant relative to just storing the data in an array with the desired efficient layout

julia> D = rand(Float64, (10000,2));

julia> size(D)
(10000, 2)

julia> E = similar(D);

julia> @btime mul!(E,D,B);
  16.786 μs (0 allocations: 0 bytes)

or even with the inefficient layout

julia> F = rand(Float64, (2,10000));

julia> G = similar(F);

julia> @btime mul!(G,B,F);
  30.861 μs (0 allocations: 0 bytes)

So it seems the obvious difference in performance has to do with my not understanding something about StructArray and reinterpret(reshape,Float64,v2). Any advice on how to make this behave like mul!(E,D,B)?

Thanks again!

mdtisdall · July 22, 2021, 3:34pm

I’ll just add, I can make the StructArray data perform a little better (and do fewer allocations) if I transpose it

julia> @btime mul!(E,A',B);
  63.075 μs (1 allocation: 32 bytes)

However, I still don’t know why this has to allocate anything.

jling · July 22, 2021, 3:36pm

because you’re not interpolating in the benchmark?

mdtisdall · July 22, 2021, 3:38pm

Apologies, I’m still new to Julia. Can you point me to a reference to help me understand what that means?

jling · July 22, 2021, 3:41pm

github.com

JuliaCI/BenchmarkTools.jl/blob/master/README.md#quick-start

# BenchmarkTools.jl

<picture>
<source media="(prefers-color-scheme: dark)" srcset="docs/src/assets/logo-dark.svg" width="100" height="100" align="right">
<img alt="BenchmarkTools logo" src="docs/src/assets/logo.svg" width="100" height="100" align="right">
</picture>

[![][docs-stable-img]][docs-stable-url]
[![][docs-dev-img]][docs-dev-url]
[![Build Status](https://github.com/JuliaCI/BenchmarkTools.jl/workflows/CI/badge.svg)](https://github.com/JuliaCI/BenchmarkTools.jl/actions/workflows/CI.yml?query=branch%3Amaster)
[![Code Coverage](https://codecov.io/gh/JuliaCI/BenchmarkTools.jl/branch/master/graph/badge.svg?label=codecov&token=ccN7NZpkBx)](https://codecov.io/gh/JuliaCI/BenchmarkTools.jl)
[![Code Style: Blue](https://img.shields.io/badge/code%20style-blue-4495d1.svg)](https://github.com/invenia/BlueStyle)
[![Aqua QA](https://raw.githubusercontent.com/JuliaTesting/Aqua.jl/master/badge.svg)](https://github.com/JuliaTesting/Aqua.jl)

BenchmarkTools makes **performance tracking of Julia code easy** by supplying a framework for **writing and running groups of benchmarks** as well as **comparing benchmark results**.

This package is used to write and run the benchmarks found in [BaseBenchmarks.jl](https://github.com/JuliaCI/BaseBenchmarks.jl).

The CI infrastructure for automated performance testing of the Julia language is not in this package, but can be found in [Nanosoldier.jl](https://github.com/JuliaCI/Nanosoldier.jl).

This file has been truncated. show original

If the expression you want to benchmark depends on external variables, you should use $ to “interpolate” them into the benchmark expression to avoid the problems of benchmarking with globals. Essentially, any interpolated variable $x or expression $(...) is “pre-computed” before benchmarking begins:

mdtisdall · July 22, 2021, 3:54pm

Can you confirm I’ve used interpolation properly here? It doesn’t seem to change the overall result (although it reduces allocations in the transpose example, which is helpful). Also, I note that if I just do the operation directly on the original array-of-struct, I get no allocations and the time makes sense to me. There’s still something I’m missing about the StructArray version.

StructArray version

julia> struct S
         x::Float64
         y::Float64
       end

julia> v = [ S(1,2) for i in 1:10000 ];

julia> v2 = StructArray(v);

julia> A = reinterpret(reshape,Float64,v2);

julia> B = [[rand(), rand()] [rand(), rand()]]
2×2 Matrix{Float64}:
 0.056375  0.280542
 0.201907  0.734197

julia> C = similar(A);

julia> size(A)
(2, 10000)

julia> @btime mul!($C,$B,$A);
  153.159 μs (6 allocations: 336 bytes)

Matrix with efficient memory layout (what I’d like to get from StructArray)

julia> D = rand(Float64, (10000,2));

julia> E = similar(D);

julia> @btime mul!($E,$D,$B);
  16.589 μs (0 allocations: 0 bytes)

Matrix with inefficient memory layout (what I’d expect from an array of structs)

julia> F = rand(Float64, (2,10000));

julia> G = similar(F);

julia> @btime mul!($G,$B,$F);
  30.702 μs (0 allocations: 0 bytes)

Confirmation that just using the actually array of structs behaves as I’d expect (no allocations, but less-efficient matrix multiplication due to layout)

julia> H = reinterpret(reshape, Float64, v);

julia> @btime mul!($C, $B, $H);
  30.727 μs (0 allocations: 0 bytes)

I’m still at a loss for what the StructArray version is allocating. Interestingly, when I operate on its transpose (which I guess should be the actual memory layout, given the dimensions), it no longer does any allocations, but it still is 4x slower than I’d expect from my examples above.

julia> @btime mul!($E, $(A'), $B);
  63.556 μs (0 allocations: 0 bytes)

lmiq · July 22, 2021, 6:49pm

I think StructArrays guarantees that each field of the struct will be stored in a contiguous vector. Not that all fields will compose a contiguous matrix in memory.

I may be wrong, but I think you should expect something of this kind:

julia> struct M{T} <: AbstractMatrix{T}
         x::Vector{T}
         y::Vector{T}
       end

julia> Base.getindex(m::M,i,j) = j == 1 ? m.x[i] : m.y[i]

julia> Base.getindex(m::M,i) = i <= length(m.x) ? m.x[i] : m.y[i-length(m.x)]

julia> Base.setindex(m::M,val,i,j) = j == 1 ? m.x[i] = val : m.y[i] = val

julia> Base.length(m::M) = 2*length(m.x)

julia> Base.size(m::M) = (length(m.x),2)

julia> m = M{Float64}([ rand() for i in 1:10000 ],[ rand() for i in 1:10000 ])

julia> @btime mul!($C,$m,$B);
  88.309 μs (6 allocations: 336 bytes)

#versus

julia> A = rand(Float64,10000,2);

julia> @btime mul!($C,$A,$B);
  17.449 μs (0 allocations: 0 bytes)

mdtisdall · July 22, 2021, 7:20pm

Interesting! So does this imply I need to roll my own StructArray-ish struct if I want to have the fields packed “end-to-end” as well?

lmiq · July 22, 2021, 7:24pm

As far as I know, yes. But maybe someone has a better idea.

baggepinnen · July 22, 2021, 7:39pm

You may consider declaring your struct a StaticArrays.FieldVector then it will behave both like a struct and a vector, and a vector of those will have the memory layout of a Matrix.

https://juliaarrays.github.io/StaticArrays.jl/stable/pages/api/#FieldVector-1

Edit: when I think about it a bit more, subtyping FieldVector won’t change any memory layout, it’ll be the same as a vector of the original struct definition.

piever · July 26, 2021, 8:40am

So, if I understand correctly, the problem is that you need to use the same data to both be a StructArray{T} and a Matrix{Float64}. You are correct that the method

julia> struct S
         x::Float64
         y::Float64
       end

julia> StructArray(S(rand(),rand()) for i in 1:10000);

will guarantee that each column is stored contiguously, but not that they are next to each other (they actually can’t be in general, because you can push! to each column).

What you could do is the following

julia> struct S
         x::Float64
         y::Float64
       end

julia> data = rand(100, 2); # data with SoA layout

julia> s = StructArray{S}(data, dims=2) # this is just a view and does not allocate memory
100-element StructArray(view(::Matrix{Float64}, :, 1), view(::Matrix{Float64}, :, 2)) with eltype S:
 S(0.6656012326162104, 0.3785828010062877)
 S(0.7640633036875235, 0.7384631357534945)
 ⋮
 S(0.5863124494004175, 0.927251229499223)
 S(0.22170197402130576, 0.8404744601671492)

This is similar to your first attempt, but you need to pass the matrix with the correct memory layout (each row is a struct) to be turned effectively into a StructArray. Otherwise, in your first example, each vector of the StructArray was a non-contiguous view (a row of the original matrix).

mdtisdall · July 29, 2021, 6:22pm

This is very helpful! While I can’t do reinterpret(reshape,Float64,s) to recover the dense matrix given just the StructArray, I can at least keep the original dense matrix in the order that’s most helpful to me for my matrix multiplications (i.e., receive an AoS, then use transpose!(X, reinterpret(reshape,...)) to copy the memory into the SoA shape I want). I can then use a StructArray as a view on the SoA to access individual elements with a clear interface.

Topic		Replies	Views
Julia style? Arrays/matrix vs structs General Usage	19	3882	December 18, 2019
Understanding how an array of structures works General Usage question , arrays , struct	31	6041	October 31, 2021
Understanding and avoiding allocations with StructArrays Performance structarrays	16	1435	September 25, 2020
New to Julia and Formatting StructArrays for GPU use with mutable scalars GPU staticarrays , structarrays	2	608	February 13, 2023
Difference in allocations when constructing array-of-arrays vs array-of-structs General Usage	5	927	February 25, 2018

StructArrays Memory Layout

Related topics