Best way to store data for GPU use


Hello, I am writing code in Julia for molecular simulation. Currently it is only for CPU’s, but I would like to convert it to GPU’s once I have it running and tested.

I am forever in a quandary about how to store data (mutable) i.e. each particle has a position, velocity, and its type that must be stored. position and velocities are updated every step, so they are mutable.

I really like Array of Structs, they are easy to code with, but after watching Tim’s Juliacon video and doing some reading, it does not look like I can use Array of Structs with GPU’s.

Can I use a Struct of Arrays where the arrays are mutable with GPU’s? or should I just stick to not using a struct and just hold information in arrays, by necessity mutable arrays

for example a

struct XYZ 

mutable struct ArrayOfStructs

mutable struct StructOfArrays


natoms = 1000 # arbitrary number of atoms in simulation
positions  = Array{Float64}(undef, natoms,3)
velocity   = Array{Float64}(undef, natoms,3)
types      = Array{Int64}(undef, natoms)

I hope this is a relevant question for this topic. I spend too much time worrying about the future of my code, but if I need to make changes, I would rather do it sooner than later.


Even so, your Particles should likely still be immutable and when you do the update you just update the whole particle at once. Also, instead of struct XYZ consider using an SVector from

Regarding memory layout, with something like it is pretty easy to abstract the operations from the how the memory is stored.


I have been debating the usefulness of the XYZ struct alot. It makes for very readable code, but I must manually type out the x, y and z i.e., I cannot

for i=1:3 
   f[i] = answer[i]

Julia makes for loops fast, even short ones where as the XYZ requires

f.x = answer[1]
f.y = answer[2]
f.z = answer[3]

I mean, answer could be XYZ as well, but it still takes three lines in serial to update f. Is it wrong of me to think that using XYZ due to the serial nature of the update, could be up to 3x slower? I imagine there is overhead with arrays, but still, Julia is pretty damn clever with for loops.


You’ll want to use StaticArrays.jl and write things on the vector level (similar to writing things in vector form in physics).

Allocating many normal Julia Vector of length 3 will be very inefficient.


So can I take from this that I should not use Array of Structs?


How do you find the StructArrays for speed?

I am a bit of a noob but it looks pretty handy.


I like that you can select either the row or the column. typical objects have always bothered me in that if you make an array of objects (structs in Julia), you can’t iterate
through all the object.x’s (not quickly, like taking a slice of all x’s is more of what I mean.)

the Struct array looks like it can do that… I can store information for each particle, so they can each have a velocity, position, integer type, string name. I can slice all this information for a single particle i.e., its position, velocity, type, name, or I can also if I want, slice all velocities as a separate array?

And, it works with GPU's too. It seems too good to be true… what is the catch? For instance, I notice when you slice, you end up with Arrays, not say, SArrays. However, if it can be used in GPU's, then the speed difference between an Array and SArray would not be an issue would it?


SArray is for the 3-dim physical vectors (rank 1 tensors) so what I mean is:

using StructArrays
using StaticArrays

struct Particle{T}
    position::SVector{3, T}
    velocity::SVector{3, T}


aos = [Particle(rand(SVector{3}), rand(SVector{3}), rand(1:5)) for i in 1:N_PARTICLES]
soa = StructArray(aos)

Also, for indexing the soa structure to be efficient you want the element type to be isbitstype as

> isbitstype(Particle{Float64})
> true


I hope there is no “catch”: as mentioned above, as long as your structs are isbits StructArrays should be quite fast.

In the example snippet above, if you just do StructArray(CuArray(aos)) you should get a StructArray where all columns are CuArrays.