I just finished an enormous scientific simulation (billions of regressions, trillions of allocations).
The final output are a few arrays of matrices: 100000-element Vector{Matrix{Float64}}. I am looking for a solution to store them locally to recall them later in Julia. (both for future use and because I need to kill julia as it’s boggling up 7gb of RAM and 95% GPU after all the heavy lifting).

Essentially, how do I store structures of the type:

A= [ rand(3,3) for i in 1:10^7 ]

Cursory browsing of binary formats packages (HDF5, JLD2 , etc) yields no results.
I am really married to the Array{Array{Float64, 2}, 1} struct as 1000s of lines of code are written around it. I suppose I’ll have to reshape them into vectors for storage to hcat() them after recall? If so, what’s an efficient algorithm to do so please?
Thanks!

Edit: although it does seem like HDF5 was built for exactly the sort of data that you are trying to write:

using HDF5
A = [rand(3, 3) for ind in 1:10]
M = stack(A) # <-- As far as I can tell ... this isn't very expensive.
h5write("test.h5", "bigggg/M", M)
M_copy = h5read("test.h5", "bigggg/M")

Thanks for the reply :)) did you try running this code? This is DataFrames.stack(), right? if so, it throws:

ERROR: MethodError: no method matching stack(::Vector{Matrix{Float64}})

If stack() were to work it’d be exactly what I am looking for: some function to convert my structure into something storable to be unconverted upon use.

BTW, yes I did tentatively tinker with HDF5 but I was discouraged by the documentation which suggests it accepts a fairly narrow range of types.

If I were you, I would preallocate a three-dimensional array from the beginning (with the last axis for different simulations and the first two for the matrices) and let the code for simulations directly write results to that array in-place. For example, you could use selectdim to get a view into the three-dimensional array for a specific matrix associated with a given value in the third dimension.

If it really has to be Array{Array{Float64, 2}, 1}, then you could just create a large vector and then use a for loop to copy values into the large vector. There is no need to use something else. Just a plain for loop is good enough. The resulting vector can be written to file in various formats. I would consider Arrow.jl which is fast.

Also, why is the range of types for HDF5 not sufficient if everything is about numbers?

Yes - I had that as a script on my system. It ran fine on Julia 1.9 … stack wasn’t Dataframes.stack though, it was probably Base.stack or something. In my REPL:

help?> stack
search: stack stacktrace StackTraces StackOverflowError instances abstract AbstractSet AbstractChar AbstractDict
stack(iter; [dims])
Combine a collection of arrays (or other iterable objects) of equal size into one larger array, by arranging them
along one or more new dimensions.

For large arrays of 3x3 matrix, or any small fixed-size matrix, I would strongly consider using arrays of StaticArrays. They will be much faster to work with and take less storage and be easier to store on disk (because an array of SMatrix is stored contiguously in memory. (Your code won’t need to change much because an SMatrix can still be used with standard matrix operations.) e.g. you can easily store it as a single HDF5 dataset, possibly with a reinterpret.

Ok so for now this works, although I am surprised there isn’t a way to store idiosyncratic Julia structs:

using CSV, Tables
to_store= reduce(vcat, A) # as big matrix
CSV.write("FileName.csv", Tables.table(to_store), writeheader=false) #easily storable in universal formats
temp= CSV.File("FileName.csv", df) |> Tables.matrix() #call and transform to matrix
#shape into arrays of arrays.
to_use= [temp[i:(i+size(temp,2)-1), :] for i in range(1, size(temp,1), step= size(temp,2)]

I was under the impression @frylock suggested HDF5 can store jagged arrays, hence my ongoing skepticism.

Why? My understanding is that lists of arrays are faster and more readable. For one, the indexing will be column-based more often. Then it’s also faster to program: I.E: A[i] to access a desired matrix instead of less readable and more complex slicings (like the ones in my workaround above).
I am also interested in conducting matrix-level operations on each element of A, which are efficiently parallelized across the Array{Array{Float64, 2}, 1} struct natively I.E:

> @btime mean($A); # outputs single 3x3 matrix of means across all A
45.558 ms (10000001 allocations: 1.19 GiB)

> @btime mean.($A) #outputs vector of length(A) of matrices A[i] means
13.817 ms (2 allocations: 76.29 MiB)

I am curious if there exists comparably readable and efficient methods for multi-dim arrays.

Your CSV implementation relies on information that is not in type — it relies on the fact that all of the elements are matrices of the same size. A generic routine couldn’t do this. (What you have is not like an array of struct!)

Realize that [ rand(3,3) for i in 1:10^7 ] is stored in memory as an array of pointers to 3x3 matrices. Moreover, the fact that all of the elements are the same size is not part of the type, so a code to save/load a Vector{<:Matrix} would have to use a fairly complicated format — it can’t simply use a flat table!

In contrast, if you have a Vector of SMatrix, e.g. A = [@SMatrix(rand(3,3)) for i = 1:10^6], then the size of the matrices is stored in the type. In memory, Julia stores this as a contiguous array of 9 * 10^6 floating-pointing numbers. And formats like HDF5 (and hence JLD) can save it efficiently with one big contiguous write. (For that matter, you can save it efficiently to disk simply by using write.)

And, as I noted above, SMatrix is way faster for operations on millions of 3x3 matrices.

> @btime mean($A); # outputs single 3x3 matrix of means across all A
45.558 ms (10000001 allocations: 1.19 GiB)

The large number of allocations incurred here is a sign of something is not right. As I don’t see a reproducible example, I couldn’t tell for sure what’s wrong. But, it’s totally doable to have a single multi-dimensional array for achieving what you described for a vector of many small arrays.

Truly amazing! Did some preliminary benchmarking and results on CPU StaticArrays are comparable to highly parallelized GPU ones on Metal.jl Arrays. Might warrant a partial rewriting!

Do you know if this might cause type-mismatch or method issues? Most of my code is type-stable and I am trying to assess how time-costly a rewrite will be.

That is, will it still work with let’s say function foo(::Tuple{Vector{Vector{Float64}}}}) which takes tuples constructed from A as input? Thank you again for what was a fruitful rabbit hole!

It’s more that you may not get the full benefit if you don’t audit your code. e.g. you want to make sure you allocate all of your 3-component vectors as SVector too (e.g. watch out for calls to zeros etcetera). It’s really only the constructors that you have to watch out for, since once you have static arrays then operations only involving SMatrix and SVector and scalars will produce more static arrays.

Not if you declared your function like that, because then you’ve told Julia that it only works with vectors of Vector, not any other type of array (or any other numeric type).

It’s a common mistake in Julia to over-specify function argument types. It generally does not help performance (since Julia compiles a specialized version of any function for the types that are actually passed, even if no types were declared at all), and it makes your code less generic. See the manual section on Argument-Type Declarations).

If you really want to declare a type (e.g. for clarity, or to dispatch different argument types to different methods), you should try to make the types generic, and write the code to be generic as well — that means inferring the types of variable from the types of the arguments, not hard-coding them. For example, a more generic declaration of your foo example might be:

function foo(::Tuple{<:AbstractVector{<:AbstractVector{T}}}) where {T<:Number}
...
end

where I’ve separated out the number type T as a type parameter in case you want to use it to decide on types inside foo. For example, instead of initializing a variable as x = 0.0, which hard-codes the precision to Float64, you might do x = zero(T) or x = float(zero(T)) so that the precision is determined by the argument types.

Writing fully type-generic code can take some practice and discipline, but it’s not something you need to do all at once. You can gradually migrate your code to make it more generic. Start by making sure it works with static arrays (and make sure you never accidentally allocate a non-static array). Then maybe try to make it work for both double and single precision. Etcetera. (This is an ongoing process even for Julia Base. e.g. making linear algebra functions work for non-commutative arithmetic or dimensionful numbers doesn’t happen overnight.)