Should I use a matrix or vector of vectors?

sadish-d · May 26, 2023, 10:07pm

I have a bunch of vectors: v1, v2, and so on. Lets call these inner vectors.

I want to put them in an array, but need to be able to access each one arbitrarily.

I need to read the inner vectors as a whole, and do not care to read individual elements from them.

What is the best data structure?

matrix where each column is an inner vector:
hcat(v1, v2)
vector of inner vectors:
[v1, v2]
matrix where each row is an inner vector (I believe this is the worst):
vcat(v1', v2')

adienes · May 26, 2023, 10:35pm

if you will never need any kind of row-wise operation, then probably vector of vectors is best, as this will make your broadcasting life easier rather than having to use eachcol or mapslices constantly.

Oscar_Smith · May 26, 2023, 11:07pm

you either want 1 or 2. knowing which is harder without knowing what operations you need, but I would probably go with 2 by default.

DNF · May 26, 2023, 11:08pm

Are the vectors intrinsically of the same length? Of so, I believe a matrix with vectors lying along the columns is the right choice.

sadish-d · May 26, 2023, 11:48pm

Yes, the (inner) vectors are of the same length. I do use a lot of broadcasting so I find the vector-of-vectors structure easier. I tried matrices but then found myself having to work with Slices and got scared.

DNF · May 27, 2023, 12:01am

If they are all the same length I suggest matrix or vector of SVectors. No reason to be scared of slices/views. Just use eachcol.

gdalle · May 27, 2023, 6:42am

A noteworthy difference between matrix and vector of vectors is memory locality: in a matrix, all of your data will be stored contiguously in memory, which may speed up access

jishnub · May 27, 2023, 7:17am

On the flip side, if you’re adding, removing or swapping columns, then you probably need a vector of vectors. It depends on your use case, and locality of your access pattern. E.g. if you need coulmns i and N-i simultaneously, it might make sense to use a vector of vectors.

aplavin · May 27, 2023, 7:47am

As @Oscar_Smith said,

as this is the natural datastructure for your problem description. Why create extra entities unnecessarily?
If you have other requirements, like keeping all “vectors” close in memory, you can later switch to a matrix or whatever.

sadish-d · May 27, 2023, 12:09pm

I realized that in trying to present a simplified version of the problem I might have lied a little. I am working with two types, one stores data in vectors-- the v1, v2 I mentioned originally. The other stores them in 2-column matrices (the data comes in pairs). These I can not change. So I should probably go with the vector-of-vectors. (I could use matrices for the original problem I posed if it really makes sense to do that.)

My inner vectors/matrices will not need to be re-written once created and will almost always be read in sequence (v1 then v2, or v2 then v1). So it makes sense to have them in roughly the same place in memory. I didn’t realize vector-of-vectors would not store things around the same place in memory. Maybe it depends on how they were created. Two questions:

How can I check whether the inner vectors in my vector-of-vectors are contiguous in memory?
Is there a way to ensure that they are? Would a deepcopy of my vector-of-vectors do the trick (even if it is a hack)?
I’m not concerned about the cost of creating the vector-of-vectors. I’m more concerned about the speed/ease of reading from it once created.

I am more of a researcher than a computer scientist. It might not turn out to make much difference in the end-- we’re talking about reading vectors or 2-column-matrices of length no more than 100,000 per second. But I’ve also not worked on something like this before. I’m trying to think more about design early on.

gdalle · May 27, 2023, 12:22pm

I don’t know of any easier way than to use a matrix, precisely because in Julia they are stored in column-major order. So pretty much by definition, a matrix is a “vector of vectors” stored contiguously.

But once you have your matrix M, be sure to use @view M[:, i] instead of M[:, i] to access the i-th column, otherwise the data will be copied.

sadish-d · May 27, 2023, 1:13pm

I found some discussion here about checking contiguity of a vector’s elements:

I might very well implement the inner-vectors as structs, so I think things get complicated. So we can rest the discussion here. Thanks for your help.

As a summary (so I can mark this post as a solution), here are the recommended options:

vector-of-vectors:

work well with broadcasting
inner vectors might not be contiguous in memory
might be preferable than the next option (see 2 below) if the inner vectors need to be read in arbitrary order

matrices with inner-vectors as columns:

would work if the inner-vectors have the same length
preferable since matrix columns are contiguous in memory (if that’s important)
columns should be retrieved as views so as not to copy data

Topic		Replies	Views
Matrices vs vectors of vectors performance Numerics question , performance	3	456	July 13, 2022
Quick Longitudinal Access to arrays General Usage	34	2262	March 3, 2018
Vcat row vectors -> matrix? New to Julia question , vector	10	2049	July 27, 2022
Best way to handle a list of arrays (Asking for recommendations) General Usage	3	455	December 21, 2018
Memory best practices in Julia with Arrays vs Vectors New to Julia memory-allocation , arrays , staticarrays , simulations	19	864	August 18, 2024

Should I use a matrix or vector of vectors?

Related topics