Should I use a matrix or vector of vectors?

I have a bunch of vectors: v1, v2, and so on. Lets call these inner vectors.

I want to put them in an array, but need to be able to access each one arbitrarily.

I need to read the inner vectors as a whole, and do not care to read individual elements from them.

What is the best data structure?

  1. matrix where each column is an inner vector:
    hcat(v1, v2)

  2. vector of inner vectors:
    [v1, v2]

  3. matrix where each row is an inner vector (I believe this is the worst):
    vcat(v1', v2')

1 Like

if you will never need any kind of row-wise operation, then probably vector of vectors is best, as this will make your broadcasting life easier rather than having to use eachcol or mapslices constantly.

3 Likes

you either want 1 or 2. knowing which is harder without knowing what operations you need, but I would probably go with 2 by default.

Are the vectors intrinsically of the same length? Of so, I believe a matrix with vectors lying along the columns is the right choice.

Yes, the (inner) vectors are of the same length. I do use a lot of broadcasting so I find the vector-of-vectors structure easier. I tried matrices but then found myself having to work with Slices and got scared.

If they are all the same length I suggest matrix or vector of SVectors. No reason to be scared of slices/views. Just use eachcol.

1 Like

A noteworthy difference between matrix and vector of vectors is memory locality: in a matrix, all of your data will be stored contiguously in memory, which may speed up access

1 Like

On the flip side, if you’re adding, removing or swapping columns, then you probably need a vector of vectors. It depends on your use case, and locality of your access pattern. E.g. if you need coulmns i and N-i simultaneously, it might make sense to use a vector of vectors.

As @Oscar_Smith said,

as this is the natural datastructure for your problem description. Why create extra entities unnecessarily?
If you have other requirements, like keeping all “vectors” close in memory, you can later switch to a matrix or whatever.

I realized that in trying to present a simplified version of the problem I might have lied a little. I am working with two types, one stores data in vectors-- the v1, v2 I mentioned originally. The other stores them in 2-column matrices (the data comes in pairs). These I can not change. So I should probably go with the vector-of-vectors. (I could use matrices for the original problem I posed if it really makes sense to do that.)

My inner vectors/matrices will not need to be re-written once created and will almost always be read in sequence (v1 then v2, or v2 then v1). So it makes sense to have them in roughly the same place in memory. I didn’t realize vector-of-vectors would not store things around the same place in memory. Maybe it depends on how they were created. Two questions:

  1. How can I check whether the inner vectors in my vector-of-vectors are contiguous in memory?
  2. Is there a way to ensure that they are? Would a deepcopy of my vector-of-vectors do the trick (even if it is a hack)?
    I’m not concerned about the cost of creating the vector-of-vectors. I’m more concerned about the speed/ease of reading from it once created.

I am more of a researcher than a computer scientist. It might not turn out to make much difference in the end-- we’re talking about reading vectors or 2-column-matrices of length no more than 100,000 per second. But I’ve also not worked on something like this before. I’m trying to think more about design early on.

1 Like

I don’t know of any easier way than to use a matrix, precisely because in Julia they are stored in column-major order. So pretty much by definition, a matrix is a “vector of vectors” stored contiguously.

But once you have your matrix M, be sure to use @view M[:, i] instead of M[:, i] to access the i-th column, otherwise the data will be copied.

I found some discussion here about checking contiguity of a vector’s elements:

I might very well implement the inner-vectors as structs, so I think things get complicated. So we can rest the discussion here. Thanks for your help.

As a summary (so I can mark this post as a solution), here are the recommended options:

  1. vector-of-vectors:
  • work well with broadcasting
  • inner vectors might not be contiguous in memory
  • might be preferable than the next option (see 2 below) if the inner vectors need to be read in arbitrary order
  1. matrices with inner-vectors as columns:
  • would work if the inner-vectors have the same length
  • preferable since matrix columns are contiguous in memory (if that’s important)
  • columns should be retrieved as views so as not to copy data