ElasticArrays/ArraysofArrays for Vector of Arrays

Dear all,

I would like to use ArraysOfArrays and ElasticArrays to handle indexing a Vector vec of type Vector{<:AbstractArray{T}} where {T} better, as I often need to call vec[iter][idx], and afaik with these packages I can internally index vec as a pure Vector{T}. I figured out how to use both packages together fairly easily with a Matrix:

using ArraysOfArrays, ElasticArrays, Random
Ncols = 1000
Nrows = 100
T = Int64

ela = ElasticArray{T}(undef, Ncols, Nrows)
vov = VectorOfSimilarArrays(ela) #ArrayOfSimilarArrays{Int64, 1, 1, 2, ElasticMatrix{Int64, Vector{Int64}}}

Now, vov.data is still a Matrix, and it contains type information that it stores an ElasticMatrix, so I can resize the last dimension arbitrarily often, which is really nice:

resize!(vov, 123)
resize!(vov, 12)
resize!(vov, Nrows)

I played around with ElasticArrays and Vector of Vectors, and figured that I can actually resize the number of inner vectors, AND push!() them, which makes it possible to change both dimensions without additional allocations:

ela2 = ElasticVector([zeros(T, Nrows) for _ in 1:Ncols])
push!(ela2[1], 1)
resize!(ela2, 123)
resize!(ela2, 12)
resize!(ela2, Ncols)

Unfortunately, there is no easy way to access ela2.data as a pure Vector/Matrix anymore, as it is stored as a Vector{Vector{Int}}.

Is there a way that I can convert ela2 to an ArraysOfArrays, such that

  • I can still push!() and resize!() as in the example before,
  • and also have the data somewhere stored as a pure Matrix/Vector{Int} instead of a Vector{Vector{Int}}? I tried this already but it seems all type information is lost when converting the ElasticArray to an ArrayofArrays:
ela2 = ElasticVector([zeros(T, Nrows) for _ in 1:Ncols])
ela2.data #Vector{Vector{Int64}}
vov2 = VectorOfSimilarArrays(ela2) #No more ELASTICARRAY in type information: 1000-element ArrayOfSimilarArrays{Int64, 1, 1, 2, Matrix{Int64}}

vov2.data #100Ă—1000 Matrix{Int64} - This is perfect
# However, we can no longer resize/push!() the array as the ElasticVector type has been converted to a standard Vector:
resize!(vov2, 123) #MethodError: no method matching resize!(::Matrix{Int64}, ::Int64, ::Int64)
push!(vov2[1], 1) # MethodError: no method matching resize!(::SubArray{Int64...

@oschulz - Sorry for tagging you, but I believe you will know if what I want is even possible.

Hi @mrVeng ,

happy to assist.

As for as

ela2 = ElasticVector([zeros(T, Nrows) for _ in 1:Ncols])

is concerned: ElasticVector is only part of ElasticArrays for completeness. Standard Julia Vectors are already resizable, of course, so it doesn’t add any important functionality. ElasticArray{T,N} only becomes interesting for N > 2, really.

Is there a way that I can convert ela2 to an ArraysOfArrays […] still push!() and resize!() […] stored as a pure Matrix

Unfortunately, no: As long as the underlying matrix (backed by a vector) is stored in memory in a contiguous fashion (as ElasticArray does), resizing it in any but the last dimension is not an efficient operation, since the matrix (resp. the vector backing it) would have to be completely rearranged in memory.

There’s an old issue by @ChrisRackauckas regarding resizing ElasticArray in other dimensions. @stevengj suggested to use padding and reallocating only when running out of padding space, this would make it efficient. Two problems though:

  • ElasticArray is currently an immutable struct, and it would have to become mutable for this. In the past, it didn’t matter much whether it was mutable or immutable, but nowadays Julia is able to stack-allocate immutable objects that point to arrays. So nowadays having it immutable is a plus. Of course a mutable elastic array type could be added to ElasticArrays.jl, sharing much of the code with ElasticArray.

  • You want to push to individual elements of the vector or vectors. As soon as you do that, they would become dissimilar in length, so a VectorOfSimilarVectors would literally not describe your use case anymore.

I can, however, offer a solution, though it’s not implemented yet:

ArraysOfArray.jl provides a type VectorOfArrays, which can handle element arrays that differ in size, backed by a single flat vector. Currently, you can’t push to the element vectors/arrays, since that would require rearranging the backing vector. I have firm plans to support vectors of arrays that, while backed by a single vector, do not have to be stored in-order. This one will allow resizing element arrays, which will result in appending the new element and just changing indexing info. This will be efficient, as long as pushes/appends aren’t in random order in respect to the element index. It will also come with a “compactify” operation, to re-compact and sort things after all appending/pushing, etc. is done. I would like to add this soon, just have to find a bit of time.

2 Likes

Thanks for your detailed answer!

ArraysOfArray.jl provides a type VectorOfArrays , which can handle element arrays that differ in size, backed by a single flat vector. Currently, you can’t push to the element vectors/arrays, since that would require rearranging the backing vector. I have firm plans to support vectors of arrays that, while backed by a single vector, do not have to be stored in-order. This one will allow resizing element arrays, which will result in appending the new element and just changing indexing info. This will be efficient, as long as pushes/appends aren’t in random order in respect to the element index. It will also come with a “compactify” operation, to re-compact and sort things after all appending/pushing, etc. is done. I would like to add this soon, just have to find a bit of time.

This sounds really good! I will go for an ElasticMatrix for now as I can still resize the last dimension, and check for updates on the ArraysOfArrays package.

Just curious, @mrVeng , can you say a little bit about your use case? User stories will be helpful when designing the ArraysOfArrays “upgrade”.

Sorry for the late reply, I have been working with ElasticArrays a bit to get a better understanding.

I am basically approximating a latent parameter trajectory via many particles. It is convenient to just create a Vector of approximations of such trajectories here (a Vector of Arrays/Vectors). One can also push forward the inner arrays easily with this setup if we want to extend them. The problem here is that often we have to resample the inner vectors, and this is done in the least cache-friendly way for Julia and slows down the code. I figured if I can use a simple Vector/Matrix under the hood and just create views on the inner vectors when needed I would be better off than using Vector of Vectors.

I have been working with an ElasticMatrix now and this is actually quite convenient, as I can resize the matrix once new data arrives, and I can sort the elements more cache friendly. I just have to create a view on a matrix row when I want to check for the “inner” vectors.

I believe the VectorOfVector implementation is quite similar, right? You have a vector instead of a matrix for storage, but create a view as soon as I want to view one of the inner vectors. Out of curiosity, what would happen if you have a Vector of multidimensional arrays? Would you be able to “push”/append the inner arrays?

Hey, sorry for the late reply!

I am basically approximating a latent parameter trajectory via many particles.

Ah, yes, I have use cases like that myself (not trajectories, but n-pixel hits per event in a pixelized detector).

The problem here is that often we have to resample the inner vectors, and this is done in the least cache-friendly way for Julia and slows down the code.

Do you have to resample them in random order, or one-by-one in succession?

I believe the VectorOfVector implementation is quite similar, right? You have a vector instead of a matrix for storage, but create a view as soon as I want to view one of the inner vectors.

Yep!

Out of curiosity, what would happen if you have a Vector of multidimensional arrays? Would you be able to “push”/append the inner arrays?

Currently, no. But with the improvements I want to implement there will be a vector of array that let’s you do that - as a result, a new one will be allocated at the end of the storage array, where it can grow. It would be efficient as long as you don’t push to the element vectors/arrays in random order. And there would be a “compactify” operation that you can call at the end to resort the whole thing and eliminate unused space in the middle.

1 Like