Non-rectangular arrays in Julia

Hi,

I have to manipulate non-rectangular multidimensional arrays.
Let a be a 2D array of floats with two columns, where each column has a different length:
length(a[:,1]) == n1 and length(a[:,2]) == n2, with n1 != n2

I wonder if it would be a good idea to create my own array type MultiArray <: AbstractArray
but it would break the AbstractArray interface because the size method would not return a simple tuple…

Maybe a new EvenMoreAbstractArray with a shape function that returns a generalized form of size…

Any hints?

Interesting. Do you definitely need to represent it as a 2D array?

If yes, perhaps padding your data would be easier (you could pad it with garbage, and just record the lengths of the real data elsewhere).
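
A minimal sketch of the padding idea, assuming two hypothetical columns col1 and col2 and NaN as an arbitrary padding value (any placeholder would do, since the real lengths are stored separately in lens):

```julia
# Hypothetical data: two columns of different lengths.
col1 = [1.0, 2.0, 3.0]
col2 = [4.0, 5.0]

lens = [length(col1), length(col2)]     # real lengths, recorded separately
padded = fill(NaN, maximum(lens), 2)    # rectangular storage, NaN-padded
padded[1:lens[1], 1] = col1
padded[1:lens[2], 2] = col2

# Hypothetical helper: the real data of column j, as a view (no copy).
realcol(j) = view(padded, 1:lens[j], j)
```

The cost is one full-height column per entry, so this is probably only attractive when the lengths are similar.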

If not, would an array of (variable-length) arrays work?
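
For illustration, an array of variable-length arrays could look like the sketch below (the values are made up):

```julia
# Each "column" is its own Vector, so the lengths are free to differ.
a = [[1.0, 2.0, 3.0], [4.0, 5.0]]   # Vector{Vector{Float64}}

lengths = length.(a)                # per-column lengths (n1, n2)
x = a[2][1]                         # first element of the second column
total = sum(sum, a)                 # reduce over all elements
```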

It seems like https://github.com/mbauman/RaggedArrays.jl is what you are looking for. It might need to be updated for Julia 0.6.

Hi felix, thank you for the quick reply!

An array of arrays is OK in my case (padding is not, because the lengths can be very different).
My only concern is that the indexing will be inhomogeneous. If a is a 4D array built as a 2D array of (differently sized) 2D arrays, the indexing syntax will be:
a[i,j][k,l]
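
To make this concrete, here is a small sketch with made-up inner sizes. The getel helper is hypothetical, just one way to hide the two-level indexing behind a single call:

```julia
# Outer 2x2 array whose elements are matrices of different sizes.
a = [zeros(i + 1, j + 2) for i in 1:2, j in 1:2]   # Matrix{Matrix{Float64}}

a[2, 1][3, 2] = 7.0        # outer index first, then inner index

# Hypothetical helper giving a flat four-index access.
getel(a, i, j, k, l) = a[i, j][k, l]
```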

Thank you for the RaggedArrays link!

I have a package for something similar:
https://github.com/tpapp/RaggedData.jl

It also supports ingestion of data with an ex ante unknown number of elements per column.

Thanks!
Actually, for my application, I deal with ragged (I learned a new word) arrays of rectangular arrays.
I need the inner rectangular arrays to be really fast. I guess that the first solution (array of arrays) will be more efficient…

It depends. For my application, mapping into a flat vector was the most efficient (because it uses the least memory and I was memory-constrained, and I have lots of small vectors, with e.g. 5–100 elements). I think the same approach can be extended to arrays. But make sure you profile and benchmark. Also, I am seeing a lot of speedups on v0.7 compared to v0.6.
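
A rough sketch of what "mapping into a flat vector" can mean: all elements live in one contiguous vector, and a second vector of offsets records where each column starts. The FlatRagged type and column function are hypothetical names for illustration and are not the RaggedData.jl API:

```julia
# Hypothetical minimal ragged container: one data vector plus offsets.
struct FlatRagged{T}
    data::Vector{T}
    offsets::Vector{Int}   # column j is data[offsets[j]+1 : offsets[j+1]]
end

# Build from a non-empty vector of columns.
function FlatRagged(cols::Vector{Vector{T}}) where {T}
    offsets = cumsum([0; length.(cols)])
    return FlatRagged{T}(reduce(vcat, cols), offsets)
end

# Column j as a view into the flat storage (no copy).
column(r::FlatRagged, j) = view(r.data, r.offsets[j]+1:r.offsets[j+1])

r = FlatRagged([[1.0, 2.0, 3.0], [4.0, 5.0]])
c2 = column(r, 2)
```

Compared with an array of arrays, this needs only two allocations regardless of the number of columns, which may matter when there are many small vectors. Whether it is faster in a given application is something to benchmark.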

Thank you for the tips. I will experiment with the different options.

Why not use an Array{<:Array{<:Any,1},1} for that?
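
As a small illustration, a plain vector of vectors matches that type, so it can be used directly in a method signature. The collengths function is a hypothetical example:

```julia
a = [[1.0, 2.0, 3.0], [4.0, 5.0]]

# A Vector{Vector{Float64}} is a subtype of the suggested type.
matches = typeof(a) <: Array{<:Array{<:Any,1},1}

# Hypothetical function that accepts any vector of vectors.
collengths(x::Array{<:Array{<:Any,1},1}) = map(length, x)
```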

Yes, I guess that is what felix proposed (an array of arrays). I think I will go that way.