Array of multiple differently sized vectors

I have an array of multiple vectors with different lengths. For further use I need them in one [n, m] Array. Is there a function to fill up the shorter vectors with missing/NaN?

Not that I know of (but I might not know, and very possibly it exists). But it’s not too hard:

julia> V1 = [1.0, 2.0, 3.0];
julia> V2 = [1, 2];

julia> A = Matrix{Float64}(undef, max(size(V1, 1), size(V2, 1)), 2)
3×2 Matrix{Float64}:
 0.0           0.0
 6.92079e-310  6.92079e-310
 6.92079e-310  6.92079e-310

julia> A[:, 1] = V1
julia> A[1:size(V2, 1), 2] = V2

julia> A
3×2 Matrix{Float64}:
 1.0  1.0
 2.0  2.0
 3.0  6.92079e-310

Notice that you still need to fill in the NaNs yourself, i.e. there in the corner. Alternatively, you can make a Matrix with missing instead of undef (see the help for Matrix, it’s slightly different), and then you’re done. I think you would rather want NaNs, though, unless your element type isn’t a floating-point number, in which case NaN isn’t possible. [@StefanKarpinski The docs first show an example with nothing, then one with missing. I think the docs should show missing first, as it’s preferred (and comes first alphabetically). missing is more recent in the language, and while nothing is still supported, works, and has a slightly different meaning, I think you rarely if ever want to use it, at least for this, so possibly it should just be struck from the docs?]
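For reference, the missing variant mentioned above could look roughly like this. It is only a sketch reusing V1 and V2 from the example; the variable name nrows is my own, and the element type Union{Missing, Float64} is an assumption about what you want to store.

```julia
V1 = [1.0, 2.0, 3.0]
V2 = [1, 2]

# Allocate a matrix whose entries all start out as `missing`
# (so there is no uninitialized memory to worry about).
nrows = max(length(V1), length(V2))
A = Matrix{Union{Missing, Float64}}(missing, nrows, 2)

# Copy each vector into the top of its column; whatever is not
# overwritten stays `missing`.
A[1:length(V1), 1] = V1
A[1:length(V2), 2] = V2   # the Int values are converted to Float64

ismissing(A[3, 2])        # the corner entry was never overwritten
```

The trade-off is that downstream code then has to deal with missing (e.g. via skipmissing), whereas NaN keeps a plain Float64 matrix.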

Also, you would want this in some sort of loop to make it more general, i.e. to do this for an arbitrary number of vectors, starting from a vector of vectors… You can figure that out. I was just curious to find out how I would do this myself.

undef is not a value, it means that you are just grabbing arbitrary memory (“random” values), which is almost certainly not what you want for your missing data.

If you have an array V of vectors, and you want to copy to a 2d array A with missing data taken up by NaN, you could do something like:

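A = fill(NaN, m, n)     # m rows, n columns, all NaN to start with
copyto!.(eachcol(A), V) # either: one vector per column (m = maximum(length, V), n = length(V))
copyto!.(eachrow(A), V) # or: one vector per row (m = length(V), n = maximum(length, V))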

(Of course, you could also just write a loop. Loops are fine in Julia.)
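A loop version might look like the following. This is just a sketch: pad_columns is a made-up helper name, not something from Base, and it assumes the vectors hold real numbers, use ordinary 1-based indexing, and should go into the columns.

```julia
# Hypothetical helper: pad a vector of vectors with NaN and
# return a matrix with one input vector per column.
function pad_columns(V)
    m = maximum(length, V)          # the longest vector sets the row count
    A = fill(NaN, m, length(V))     # start with NaN everywhere
    for (j, v) in enumerate(V)
        A[1:length(v), j] = v       # shorter vectors leave the NaN padding in place
    end
    return A
end

V = [[1.0, 2.0, 3.0], [1.0, 2.0], [4.0]]
A = pad_columns(V)
```

Here A is a 3×3 Matrix{Float64} with the data at the top of each column and NaN below it.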

Yes, for sure. As I said, “you need to fill in the NaNs”, and you will know exactly where. This is completely safe as long as your code is correct and makes sure those (possible) gaps get filled.

undef is maybe a little advanced for a new user (though not surprising to e.g. C users, where uninitialized memory is the default). fill(NaN, m, n) also avoids the problem, but you will write some of your array locations twice (maybe half, maybe most, depending on the lengths), so it is that much slower. It’s good to know both, with their pros and cons, and not to default to undef: it may be premature optimization, and in general it risks mistakes.
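To make the comparison concrete, here is roughly what the undef route looks like when every location is written exactly once. It is a sketch under the same assumptions as before (one vector per column, Float64 data, 1-based vectors), and the example data V is made up.

```julia
V = [[1.0, 2.0, 3.0], [1.0, 2.0]]
m = maximum(length, V)

# Uninitialized memory: the contents are arbitrary until we write them.
A = Matrix{Float64}(undef, m, length(V))

for (j, v) in enumerate(V)
    k = length(v)
    A[1:k, j] = v          # the data
    A[k+1:m, j] .= NaN     # the padding (an empty range if v is full length)
end
```

Whether skipping the initial fill is measurably faster depends on the sizes involved, so it is worth benchmarking before preferring this over fill(NaN, m, n).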

[There’s one loophole, but I don’t see it happening in this case, since you’re doing a memory copy, and it shouldn’t fail. If you are filling memory with some calculated value, e.g. from sqrt, and for some reason you get an exception, you will not have filled over all your undefs. But there’s nothing sensible to do with a half-filled array, and since you get an exception you shouldn’t use it. It would be very stupid to catch the error and keep going as if nothing happened. That is also a problem without undefs when I think about it, just a more obvious one going forward. I’ve seen people do mistaken exception-handling before…]

This did the trick, thank you!