Array of multiple differently sized vectors

lukas_r · November 15, 2022, 2:22pm

I have an array of multiple vectors with different lengths. For further use I need them in one [n, m] Array. Is there a function to fill up the shorter vectors with missing/NaN?

Palli · November 15, 2022, 3:08pm

Not that I know of (though one may well exist). But it’s not too hard to do by hand:

julia> V1 = [1.0, 2.0, 3.0];
julia> V2 = [1, 2];

julia> A = Matrix{Float64}(undef, max(size(V1, 1), size(V2, 1)), 2)
3×2 Matrix{Float64}:
 0.0           0.0
 6.92079e-310  6.92079e-310
 6.92079e-310  6.92079e-310

julia> A[:, 1] = V1;
julia> A[1:size(V2, 1), 2] = V2;

julia> A
3×2 Matrix{Float64}:
 1.0  1.0
 2.0  2.0
 3.0  6.92079e-310

Notice that you still need to fill in the NaNs yourself, i.e. the uninitialized entry there in the corner. Alternatively, you can construct the Matrix with missing instead of undef (see the help for Matrix; the element type then has to be a Union with Missing, so the call looks slightly different), and then you’re done. I think you would rather want NaNs, though, unless your element type isn’t a floating-point number, in which case NaN isn’t possible. [@StefanKarpinski The docs first show an example with nothing, then one with missing. I think the docs should show missing first, as it’s preferred and also comes first alphabetically. missing is more recent in the language, and while nothing is still supported, works, and has a somewhat different meaning, I think you rarely if ever want to use it, at least for this, so possibly it should just be struck from the docs?]
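A minimal sketch of the missing variant mentioned above, reusing V1 and V2 from the example. The element type Union{Missing, Float64} is an assumption about what is wanted here; any element type that includes Missing should work the same way.

```julia
V1 = [1.0, 2.0, 3.0]
V2 = [1, 2]

# Every entry starts out as `missing`, so no uninitialized memory is involved.
A = Matrix{Union{Missing, Float64}}(missing, max(length(V1), length(V2)), 2)

A[1:length(V1), 1] = V1
A[1:length(V2), 2] = V2   # the Ints are converted to Float64 on assignment

A   # the unfilled corner A[3, 2] stays `missing`
```

Functions such as skipmissing can then be used to ignore the padding in later computations.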

You will also want this in some sort of loop to make it more general, i.e. to handle an arbitrary number of vectors taken from a vector of vectors. You can figure that out; I was just curious to find out how I would do this myself.
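One way the loop could look, as a rough sketch rather than a definitive implementation. pad_columns is a hypothetical helper name (not a Base function), and the pad keyword and the Float64 element type are assumptions. It follows the undef approach from above, but explicitly overwrites the leftover entries of each column.

```julia
# Hypothetical helper: one column per vector, shorter vectors padded with `pad`.
function pad_columns(vs; pad = NaN)
    m = maximum(length, vs; init = 0)   # length of the longest vector (needs Julia 1.6+)
    A = Matrix{Float64}(undef, m, length(vs))
    for (j, v) in enumerate(vs)
        k = length(v)
        A[1:k, j] = v        # copy the data into the top of column j
        A[k+1:m, j] .= pad   # overwrite the remaining undef entries
    end
    return A
end

A = pad_columns([[1.0, 2.0, 3.0], [1, 2]])
```

Because every column is completed inside the loop, no uninitialized value survives in the result.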

stevengj · November 15, 2022, 3:45pm

undef is not a value; it means that you are just grabbing arbitrary memory (“random” values), which is almost certainly not what you want for your missing data.

If you have an array V of vectors, and you want to copy it to a 2d array A with the missing entries filled by NaN, you could do something like:

A = fill(NaN, m, n)
copyto!.(eachcol(A), V) # to copy to the columns (A needs length(V) columns)
copyto!.(eachrow(A), V) # or: to copy to the rows (A needs length(V) rows)

(Of course, you could also just write a loop. Loops are fine in Julia.)
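To make the snippet above concrete, here is a small worked sketch. The sample data and the way m and n are computed are assumptions, since the post leaves both undefined; B shows the row-wise alternative with the dimensions swapped.

```julia
V = [[1.0, 2.0, 3.0], [1.0, 2.0]]

m = maximum(length, V)   # length of the longest vector
n = length(V)            # number of vectors

# Column layout: one vector per column.
A = fill(NaN, m, n)
copyto!.(eachcol(A), V)  # each vector overwrites the top of its column

# Row layout: one vector per row, so the dimensions are swapped.
B = fill(NaN, n, m)
copyto!.(eachrow(B), V)  # each vector overwrites the start of its row
```

Entries that no vector reaches keep the NaN they were created with, here A[3, 2] and B[2, 3].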

Palli · November 15, 2022, 4:49pm

Yes, for sure. As I said, “you need to fill in the NaNs”, and you will know exactly where. Using undef is completely safe as long as your code is correct and makes sure it fills in those (possible) gaps.

undef is maybe a little advanced for a new user (though not surprising to e.g. C users, where uninitialized memory is the default). fill(NaN, m, n) also avoids the problem, but you then write some of your array locations twice (maybe most, or about half, depending on the data), so it is that much slower. It’s good to know both, with their pros and cons, and not to default to undef: it may be premature optimization, and in general it risks mistakes.

[There’s one loophole, but I don’t see it happening in this case, since you’re doing a memory copy, which shouldn’t fail. If you are filling memory with some calculated value, e.g. from sqrt, and for some reason you get an exception, you will not have overwritten all your undefs. But there’s nothing sensible to do with a half-filled array, and since you get an exception you shouldn’t use it. It would be very stupid to catch the error and keep going as if nothing happened; when I think about it, that is also a problem without undefs, though it would be more obvious going forward. I’ve seen people do mistaken exception handling before…]

lukas_r · November 16, 2022, 12:25pm

This did the trick, thank you!
