Proper way to concatenate higher-dimensional arrays?

Hi all, I’m extending data arrays from 2 dimensions to 4 dimensions, and I want to make sure that the concatenation is doing what I think it’s doing. Previously, I used command loops like:

DatArr = Array{Float64,2}(undef, 1, 3)
DatArr = [[] [] []]
for i in 1:10
     [x(i) y(i) z(i)] = {...big calculation...}
     DatArr = vcat(DatArr, [x(i) y(i) z(i)])
end

But now, I have to do this same calculation not just once, but multiple times over a 2-dimensional matrix of input parameters. This is what I wrote at first, but I’m pretty sure it’s NOT right:

DatArr = Array{Float64,4}(undef, Nj, Nk, 1, 3)
for j in 1:Nj
     for k in 1:Nk
          DatArr[j,k] = [[] [] []]
     end
end

(...much later...)

for i in 1:10
     for j in 1:Nj
          for k in 1:Nk
               [x(i,j,k) y(i,j,k) z(i,j,k)] = {...big calculation...}
               DatArr[j,k] = vcat(DatArr[j,k], [x(i,j,k) y(i,j,k) z(i,j,k)])
          end
     end
end

So… am I right in suspecting that this may be wrong, and NOT doing what I obviously am trying to do… the same concatenations as before, just doing it independently for each value of j & k? Can anyone give me suggestions on the CORRECT way to do these concatenations?

(FYI, due to heavy computational reasons, I cannot switch the order of the nested loops… the j & k loops MUST stay nested inside the for i in 1:10 loop.
Also, note that each new data expression like [x(i) y(i) z(i)] is not really just a 1x3 array of floats, but is actually a shorthand way of me indicating an Nx3 array of floats, where N ~ few x 100.)

I’d be very grateful for any recommendations… thanks!

I’m a bit confused about the actual size/shape of your data - when I paste the first few lines from your second example into the REPL, it return an error, so it’s hard to guess what it’s supposed to look like. If I understand correctly, your iteration space is fixed, so you could calculate the size of your final array (and preallocate such an array) from the outset. From a performance perspective, that’s way better than inner-loop vcat, which allocates a new, slightly-larger array each time it’s called.

Could you put together an example with realistic Nj, Nk, etc., (and dummy calculations, e.g. rand(1000, 3)) that the reader can run?

1 Like

From what you are writing (e.g. DataArr[j,k] = ...) it seems like you want DataArr to be a Matrix{Matrix{Float64}}?

The first loop in your 2nd code example is redundant. Once you allocated DataArr = Array{Float64,4}(undef, Nj, Nk, 1, 3), there is no need to fill the array with emptys. It also errors because you are accessing a 4D array with 2 indices; and because [[] [] []] is of eltype Any. If you want to fill the entire array with a known value, use DataArr = fill(NaN, dims...).

In the second loop, you are indexing x, y, z with i,j,k (but using round brackets). Did you just mean to write [x, y, z] = {...}? If so, and if DataArr is indeed a Matrix{Matrix} then the assignment DataArr[j,k] = vcat(...) should work, giving you that DataArr[j,k] is a Matrix where row i is [x y z] for the i-th iteration. But then, as previously mentioned, preallocating each element of DataArr is more efficient, if dimensions are known.

1 Like

Hi guys, thanks for the comments. Unfortunately, the array/matrix dimensions are not fully known in advance, since the [x(i) y(i) z(i)] arrays have lengths that are unknown until the calculation in the middle of the loops are done, and [x(i,j,k) y(i,j,k) z(i,j,k)] may differ in length for each value of j,k.

Also, I used terms like x(i) to indicate that x is a function (dependent in a complicated way) on i, not just for indexing purposes.

I’m going to post a more detailed example soon, showing more of the dependencies, and what is known and unknown in the loops.

So, here is a more detailed example of the first loop:

DatArr = Array{Float64,2}(undef, 1, 3)
DatArr = [[] [] []]
for i in 1:5
    Nrand = rand(1:10)
    for n in 1:Nrand
        xyzVect = [i*rand() (i^2)*rand() (i^3)*rand()]
        global DatArr = vcat(DatArr, xyzVect)
    end
end
println("\n\nDatArr = ",DatArr)
println("\nsize(DatArr) =",size(DatArr),"\n")

The second line overwrites the first with a Matrix{Any}, which will be slow.

julia> DatArr = Array{Float64,2}(undef, 1, 3)
1×3 Matrix{Float64}:
 0.0  0.0  0.0

julia> DatArr = [[] [] []]
0×3 Matrix{Any}

You may want to look at ElasticArrays.jl, which allows you to dynamically expand the last dimension of an array.

julia> using ElasticArrays

julia> DatArr = ElasticArray{Float64}(undef, 3, 0)
3×0 ElasticMatrix{Float64, 1, Vector{Float64}}

julia> for i in 1:5
           Nrand = rand(1:10)
           for n in 1:Nrand
               xyzVect = rand(3)
               append!(DatArr, xyzVect)
           end
       end

julia> DatArr
3×8 ElasticMatrix{Float64, 1, Vector{Float64}}:
 0.14786   0.954642  0.964941  0.824136  0.218328  0.041857   0.305235  0.262125
 0.429044  0.934869  0.546261  0.635685  0.214627  0.0316853  0.239581  0.792998
 0.555508  0.737745  0.765764  0.44316   0.21276   0.837226   0.261753  0.707933
1 Like

And here is a more detailed example of the second set of loops (basically doing the earlier calculation over a matrix of [j,k] values):

Nj = 4
Nk = 3
DatArr = Array{Float64,4}(undef, Nj, Nk, 1, 3)
for j in 1:Nj
    for k in 1:Nk
        DatArr[j,k] = [[] [] []]
    end
end
#
for i in 1:5
    for j in 1:Nj
        for k in 1:Nk
            Nrand = rand(1:10)
            for n in 1:Nrand
                xyzVect = [i*rand() j*rand() k*rand()]
                global DatArr = vcat(DatArr, xyzVect)
            end
        end
    end
end
println("\n\nDatArr = ",DatArr)
println("\nsize(DatArr) =",size(DatArr),"\n")

The first (initialization) loop in this post creates an error here – “MethodError: Cannot convert an object of type Array{Any,2} to an object of type Float64” – I’ve gotten this error before, but I don’t really understand it or know why.

The second (calculating) loop in this post also creates an error, due to the first loop failing: " BoundsError: attempt to access 4×3×1×3 Array{Float64,4} at index [1, 1]". But, I think that even if I got the first (initialization) loop working, then this second loop would not be doing what I intend. (Hopefully my intentions are clear from the code above.)

Any suggestions/fixes would be most welcome, thanks!

Interesting, stillyslalom, I’ll try using ElasticArrays.
Any reason that the dimensions have to be (undef, 3, 0), and not (undef, 0, 3)?

Also, I’ve posted the more complicated set of loops in my follow-up message.
Thanks!

The first error is the one I pointed out earlier: [[] [] []] is of eltype Any. So you cannot assign it to DataArr which is eltype Float64.

The entire loop is also redundant because you already allocated DataArr in the previous line.

There are two separate mistakes in the first initialization loop. Because you’re initializing DatArr to have a finite length (i.e. the product of the size of each axis), it has to contain something upon initialization, and you’ve requested undef values for that ‘something’. You’re then trying to clear those values with DatArr[j,k] = [[] [] []], but [[] [] []] is a 0x3 Matrix{Any}, which you can’t coerce into a 1x3 Matrix{Float64}. Beyond that, you need to provide all the indices of the matrix you’re indexing into: DatArr[j,k,:,:] instead of DatArr[j,k].

Better to just set the size of the axis you’re appending to as zero:DatArr = Array{Float64,4}(undef, Nj, Nk, 0, 3). That way, it’s completely empty and doesn’t contain any garbage from the outset.

Any reason that the dimensions have to be (undef, 3, 0) , and not (undef, 0, 3) ?

Yes: underneath every dense matrix/multidimensional array lies a 1D array that matches the linear memory on your computer–Julia just allows you to interface with that in an easy way instead of having to convert from N-D indices to 1D indices yourself. You can grow and shrink 1D arrays in base Julia (see push!, pop!, etc.), and if you do so, the memory manager allocates a larger linear space for your array to grow into. Base Julia doesn’t allow you to do that for multidimensional arrays, which is why ElasticArrays exists, but ElasticArrays still has to conform to the underlying 1D array interface, which constrains you to only growing the last dimension of the array.

This may not be what you are looking for, but do you really need all the results in a single matrix? If not, it may be more convenient (and a lot clearer) to do:

Nj = 4
Nk = 3
DatArr = Matrix{Matrix{Float64}}(undef, Nj, Nk)
for i in 1:5
    for j in 1:Nj
        for k in 1:Nk
            Nrand = rand(1:10)
            xyz = Matrix{Float64}(undef, Nrand, 3)
            for n in 1:Nrand
                xyz[n,:] = [10*i+n  10*j+n  10*k+n]
            end
            DatArr[j, k] = xyz
        end
    end
end
println("\n\nDatArr = ",DatArr)
println("\nsize(DatArr) =",size(DatArr),"\n")
1 Like

Hello hendri54, thanks for the sample code; actually I do need the results for each value of i to link together into a single array, because they are different pieces of a whole data set broken into i pieces to save memory. (Also, I didn’t show it here but I have several different DatArr’s holding different types of output data, and each one must be pieced together into a single result from the different sections labelled by i.)

Some general things I’m still confused about: is the type matrix different from type array with the same dimensionality? They seem interchangeable.

Also, I can’t figure what type Any is all about. I don’t know why Any keeps showing up, and it seems like that type should be able to take any kind of data! (Including Floats).

Based on the suggestions so far from you & stillyslalom, I’m thinking that elastic arrays are the best thing here for me to use to hold each array of data with the 2 dimensions indexed by (i,n); I can initialize and append them as shown above. Also, it seems that I could hold these arrays in 2-D tuples, with each element (j,k) of a tuple being one of those elastic arrays. (Still working on the details and sample code.)

Regarding this, is there an easy way to initialize a 2D tuple of dimensions (Nj, Nk), where each element in the tuple is an object like ElasticArray{Float64}(undef, 3, 0) ?
Thanks…!

Ok, so I haven’t figured out how to initialize the tuple of arrays yet, so stillyslalom, I tried taking your example using ElasticArrays and put it inside a loop over indices (j,k):

using ElasticArrays
Nj = 4
Nk = 3
DatArr = ElasticArray{Float64}(undef, Nj, Nk, 3, 0)
for i in 1:5
    for j in 1:Nj
        for k in 1:Nk
            Nrand = rand(1:10)
            xyzVect = ElasticArray{Float64}(undef, 3, 0)
            for n in 1:Nrand
               append!(xyzVect,rand(3))
            end
            append!(DatArr[j,k,:,:], xyzVect)
        end
    end    
end
println("\n\nDatArr = ",DatArr)
println("\nsize(DatArr) = ",size(DatArr),"\n")

The result was:

DatArr =
size(DatArr) = (4, 3, 3, 0)

so nothing got appended, which isn’t what I expected.

So we’re basically back to my first question, how to append this new data correctly…

Looking at your example, each (j, k) pair is going to have an xyzVect of a different length, because the source of random length is inside the j, k loop. Is that intentional? If that’s accurate, you’re constructing a ‘ragged’ array, and a dense 4D array is probably the wrong data structure for your use case.

If that’s not accurate: since the last dimension of an ElasticArray is the only dimension that can grow, you need to construct chunks of size j × k × 3 × nRand, then append those to the ElasticArray. Take a look at this sketch - can you write a loop to generate such chunks? If so, those can be easily appended to an ElasticArray.

1 Like

Yes, stillyslalom, it’s intentional (unavoidable) that the 2D (3xN element) array for each particular value of (j,k) could (and probably will) be a different length. That’s why I’m thinking that a tuple with (Nj)x(Nk) elements, and each element being a 3xN appendable array, might be the way to go. Do you have any ideas on the proper Julia syntax for some way to do that?

Basically, I have a huge data set that must be cut into i slices to fit into computer memory. For each “ith” piece, I then have to run (Nj)x(Nk) analyses, covering (Nj)x(Nk) parameter combinations, and for each combination I get an Nx3 (or 3xN) array. Then, I have to glue each of these Nx3 arrays to the results from the previous data slices, making sure to add the results for each value (j,k) to the results for the same value of (j,k) from the previous slice.

Aren’t we making this a bit more complicated than it has to be? How much memory are we talking about? Wouldn’t it be easier to just figure out the max size along the i dimension, then allocate the entire DataArr based o that max length and just fill it as you go along?

If I understand correctly, you are generating an array of size Nj, Nk, ?, 3 where the ? is the length of the i dimension that we are not sure about. Then for each [j,k] you are filling a subset of the rows of ?, 3 dimensional sub-array. If that’s correct, I think it’s probably easiest to avoid all the changing array dimensions in the loop (unless the array is really big).

No hendri54, unfortunately the problem cannot be simplified, these loops are what I need to do, I just need to figure out the syntax.

All of the arrays you see here are NOT what is filling up my computer memory. The “Big Calculation” I referred to is a massive computer-filling data set that cannot be computed all at once. All of the arrays I’ve written out here are just a secondary analysis of that data set.

The massive data set must be cut into several pieces, then for each piece I calculate an Nx3 resulting array (a much smaller summary result compared to the whole data set), then I read in the next piece of data (which replaces the earlier piece in memory), then analyze again, etc. At the end I have to glue the results for all of the data slices back together. And parameter j can take on Nj different values, parameter k can take on Nk different values, so the analysis of each data slice must be done independently for (Nj)x(Nk) different parameter combinations.

Nj = 4
Nk = 3
DatArr = [ElasticMatrix{Float64}(undef, 3, 0) for j = 1:Nj, k = 1:Nk]
for i in 1:5, j in 1:Nj, k in 1:Nk
	append!(DatArr[j,k], rand(3, rand(1:10)))
end

Umm… this might work for appending random numbers (the sample task I illustrated the loops with), but in reality I broke the problem into nested loops because several whole series of tasks needs to be done inside each loop.

For example, for each new value of i (say, i = 3), a new chunk of data must be calculated (which can take an hour or three), and the i=2 data slice is discarded from memory and replaced by the i=3 slice. Then, for each particular value of (j,k), a whole bunch of functions must be called with those parameters to get results from this i=3 data slice.

Basically, the nested loop structure cannot be collapsed or folded or simplified for the real calculation problem. I just need to figure out how to glue the new arrays to the pre-existing arrays from earlier data slices.

Recalling my initial post, this is basically the thing I need to do (but correctly):

DatArr = Array{Float64,4}(undef, Nj, Nk, 0, 3)
for i in 1:10
     for j in 1:Nj
          for k in 1:Nk
               [x(i,j,k) y(i,j,k) z(i,j,k)] = {...big calculation...}
               DatArr[j,k,:,:] = vcat(DatArr[j,k,:,:], [x(i,j,k) y(i,j,k) z(i,j,k)])
          end
     end
end