Null initialisation for a loop that appends vectors

In Matlab, it is common to initialise a loop like this:
x = []; for i = 1:n; x = [x; another numerical row vector (of fixed length L)]; end
This creates a matrix of size [n,L]. It works regardless of the chosen length L.

What is the equivalent Julia idiom please? vcat won’t append to an empty value, as far as I can see. I don’t want to have to repeat all the code in the loop to find the first row separately: I just want to append the first row to some sort of null object, if that’s allowed.

It’s a bad pattern in both Matlab and Julia unless you have rather small amounts of data, but the key to making it work in Julia is to start with an empty matrix of a compatible size.

julia> x = zeros(0, 5)
0×5 Matrix{Float64}

julia> for _ in 1:3
           x = [x; rand(1, 5)]
       end

julia> x
3×5 Matrix{Float64}:
 0.282795  0.316492  0.385162  0.649601  0.72799
 0.176852  0.16786   0.62178   0.100277  0.296178
 0.564282  0.651074  0.487304  0.71793   0.0336523
2 Likes

If you can’t preallocate the array and simply write the entries then I would do something like this:

x = Matrix{Float64}[] # empty Vector{Matrix{Float64}}
for i in 1:10
    if rand() < 0.5 # maybe don't always add a row
        row = rand(1, 5) # generate a row
        push!(x, row) # save it
    end
end
X = reduce(vcat, x) # concatenate all the saved samples into a single Matrix

Thanks, I’ll look at that.
I should have mentioned that preallocation is difficult in my application because the loop count n is not known beforehand: the loop exits when certain conditions are met, but n can be quite large. I believe that Julia is much more efficient at appending than Matlab, which made me think that appending without preallocation might be worth trying. But what if n gets up to 1e7 or 1e8?

Often such things can be written M = stack(function, iterator), where the function returns data that will become a column or a row of the final matrix. With a do block:

julia> stack(1:5) do i
         [i/2, i/3, i/5]  # this is a column
       end
3×5 Matrix{Float64}:
 0.5       1.0       1.5  2.0      2.5
 0.333333  0.666667  1.0  1.33333  1.66667
 0.2       0.4       0.6  0.8      1.0

julia> stack(1:5; dims=1) do i
         (i/2, i/3, i/5)  # this will become a row
       end
5×3 Matrix{Float64}:
 0.5  0.333333  0.2
 1.0  0.666667  0.4
 1.5  1.0       0.6
 2.0  1.33333   0.8
 2.5  1.66667   1.0

Edit, but this won’t help if the number of iterations isn’t fixed. In that case, push!-ing into some list is probably what you want. This should be more efficient than re-creating the matrix at every step of the loop.

julia> slices = [];  # ideally you provide the type here, T[], with T = typeof(col)

julia> for i in 1:999  # ideally this is all in a function too!
         i > 5 && break
         col = (i/2, i/3, i/5)
         push!(slices, col)
       end

julia> stack(slices)
3×5 Matrix{Float64}:
 0.5       1.0       1.5  2.0      2.5
 0.333333  0.666667  1.0  1.33333  1.66667
 0.2       0.4       0.6  0.8      1.0
2 Likes

If anything, I would expect MATLAB has taken pains to make repeatedly adding rows to a matrix vaguely performant (naively, it’s a horrific pattern for data movement). I am quite certain Julia has not. So, without having benchmarked it, I would guess MATLAB does better than (or at least similarly to) Julia for that pattern.

OK, thanks guys, I need to do some reading about your suggestions. Very much a Julia newbie.
Is it OK not to tick any post as the final answer for the time being?

2 Likes

Sure, it’s common to not select an accepted answer, unless there is a very definitive one. Just leave it open, if you like.

1 Like

I wrote 3 versions of this minimal array-creating routine:
i) in Matlab, with pre-allocation of the whole array (m-file, no compilation)
ii) in Julia, creating a Vector of Vectors, then stacking them (as per mcabbott’s post)
iii) in Julia, with pre-allocation of the whole array

The arrays were all Int64, called with size [r,c], and the test function filled
the array using the simplest possible arithmetic (a predefined vector was
incremented by the current loop index, then stored as a row of the array).
This was done in a directly comparable way in Matlab and Julia.

The test routine was called N times for each [r,c] combination shown, and the
minimum & maximum times recorded. The results are:

                      Matlab        Julia with stack    Julia pre-allocated
                 -------------     -----------------    -------------------
 r     c    N     min      max       min       max         min       max
---   ---  ---   ------   ------   -------   -------     -------   -------
1e4   1e2  100   0.0187   0.0221   0.00472   0.08460     0.00273   0.05612
1e5   1e2  100   0.1770   0.2207   0.05759   0.24601     0.05965   0.15521
1e5   1e3  100   0.6768   0.7221   0.52822   1.66571     0.65756   1.06420
1e6   1e3   10   7.3609  10.0585  32.2275   59.42906    15.22840  48.9308

Some conclusions:

  1. Both Julia methods outperform Matlab for smaller arrays.

  2. Both Julia methods take about the same time. But there were consistent minor differences:
    the first method (create a Vector of Vectors then stack them) is a little slower than full array
    pre-allocation for small arrays, and a little faster for big arrays.

  3. Matlab becomes more competitive as the array size increases. Times become
    about the same once the array size reaches about 0.75 GiB (on a laptop with 16 GiB RAM).

  4. Times in the last line, where array size is 7.45 GiB and required workspace reported by
    @time is twice that, show the sudden drop in performance when free memory is exhausted
    and memory is paging out frequently. Matlab is much better at handling the paging-out.

How are you timing the code in both cases, exactly? If the benchmark code is as simple as you described, go ahead and just paste it verbatim. That might help explain the timings because I wouldn’t expect i to be much different from iii, but again, hard to say without knowing the entire code.

For now, it’s expected that ii behaves differently. If you preallocate the whole matrix and the incremented vector, then those 2 could be all the allocations you ever do (in practice, your code may allocate other things for other reasons). If you’re copying the incremented vector at each iteration to push into a vector of vectors to be concatenated after the loop, the same data was broken up into many distinct vectors before being copied over to the whole matrix at best or being concatenated pairwise for just as many intermediate matrices at worst.
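A minimal sketch of the two allocation patterns being contrasted (not from the thread; names and sizes are illustrative):

```julia
# Pattern A: push a freshly allocated row into a vector of vectors,
# then concatenate at the end (what method ii does).
function pattern_a(r, c)
    rows = Vector{Float64}[]
    v = collect(1.0:c)
    for i in 1:r
        push!(rows, i .+ v)        # allocates a new vector every iteration
    end
    return stack(rows; dims = 1)   # one more copy into the final matrix
end

# Pattern B: write each row straight into a preallocated matrix.
function pattern_b(r, c)
    x = zeros(r, c)                # the only large allocation
    v = collect(1.0:c)
    for i in 1:r
        x[i, :] .= i .+ v          # in-place write, no per-iteration allocation
    end
    return x
end
```

Both produce the same matrix; the difference is that pattern A allocates r separate row vectors on top of the final matrix, roughly doubling the memory traffic.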

OK Benny, here’s the code. I didn’t post it before as I felt interest would be low.
I am a Julia newbie so tell me if what I have written is non-Julian / suboptimal.

Matlab code (1 fill method):

function test(r,c)
    n = 100;
    T = zeros(n,1);

    for i = 1:n
        clear y;               % clear y array from last pass: not timed
        tic;
        y = test2(r,c);        % fill the array
        T(i) = toc;            % store the time
    end

    disp([num2str(min(T), 4) ' - ' num2str(max(T), 4)]);
end
%_______________________________
function x = test2(r,c)

    x = zeros(r, c, 'int64');              % allocate whole output array, once
    v = [int64(2): int64(2): int64(2*c)];  % predefine a row vector

    for j = int64(1): int64(r)  % use int64 so no cast is needed on next line
        row = j + v;            % increment just enough to distinguish the rows
        x(j,:) = row;           % store the row
    end
end
%_______________________________

Julia code: (2 fill methods)

function test(r::Int, c::Int)

# method 1
y1 = nothing               # so y1 can be inspected outside the `for` loop
n = 100
for i in 1:n
    y1 = nothing           # clear array from last pass               ) not
    GC.gc()                # force garbage collection before starting )  timed

    @time y1 = test1(r,c)  # fill the array                           ) timed
end
println("-------")

# method 2
y1 = nothing               # remove y1 before creating y2: essential as size increases
y2 = nothing
for i in 1:n
    y2 = nothing
    GC.gc()
    @time y2 = test2(r,c)
end

end
#------------------------------

function test1(r,c)
# Using Vector of Vectors, then stack into a Matrix.  No Matlab equivalent.

x = Vector{Int64}[]
v = collect(Int64(2): 2: Int64(2c))

for i in Int64(1): Int64(r)
    col = i .+ v
    push!(x, col)
end

x = stack(x, dims=1)

return x

end
#---------------------------------------

function test2(r,c)
# Using full-size pre-allocated array. Equivalent to Matlab test2.m

x = zeros(Int64, r, c)
v = collect(Int64(2): 2: Int64(2c))

for i in Int64(1): Int64(r)
    col = i .+ v
    x[i,:] = col
end

return x

end
#---------------------------------------

Here you allocate a lot of vectors (a new col in every iteration); it’s better to do:

for i in 1:r
    x[i,:] .= i .+ v
end

This completely avoids the allocations in the loop.
Also, the collect in collect(2:2:2c) is unnecessary, though it probably does not contribute a lot.
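For instance (a small sketch, not from the thread), a range broadcasts just like the collected vector, so dropping the collect costs nothing:

```julia
c = 5
v = 2:2:2c                               # lazy range; no array is allocated
i = 3
@assert i .+ v == i .+ collect(2:2:2c)   # broadcasting gives the same result
```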

I’m not certain what a similar MATLAB program actually does, but I suspect it’s something like this in Julia:

col = Vector{Int}(undef, size(x,2))
for i in 1:r
    col .= i .+ v
    x[i,:] .= col
end

which also avoids allocations in the loop, but has a preallocated col vector and an extra copy.

1 Like

Ah yes, simply substituting the expression for col helps the pre-allocation method noticeably.
Particularly once the array is large enough to cause repeated paging-out of memory.
Performance is then closer to Matlab’s performance: see min time of 8.75s in last row.

                      Matlab        Julia with stack     Julia pre-allocated
                 -------------     -----------------     -------------------
 r     c    N     min      max       min        max        min        max
---   ---  ---   ------   ------   --------   -------   ---------  --------
1e4   1e2  100   0.0187   0.0221   0.007680   0.009264   0.001924   0.006417
1e5   1e2  100   0.1770   0.2207   0.062677   0.141395   0.023470   0.035607
1e5   1e3  100   0.6768   0.7221   0.564494   0.837645   0.375146   0.449644
1e6   1e3   10   7.3609  10.0585  33.637660  57.911190   8.750952  31.260850

Thanks

Some feedback

  1. The outer test appears to be trying to free all garbage (including the last run’s result) between runs for fresh starts. The MATLAB version’s clear y wouldn’t trigger the GC to immediately free the last result, so the Julia translation is doing something extra. It’s better to use a benchmarking library like BenchmarkTools.jl than trying to make your own; @benchmark test2(10, 10) samples=100 more or less does your benchmark with extra statistics, and the returned object holds the raw timings in the .times field.
  2. test1 seems mostly fine. I’m pretty sure stack is doing the best-case I mentioned, though I can’t rule out some more efficient way of doing it. The allocated memory (we don’t call it workspace, and that’s not what it means in MATLAB either) is roughly twice the array size because you allocated the separate row vectors before allocating and copying to the result matrix. It’s possible to avoid allocating those row vectors separately by using StaticArrays.SVectors instead, but the current implementation is generally slower than Vector at larger sizes.
  3. test2 allocates a separate vector per iteration at col = i .+ v, which doubles the allocated memory like test1 does. It also essentially wastes your preallocation because you don’t directly write the results of the computation to the space in the matrix. x[i,:] .= i .+ v as mentioned above works, and calling that test3, we see a massive improvement:
julia> @time test2(1_000_000, 100);
  0.659842 seconds (2.00 M allocations: 1.609 GiB, 27.25% gc time)

julia> @time test3(1_000_000, 100);
  0.247804 seconds (5 allocations: 762.940 MiB, 0.84% gc time)

julia> all(test2(1_000_000, 100) .== test3(1_000_000, 100))
true
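(The test3 definition isn’t shown in the post; presumably it is just test2 with the in-place broadcast substituted, along these lines:)

```julia
function test3(r, c)
    x = zeros(Int64, r, c)         # preallocate the whole output, once
    v = Int64(2):2:Int64(2c)       # a range is enough; no collect needed
    for i in 1:r
        x[i, :] .= i .+ v          # write directly into row i, no temporary
    end
    return x
end
```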

Note that writing to a manual preallocation applies just as much to MATLAB, it could just look different because MATLAB does in-place writes and JIT optimizations in its own way. I’d hazard a guess that your MATLAB test2 is actually doing the equivalent of test3 roughly based on the timings, but I don’t really have a clue.

Improving test1 is trickier than it looks. We could reduce the number of allocations by directly appending elements to a vector instead of pushing vectors into a vector, and then avoid allocating the final matrix by using lazy wrappers (the cost is that indexing needs more work).

julia> function test4(r,c)
       x = Int64[]
       v = collect(Int64(2): 2: Int64(2c))
       lengthv = length(v)

       # note that column-major Vector is treated as a column vector
       for i in Int64(1): Int64(r)
           resize!(x, length(x) + lengthv)
           x[end - (lengthv-1):end] .= i .+ v
       end

       # view needed for laziness (no allocation)
       # reshape to column-major horizontal stack of columns
       # transpose to vertical stack of rows
       transpose(reshape(view(x, :), c, r))

       end
test4 (generic function with 1 method)

julia> all(test1(1_000_000, 100) .== test4(1_000_000, 100))
true

julia> @time test1(1_000_000, 100);
  1.068569 seconds (2.00 M allocations: 1.626 GiB, 42.75% gc time)

julia> @time test4(1_000_000, 100);
  0.810941 seconds (44 allocations: 2.965 GiB, 40.79% gc time)

While it saved a bit of time, it actually allocated more because the massive vector needed to be resized to hold more elements. test1 resized its vector of vectors too, but that is a much shorter vector of small pointers, with the actual data scattered elsewhere. Preallocation really helps; there’s no getting around that.
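One mitigation not mentioned in the thread: if even a rough upper bound on r is known, sizehint! reserves capacity up front so the vector stops reallocating as it grows (a sketch; test5 is a hypothetical name following the thread's numbering):

```julia
function test5(r, c)
    x = Int64[]
    sizehint!(x, r * c)            # reserve capacity up front (assumes an upper bound is known)
    v = Int64(2):2:Int64(2c)
    for i in 1:r
        append!(x, i .+ v)         # grows within the reserved capacity
    end
    # lazily view the flat vector as an r×c matrix of stacked rows
    return transpose(reshape(x, c, r))
end
```

If the loop overshoots the hint, append! still works; the hint only avoids the intermediate reallocations up to the reserved size.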

Speaking more broadly, 7.45 GiB for a single output matrix is really pushing 16 GiB of RAM. Reconsider how much of this matrix you need in RAM at a time, and note that you have much more storage to hold the rest.

Thanks for your time in analysing this, Benny. As you can see, this is not my area of expertise - I just fell into it when I decided it would be interesting to count the number of valid outcomes in a combinatorial problem. But it is always good to learn more about how any language works, so I’ll go through the info you’ve given me & hopefully I can use it to good effect later.

More RAM would be nice, for sure, but I just have a laptop and have already run into the deliberate obstructions manufacturers put in place regarding battery replacement, so I probably don’t want to go there. All part of planned obsolescence. I’ll get a new one with more RAM eventually …