Vcat broadcast to all columns?

In MATLAB, if I write x(end+1,:) = 1, it will concatenate a row of ones to the bottom of the matrix x. I haven’t found a better way of doing this in Julia than x = vcat(x, ones(typeof(x[1]), 1, size(x, 2))). An issue is that this requires the construction of an extra array (the ones). It would be nice functionality if I could just write x = vcat(x, 1), and the 1 was broadcast to all columns (note that vcat.(x, 1) turns every element of x into a 2-element vector). This could even be extended to vectors, so vcat(x, [1,0]) would extend every column of the array x by two elements. Obviously this could be applied to hcat and other concatenation functions too. This probably shouldn’t work (in general) for x = [x; 1], and I don’t know if that would cause problems.

Would this be better?

hcat([vcat(x[:,j],1.0) for j in 1:size(x,2)]...)

Is it for performance reasons that you wish to avoid constructing an extra array or is there another reason?

If it’s for readability, then you can avoid the typeof by using fill:

x = vcat(x, fill(1, 1, size(x,2)))

where you make sure that the first argument of fill has the same type as the elements of x.

Yes, mainly performance issues. Also, I’d mostly be doing this on StaticMatrices from the StaticArrays package, and it would be nice not to need this specialization for that: x = vcat(x, ones(SMatrix{1, Size(x)[2]}))

I doubt this would be your bottleneck:

julia> @benchmark vcat(x, ones(typeof(x[1]), 1, size(x, 2))) setup=(x=rand(3,3))
BenchmarkTools.Trial: 10000 samples with 973 evaluations.
 Range (min … max):  71.451 ns …   2.008 ΞΌs  β”Š GC (min … max): 0.00% … 94.48%
 Time  (median):     79.132 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   88.322 ns Β± 113.775 ns  β”Š GC (mean Β± Οƒ):  9.02% Β±  6.65%

  β–‚β–„β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–‚β–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚ β–ƒ
  71.5 ns         Histogram: frequency by time          128 ns <

 Memory estimate: 240 bytes, allocs estimate: 2.

julia> @benchmark vcat(x, ones(typeof(x[1]), 1, size(x, 2))) setup=(x=rand(10,10))
BenchmarkTools.Trial: 10000 samples with 728 evaluations.
 Range (min … max):  182.529 ns …   1.477 ΞΌs  β”Š GC (min … max): 0.00% … 69.32%
 Time  (median):     206.022 ns               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   229.986 ns Β± 126.871 ns  β”Š GC (mean Β± Οƒ):  6.52% Β±  9.65%

  β–…β–ˆβ–…β–…β–  β–‚                                                      ▁
  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–‡β–ˆβ–‡β–†β–„β–ƒβ–β–„β–…β–†β–ˆβ–„β–β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–…β–†β–†β–†β–†β–†β–‡ β–ˆ
  183 ns        Histogram: log(frequency) by time       1.15 ΞΌs <
julia> @benchmark vcat(x, ones(typeof(x[1]), 1, size(x, 2))) setup=(x=rand(20,20))
BenchmarkTools.Trial: 10000 samples with 187 evaluations.
 Range (min … max):  539.519 ns … 10.216 ΞΌs  β”Š GC (min … max):  0.00% … 91.26%
 Time  (median):     655.679 ns              β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   815.456 ns Β±  1.017 ΞΌs  β”Š GC (mean Β± Οƒ):  14.82% Β± 10.93%

  β–ˆβ–†β– β–ƒβ–ƒ                                                       ▁
  β–ˆβ–ˆβ–ˆβ–‡β–ˆβ–ˆβ–‡β–„β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ƒβ–…β–„β–„β–ƒβ–β–…β–†β–‡ β–ˆ
  540 ns        Histogram: log(frequency) by time       8.8 ΞΌs <

 Memory estimate: 3.66 KiB, allocs estimate: 2.

If I do the following:

julia> a = randn(2, 1000);

julia> b = ones(1, 1000);

julia> @benchmark vcat($a, $b)
BenchmarkTools.Trial: 10000 samples with 9 evaluations.
 Range (min … max):  2.917 ΞΌs … 240.338 ΞΌs  β”Š GC (min … max):  0.00% … 98.13%
 Time  (median):     4.083 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   4.637 ΞΌs Β±   9.276 ΞΌs  β”Š GC (mean Β± Οƒ):  13.57% Β±  6.64%

  β–‡                      β–ˆβ–β–
  β–ˆβ–‡β–ƒβ–‚β–β–β–β–β–β–β–β–β–‚β–„β–‡β–‡β–ˆβ–ˆβ–ˆβ–‡β–†β–…β–ˆβ–ˆβ–ˆβ–ˆβ–…β–„β–„β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–‚
  2.92 ΞΌs         Histogram: frequency by time        6.31 ΞΌs <

 Memory estimate: 23.48 KiB, allocs estimate: 2.

julia> @benchmark vcat($a, ones(1, 1000))
BenchmarkTools.Trial: 10000 samples with 8 evaluations.
 Range (min … max):  3.104 ΞΌs … 303.979 ΞΌs  β”Š GC (min … max):  0.00% … 98.66%
 Time  (median):     4.588 ΞΌs               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   6.040 ΞΌs Β±  20.000 ΞΌs  β”Š GC (mean Β± Οƒ):  24.80% Β±  7.33%

  β–ˆ             ▂▄▆▇▆▃▁▁▄
  β–ˆβ–„β–…β–…β–„β–ƒβ–‚β–β–‚β–‚β–‚β–ƒβ–…β–‡β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–„β–„β–„β–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–β–‚β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β– β–ƒ
  3.1 ΞΌs          Histogram: frequency by time        8.02 ΞΌs <

 Memory estimate: 31.42 KiB, allocs estimate: 3.

then needing to allocate the array uses 34% memory and 30% more time. This is my typical use case. It’s certainly not a negligible overhead. The function would likely be faster still if replicating a single scalar as opposed to loading from an array, as done here.

julia> @benchmark hcat(vcat.(eachcol($a),1)...)
BenchmarkTools.Trial: 3129 samples with 1 evaluation.
 Range (min … max):  1.485 ms …   4.766 ms  β”Š GC (min … max): 0.00% … 67.92%
 Time  (median):     1.500 ms               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   1.598 ms Β± 548.531 ΞΌs  β”Š GC (mean Β± Οƒ):  6.02% Β± 11.57%

  β–ˆ                                                         ▁
  β–ˆβ–…β–ƒβ–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–β–ˆ β–ˆ
  1.49 ms      Histogram: log(frequency) by time       4.7 ms <

 Memory estimate: 1.33 MiB, allocs estimate: 30009.

Over 300x slower

if you need performance perhaps this is better

function appendones(m)

If you are willing to specialize, maybe something like this:

julia> using StaticArrays

julia> function add_ones(m::SMatrix{N,M,T}) where {N,M,T}
           m2 = zeros(MMatrix{N+1,M,T,(N+1)*M})
           for j in axes(m,2), i in axes(m,1)
               m2[i,j] = m[i,j]
           for j in axes(m2,2)
               m2[N+1,j] = oneunit(T)
           return SMatrix(m2)
add_ones (generic function with 1 method)

julia> m = zeros(SMatrix{2,3,Float64,2*3})
2Γ—3 SMatrix{2, 3, Float64, 6} with indices SOneTo(2)Γ—SOneTo(3):
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> add_ones(m)
3Γ—3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)Γ—SOneTo(3):
 0.0  0.0  0.0
 0.0  0.0  0.0
 1.0  1.0  1.0

julia> @btime add_ones($m)
  8.062 ns (0 allocations: 0 bytes)
3Γ—3 SMatrix{3, 3, Float64, 9} with indices SOneTo(3)Γ—SOneTo(3):
 0.0  0.0  0.0
 0.0  0.0  0.0
 1.0  1.0  1.0

You could check PaddedViews.jl, which adds a padding view to any array without copying data:

using PaddedViews
x = rand(3, 3)
PaddedView(Ο€, x, (4, 3))

 0.736128  0.854871  0.498129
 0.769941  0.36739   0.0790091
 0.807578  0.393201  0.465546
 3.14159   3.14159   3.14159

PaddedView(Ο€, x, (3, 4))

 0.736128  0.854871  0.498129   3.14159
 0.769941  0.36739   0.0790091  3.14159
 0.807578  0.393201  0.465546   3.14159

This should probably be eltype(m) instead.

1 Like

if your typical use case is appending a row into a 1000 row matrix, you’re doing something terrible wrong, btw Julia is column-major, appending column is trivial

1 Like

I see what you mean, this is harder to do in Julia than in Matlab. On the other hand, it’s the sort of thing I rarely do in Julia, but occasionally do in Matlab.

It may be that this is a pattern that is less useful. Perhaps if you describe what you need it for, someone might suggest a different approach.

Also it is possible using ElasticArrays to append columns on the last dimension: GitHub - JuliaArrays/ElasticArrays.jl: Resizeable multi-dimensional arrays for Julia in a efficient way.

(I’m not sure if the OP wants that for large matrices or small static matrices, though).

1 Like

This is nice, though I note that the output of PaddedView is immutable, so it isn’t quite the same.

I use it for creating homogenous coordinates in projective geometry. You often have transformation matrices that operate on homogeneous coordinates.

We can collect the view if we need to assign:

y = collect(PaddedView(Ο€, x, (4,3)))

My intention was to propose an addition to the functionality of vcat (and other concatenation functions): to broadcast scalars across columns. I get that some of you might not see a need for this in your line of work, but it is definitely useful in projective geometry applications. PaddedView is probably good enough for my purpose, though the issues around mutability and having to call collect() make it not as flexible.

To come at my suggestion from a different angle, are there any downsides to vcat broadcasting scalars?

That’s actually somewhat familiar territory, and I’m pretty sure you don’t need vcat at all.

Use StaticArrays all the way, with a SMatrix{4,4,Float64} transformation matrix, and your points, not as 3xN matrices, but vectors of SVector{3, Float64}. Then each transformation can be encoded like this:

trafo(M, v) = pop(M * push(v, 1)) # add a one, and remove it after multiplication
# or maybe
trafo(M, v) = pop(M * push(v, one(eltype(v))))

If V is the vector of SVectors, you can broadcast the operation as

trafo.((M,), V)

In a quick (and somewhat sloppy) test, this is faster even than the pure matrix-vector product, and much faster if you factor in the vcat part, and then perhaps even removing the last row at the end.

It also makes more sense to work with SVectors directly for points instead of matrices. But if you must start with a 3xN for some reason, you can reinterpret it at basically zero cost: reinterpret(SVector{3,Float64}, V).

This is one example of how you often use different approaches in Julia than in Matlab. In Matlab matrix algebra is so much faster than the rest of the language that almost any cost is worth it to move to matrix form, even wastefully stacking matrices on top of each other with poor memory usage.