Vcat broadcast to all columns?

Oh, BTW, this is even without using multi-threading, which the BLAS-enabled standard matrix-matrix product uses.

I have a few comments about your proposal for additional functionality:

  1. The “General Usage” category is normally used for here-and-now solutions to people’s problems. Proposals for changes to Julia are better posted in the “Internals” category or in the issue tracker on GitHub.
  2. A lot of people have ideas for improving Julia, and many post them. The core developers have to be very selective, both to keep the language clean and because, even among the good proposals, they can only pick the few that are the best use of their limited resources, unless the person proposing a change implements it themselves.
  3. My own opinion is that your proposal does not really feel right: it mixes two functionalities into one call that should not be mixed in that way. I would prefer changing the resize function to handle multidimensional arrays, but that was proposed long ago and Stefan Karpinski was not in favor of it. See this thread.

The choice is yours, but I would suggest looking in external packages for the functionality that you desire or use some of the other suggestions in this thread. You are not alone in wishing for more functionality than Julia has to offer.

In this particular case, FillArrays will help (with this PR merged):

julia> using FillArrays

julia> using BenchmarkTools

julia> a = randn(2, 1000);

julia> b = ones(1, 1000);

julia> @benchmark vcat($a, $b)
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.202 μs … 229.327 μs  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     5.460 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   6.317 μs ±   7.687 μs  ┊ GC (mean ± σ):  5.85% ± 5.15%

  ▅█▇▂▁▂▁                                            ▁▂       ▂
  ████████▇▇▇▇▆▅▆▅▄▅▅▄▄▄▁▄▄▃▄▄▁▁▁▁▃▃▁▁▁▁▃▄▁▁▁▁▁▁▄▃▃▃██████▇█▇ █
  5.2 μs       Histogram: log(frequency) by time      13.9 μs <

 Memory estimate: 23.48 KiB, allocs estimate: 2.

julia> @benchmark vcat($a, ones(1, 1000))
BenchmarkTools.Trial: 10000 samples with 5 evaluations.
 Range (min … max):  6.028 μs … 222.245 μs  ┊ GC (min … max): 0.00% … 93.52%
 Time  (median):     6.405 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   7.150 μs ±  10.099 μs  ┊ GC (mean ± σ):  7.95% ±  5.51%

   ▃▆▇███▆▅▄▂▁▁▁ ▁                                            ▂
  ▇██████████████████▇▇▇▇▆▇▇▇▆▆▇▆▆▅▅▅▆▅▃▅▅▆▆▆▅▃▄▁▅▁▄▄▅▅▅▅▅▆▅▅ █
  6.03 μs      Histogram: log(frequency) by time      10.1 μs <

 Memory estimate: 31.42 KiB, allocs estimate: 3.

julia> @benchmark vcat($a, Ones(1, 1000))
BenchmarkTools.Trial: 10000 samples with 6 evaluations.
 Range (min … max):  5.079 μs … 157.727 μs  ┊ GC (min … max): 0.00% … 94.85%
 Time  (median):     5.321 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   5.758 μs ±   7.135 μs  ┊ GC (mean ± σ):  6.80% ±  5.26%

       ▂▅▇█▇▄▁                                                 
  ▁▂▄▅████████▆▄▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  5.08 μs         Histogram: frequency by time        6.76 μs <

 Memory estimate: 23.58 KiB, allocs estimate: 5.

Using FillArrays won’t allocate the intermediate array, and should be as fast as broadcasting a scalar. This isn’t a general solution, though, as StaticArrays would require its own specialization.

If you are working with a vector of SVectors, you can easily add 1 to each ‘row’ like this:

julia> V = rand(SVector{3, Float64}, 3)
3-element Vector{SVector{3, Float64}}:
 [0.5162102972196262, 0.8749945861687108, 0.2020173082460679]
 [0.35930646805196187, 0.34349906244827344, 0.9787711385248581]
 [0.7963656080314048, 0.591520175174824, 0.7859901545447247]

julia> V1 = push.(V, 1)
3-element Vector{SVector{4, Float64}}:
 [0.5162102972196262, 0.8749945861687108, 0.2020173082460679, 1.0]
 [0.35930646805196187, 0.34349906244827344, 0.9787711385248581, 1.0]
 [0.7963656080314048, 0.591520175174824, 0.7859901545447247, 1.0]

and then afterward pop.(V1). Combining this with reinterpret/reshape, you get a fast “row-adder”.

But it’s both cleaner and faster to just do push/pop for each individual point, and wrap those in a single-point transformation operation. That way you will avoid allocating an unnecessary extra (possibly large) array.

In fact it is quite slow. Written like this, it is much better:

hcat(push!.(collect.(eachcol(a)),1)...)

and if it were possible to do the push! directly without collecting, it would be even better:

julia> hcat(push!.(eachcol(a),1)...)
ERROR: MethodError: no method matching resize!(::SubArray{Float64, 1, Matrix{Float64}, Tuple{Base.Slice{Base.OneTo{Int64}}, Int64}, true}, ::Int64)
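A view into a matrix can’t be resized, so push! on eachcol will always fail. One workaround (my own sketch, still allocating one small vector per column, so for clarity rather than speed) is to extend each column non-mutatingly and glue them back together:

```julia
# Sketch: extend each column with vcat (non-mutating) instead of push!,
# then reduce the extended columns with hcat.
a = randn(2, 5)
b = reduce(hcat, (vcat(c, 1.0) for c in eachcol(a)))
size(b)   # (3, 5)
```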

Thanks to @rafael.guerra for mentioning PaddedView, which I didn’t know about (I tried to read the code to figure out which strategy it uses, but I didn’t understand much. Can anyone explain the principle of operation in a simple way?).
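A sketch of the principle, using my own toy type rather than PaddedViews’ actual implementation: the view stores the parent array, a fill value, and the padded size, and getindex returns the fill value whenever the index falls outside the parent. No data is copied until you collect:

```julia
# Toy padded view: reads inside the parent forward to it,
# reads outside it return the fill value. Nothing is copied.
struct ToyPadded{T} <: AbstractMatrix{T}
    fillvalue::T
    parent::Matrix{T}
    sz::Tuple{Int,Int}
end
Base.size(p::ToyPadded) = p.sz
Base.getindex(p::ToyPadded, i::Int, j::Int) =
    checkbounds(Bool, p.parent, i, j) ? p.parent[i, j] : p.fillvalue

a = [1.0 2.0; 3.0 4.0]
p = ToyPadded(1.0, a, size(a) .+ (1, 0))
collect(p)   # 3×2 Matrix: a with a row of ones appended
```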
Below is a comparison on the use case a = rand(2, 1000):

julia> using PaddedViews

julia> @btime PaddedView(1, a, size(a).+(1,0))
  1.610 μs (23 allocations: 1.16 KiB)
3×1000 PaddedView(1.0, ::Matrix{Float64}, (Base.OneTo(3), Base.OneTo(1000))) with eltype Float64:
 1.10349   -0.311575  -0.646722  …   1.07844  -1.14549
 0.565721   1.88333    2.41329      -2.45372   0.61512
 1.0        1.0        1.0           1.0       1.0

julia> @btime collect(PaddedView(1, a, size(a).+(1,0)))
  5.520 μs (25 allocations: 24.64 KiB)
3×1000 Matrix{Float64}:
 1.10349   -0.311575  -0.646722  …   1.07844  -1.14549
 0.565721   1.88333    2.41329      -2.45372   0.61512
 1.0        1.0        1.0           1.0       1.0

julia> function appendones(m)
           M=ones(eltype(m),size(m).+(1,0))
           (@view M[1:end-1,:]) .=  m
           M
       end
appendones (generic function with 1 method)

julia> @btime appendones($a)
  3.000 μs (2 allocations: 23.48 KiB)
3×1000 Matrix{Float64}:
 1.10349   -0.311575  -0.646722  …   1.07844  -1.14549
 0.565721   1.88333    2.41329      -2.45372   0.61512
 1.0        1.0        1.0           1.0       1.0

julia> @benchmark vcat($a, ones(1, 1000))
BenchmarkTools.Trial: 10000 samples with 7 evaluations.
 Range (min … max):   3.686 μs … 964.600 μs  ┊ GC (min … max):  0.00% … 97.65%
 Time  (median):      9.286 μs               ┊ GC (median):     0.00%
 Time  (mean ± σ):   11.359 μs ±  27.638 μs  ┊ GC (mean ± σ):  15.21% ±  6.74%   

   █        ▁
  ▂██▃▃▃▂▃▅▇█▄▂▂▂▂▂▃▂▂▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁ ▂
  3.69 μs         Histogram: frequency by time         44.4 μs <

 Memory estimate: 31.42 KiB, allocs estimate: 3.

Firstly, thanks for the answer. It’s probably very performant.

However, code like this is obtuse. You need a high level of knowledge of the language both to write it and to understand what is going on. The question is who you think Julia is aimed at: programmers or scientists.

Some quotes from the JuliaLang homepage:

Julia is designed from the ground up to be very good at numerical and scientific computing.
Julia is dynamically typed, feels like a scripting language, and has good support for interactive use.
Julia was designed from the beginning for high performance.

These are the reasons I’m trying out Julia right now. There is a good correspondence between what is written in a paper and written in code, it’s fast to develop in, and it’s fast to run. I’d like to write code that is as close to the written equations as possible. If I have to do a function call where in the equations there’s matrix multiplication, for me that’s obfuscating what’s going on.

Well, I thought trafo(M, v) = pop(M * push(v, 1)) was super-readable, elegant, and very much in the Julia spirit, unlike the vcat approach (the vcat solutions were fine, but I think it’s the wrong approach).
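For anyone without StaticArrays at hand, the same shape can be sketched in base Julia. Here push_ and pop_ are hypothetical stand-ins for StaticArrays’ non-mutating push and pop, which are allocation-free on SVectors:

```julia
# Base-Julia stand-ins for StaticArrays' non-mutating push/pop.
push_(v, x) = vcat(v, x)   # append an element, returning a new vector
pop_(v) = v[1:end-1]       # drop the last element, returning a new vector

trafo(M, v) = pop_(M * push_(v, 1))

M = [1.0 0.0 2.0;          # homogeneous transform: translate by (2, 3)
     0.0 1.0 3.0;
     0.0 0.0 1.0]
trafo(M, [1.0, 1.0])       # [3.0, 4.0]
```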

If writing in a scientific style means Matlab style, then we’ll have to agree to disagree, but you may not get that much out of the language transition. Matlab is actually hard to beat for Matlab-style numerical programming.

BTW: I assume you meant ‘obscure’ instead of ‘obtuse’. No offense taken :wink:

One more thing, though: you mentioned that you were using, or intended to use, StaticArrays. How do they fit into the broadcasted vcat approach? It’s a bit unclear to me.

Hi @GHTaarn

A few comments about your comments:

  1. Thanks for the tip. Unfortunately, I can’t post messages with the “Internals” label yet. Please note that “normally used for” and “for” are not the same thing.
  2. I’m sure they do. I’m just not sure how this adds to the discussion at hand. It seems to be trying to shut it down, along with your final paragraph.
  3. I don’t see that it’s mixing two functionalities; I see it as changing the functionality. A bit like when MATLAB changed the functionality of .* from plain element-wise multiplication to singleton-dimension-expanding element-wise multiplication. I note that plenty of Julia functionality already expands singleton dimensions by default anyway, e.g. .*, and that isn’t considered mixing two functionalities. You suggest something quite different with resize: you cannot set the padding value at present, and it would not allow concatenation with longer arrays, such as [1, 0].

I am certainly taking note of all the suggestions made in this thread. I like @rafael.guerra’s suggestion of PaddedView, though needing to call collect() if the output should be mutable is problematic. I also like @jishnub’s suggestion to use FillArrays, but as he says, it still needs specializations for StaticArrays. That’s the problem with package solutions in general - lack of interoperability. This is why I think it’s better if the base functionality is more flexible.

I think writing in a scientific style is writing things in code as they are written in a scientific paper. I don’t think writing x[end+1,:] = 1 (from MATLAB) is very scientific, so I’m not equating MATLAB with scientific. I think that writing [x;1] is more scientific. However, that might be confusing, so vcat(x, 1) seemed like a good compromise. So for a transformation of coordinates, I’d write something like proj(M * vcat(x, 1)).

I don’t think your proposal is inelegant. I’m coming round to it. I just think it’s more difficult to parse what’s going on, since there is a function call instead of some simple maths. Sorry, that’s what I meant by obtuse, but it’s the wrong word. Obscure is better, thanks. Also, it requires all coordinates to be in a vector of vectors, whereas on paper it’s just a matrix.

If my proposed change to vcat were implemented in the base implementation, it would also need to be implemented in StaticArrays. Then I wouldn’t need any specializations.

I personally prefer the “list of points” way of thinking over the matrix way. It seems closer to the underlying task. That’s what you actually want, right: to apply a transformation to a list of points? The matrix-matrix formulation is actually a roundabout way of describing it, where a matrix is implicitly taken to mean ‘one point per column’; it is less direct, so to speak.

But I don’t see how StaticArrays would come into that. StaticArrays are only intended and usable for small arrays; they can maybe handle up to 30-50 (3- or 4-dimensional) points before performance starts to suffer badly. It looked like you were jamming 1000 points into each matrix, which is simply not going to be feasible.

If you must have a matrix to represent your points, maybe HybridArrays.jl could be of interest. They are basically camouflaged arrays of SVectors, I believe.

In the case that I were using StaticArrays, I’d be using a smaller number of points, between 1 and 16.

Julia being a column-major language, adding a row is an inefficient operation. Creating easy syntax to do non-performant things will become a trap for new users.

A scalar-broadcasting syntax for hcat wouldn’t have that problem though. It would create an asymmetry between hcat and vcat, but that’s probably a good thing, since it emphasizes the asymmetry that already exists between adding new columns vs adding new rows.
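To illustrate that asymmetry in base Julia: an appended column lands in one contiguous block at the end of the column-major result, whereas an appended row has to touch every column.

```julia
# Appending a column is the storage-friendly direction in column-major Julia:
a = randn(1000, 2)
b = [a ones(1000)]   # hcat: the new column is contiguous in memory
size(b)              # (1000, 3)
```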

But it’s not a given that every Julia AbstractArray is column major. Transposed matrices, for example.

The functionality of vcat - which is what you are basically calling non-performant - already exists. I’m suggesting making a particular use of it more performant. In my view, the language shouldn’t be defined by the underlying data storage. In fact, the opposite should be true. And new users will always do non-performant things. If they want to make code performant, they can profile and optimize.

Both of those are true, and points I hadn’t considered.

Performance-wise, the problem seems to be that ones is allocating, and there isn’t a non-allocating version. Something like Iterators.repeated(one(eltype(A)), size(A, 2)) could be an option, but that is seen as a column vector, and neither permutedims nor PermutedDimsArray works with its return type.
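A sketch of the limitation (assuming the goal is a lazy row of ones for vcat): the repeated iterator can’t be turned into a 1×n row without materializing it first, which brings back the allocation we were trying to avoid.

```julia
# The lazy iterator is "column-shaped"; making it a row forces a collect,
# which allocates and defeats the purpose.
A = zeros(2, 4)
r = Iterators.repeated(one(eltype(A)), size(A, 2))
row = permutedims(collect(r))   # collect allocates a Vector first
B = vcat(A, row)
size(B)                          # (3, 4)
```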

In terms of interface, a broadcast syntax would be nice to have - PermutedDimsArray(Iterators.repeated(one(eltype(A)), size(A, 2))) is a mouthful even if it worked. At this point though, something like this needs both an issue and a PR proposing it, to be discussed seriously for addition to the language within a reasonable time frame.

One objection to a PR to include such a behavior is that it’s not obvious whether the 1 to be concatenated stands for a row/column of ones or an identity matrix. E.g. in a block matrix-vector operation, the expression Mx + y may be written as [M one(M)]*[x; y]. This looks quite similar to [M 1]*[x; y], and it might be confusing if the 1 here represented a column of ones rather than the identity matrix.
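The ambiguity can be seen in a concrete block product, using one(M) for the identity as Julia already defines it for square matrices:

```julia
using LinearAlgebra  # one(::Matrix) returns the identity matrix

M = [2.0 0.0; 0.0 2.0]
x = [1.0, 2.0]
y = [3.0, 4.0]

# Here the "1"-like block is an identity matrix, not a column of ones:
[M one(M)] * [x; y] == M * x + y   # true
```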

That being said, I am all for such a syntax, and for easier block concatenation that reads like [A 0; 0 B] instead of cat(A, B, dims=(1,2))

Frankly, I don’t think there is all that much to gain, theoretically. If w is a preallocated vector of ones, then [A; w] is going to be pretty close to optimal. A manual loop improves on it a bit. If you must create the vector of ones, that’s a bit extra. But the final gain is going to be limited.
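For concreteness, the preallocated version (w created once and reused across calls) looks like:

```julia
A = randn(2, 1000)
w = ones(1, 1000)   # preallocated once, outside any hot loop
B = [A; w]          # vcat: one allocation for the result, none for w
size(B)             # (3, 1000)
```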

If this requires a nonobvious notation, with limited gain, I’m not sure it’s worth it. And, to be honest, if one is regularly vcating rows into matrices, then performance probably isn’t top priority.

I am not against what you propose, but unlike Matlab, Julia has opted to make a clear distinction between scalars and matrices, such that that notation would be somewhat confusing. Some sort of explicit broadcast operation would be needed to make it more consistent with the language in general, I feel.

But I think you should open an issue in the Julia GitHub repo. Maybe this has been discussed somewhere else and we are missing the reasons for the current syntax, and other people may chime in. You can even try a PR for that. Sometimes there are corner cases that prevent the feature from being implemented; sometimes it is simply that no one has taken the time to do it.

I think expanding singleton dimensions is actually really obvious and intuitive, and should be supported wherever possible. But I also see that this view definitely comes from a MATLAB way of writing code, and won’t lead to the best performance in Julia. However, that doesn’t mean it won’t be useful in some cases, and I think that making a language more flexible increases its power in ways we can’t predict.