Broadcasting inconsistency between addition and multiplication

pagalguy · October 8, 2021, 1:49pm

There seems to be inconsistency in broadcasting.

A = [1 2 ; 3 4] 
A* 2   # Broadcasting works here
A.*2   # Broadcasting same as above

A + 2 # Broadcasting fails here. If it works for * then it should work for all operators
A .+2 # Broadcasting works

Oscar_Smith · October 8, 2021, 1:52pm

Vector*scalar is one of the fundamental operations of a vector space. vector+scalar isn’t mathamatically consistent.

pagalguy · October 8, 2021, 2:03pm

Thank you @Oscar_Smith. What are the fundamental operations of vector space? I know about vectors in Physics. But in programming world vector is only collection of same datatype. Do you mean vectors rule in Physics are applied in Julia?

IlianPihlajamaa · October 8, 2021, 2:12pm

Even so, wouldn’t it be good to be able to do array + scalar with a simple +, since it is convenient, unambiguous, and in many cases, arrays are not used as mathematical vectors but more like lists of numbers anyway?

jling · October 8, 2021, 2:14pm

I assume you mean you think of a vector as an arrow with length (in “flat” space). The idea is “multiplication with a real number” is well defined for vector in a vector space.

In intro linear algebra class, they’d say:

A vector space is closed under multiplication with real number.

Because a vector [10, 0, 0] mathematically well defined as the result of 10 * [1, 0, 0] where [1,0,0] can be thought of a unit vector in x.

However, addition between a vector and a scalar is not well defined in math, for reasons that will be clear if you touch a bit group/ring theory.

mbauman · October 8, 2021, 2:19pm

The crux is that * and + alone don’t broadcast. There is no broadcasting going on. So remove that, and the questions become:

Does it make sense to double a matrix? Absolutely, and it happens to be defined as doubling all elements.
Does it make sense to add two to a matrix? Well, there’s a question here: are you adding 2\mathbb{I} (that is, adding twos to the diagonal)? Or 2 everywhere? So that’s why it’s an error.

Rudi79 · October 8, 2021, 2:20pm

Please don’t. Sacrificing mathematical rigor just to spare one keystroke in the rare case of needing this, just does not feel quite justified.

StefanKarpinski · October 8, 2021, 3:14pm

Don’t worry, we won’t, since it violates associativity of +:

Nathan_Boyer · October 8, 2021, 3:15pm

To further clarify, Julia only broadcasts when you specifically ask for it with the dot. Neither A*2 nor A+2 are doing any broadcasting. They are performing those operations as they are defined in math (and physics).

In the case of A+2, that operation is not formally defined, so it is not allowed. Since there are multiple results a user might expect for A+2 as pointed out by @mbauman, it is better for Julia to throw an error and force the user to be explicit.

If you want to add 2’s to all of the elements, you can either broadcast or create a matrix of 2’s for the matrix addition as below.

julia> A = [1 2 ; 3 4]
2×2 Matrix{Int64}:
 1  2
 3  4

julia> B = fill(2, (2,2))
2×2 Matrix{Int64}:
 2  2
 2  2

julia> A+B
2×2 Matrix{Int64}:
 3  4
 5  6

goerch · October 8, 2021, 3:25pm

It gets even crazier: did you notice 2A vs A2 already?

Mason · October 8, 2021, 3:57pm

The fact that scalar multiplication doesn’t use broadcasting also means that it doesn’t get to benefit from loop fusion, which can be important for performance when you’re chaining operations.

julia> using BenchmarkTools

julia> let A = rand(100, 100), B = similar(A)
    @btime $B .= 2  * $A .+ 1
    @btime $B .= 2 .* $A .+ 1
end;
  6.234 μs (2 allocations: 78.20 KiB)
  1.915 μs (0 allocations: 0 bytes)

Dots are good! You want more of them, not less.

StefanKarpinski · October 8, 2021, 4:19pm

I’d also like to address the idea that A + 1 unambiguously means “add one to every element of A” when A is a matrix. That’s simply not mathematically correct. In fact, the only natural mathematical definition of A + 1 is to add the identity matrix to A. The tersest reason for this is that the identity matrix is the unit value of matrices (whereas a matrix full of ones is not algebraically meaningful at all). But to expand on that since it may not sound sufficiently compelling, this meaning allows extending polynomial expressions that work on scalars to square matrices. This isn’t a mere gimmick: when you extend functions from scalars to matrices, that is exactly how you do it. For example, the exponential function exp(x) is mathematically defined as:

\exp(x) = 1 + x + \frac{x^2}{2!} + \frac{x^3}{3!} + \cdots

To extend exp to square matrices, we take the exact same definition and apply it to a matrix A instead of a scalar x:

\exp(A) = 1 + A + \frac{A^2}{2!} + \frac{A^3}{3!} + \cdots

What does 1 + A mean here? It does not mean “add one to every element of A.” If you do that, you won’t compute the right function at all. It does mean “add the identity matrix to A.” So not only does 1 + A = A + 1—we’re not going to jettison commutativity of +, right? right?!—not unambiguously mean “add one to every element of A”, it unambiguously means something else! The aforementioned associativity failure for + is a direct result of this: the natural definition of A + 1 is the unique one where associativity works.

Not coincidentally, languages where A + 1 does not have the natural mathematical meaning, it is also typical that exp(A) doesn’t have its natural mathematical meaning either. The natural meaning of exp(A) for a matrix A is to apply the above Taylor series definition to the matrix A and compute that. Instead, Matlab and Python apply the scalar exp function to each element of A. While that’s an occasionally useful operation that one might want to do, it’s not the right mathematical meaning for taking the exponential of a matrix. Why do they do this? Basically because they adopted vectorization everywhere before realizing (or caring?) that it clashes with mathematics. Since vectorization is necessary for performance in those languages, it won out over mathematical correctness.

In Julia, on the other hand, exp(A) has its natural mathematical meaning and I’d really love to give A + 1 its natural meaning too, but we’ve held off because people would definitely write A + 1 thinking it means what it means in Matlab and Python and get bitten by the fact that it doesn’t. But we can at least not give a mathematically wrong definition. Julia doesn’t need vectorization to get performance and there’s a dedicated syntax for broadcasting that gives all the convenience of vectorization but with improved performance (because of broadcast fusion), which also generalizes to all scalar functions, not just the ones that someone wrote explicitly to apply to all dimensions and kinds of arrays.

Nathan_Boyer · October 8, 2021, 4:42pm

Yes, Julia is often used for scientific programming, so it tries to respect the rules of the sciences.

Let’s look at the addition example using 2D position vectors from physics. You mentioned that you have some experience with these.

Code

using Plots
v0=[0,0]
v1=[1, 2]
v2=[-1, 1]
v3=v1+v2
plot([v0[1],v1[1]], [v0[2],v1[2]], arrow=true, linewidth=3, annotations=tuple(v1+[0.3,0.0]...,"[1,2]"), label="v1")
plot!([v0[1],v2[1]], [v0[2],v2[2]], arrow=true, linewidth=3, annotations=tuple(v2+[-0.3,0.0]...,"[-1,1]"), label="v2")
plot!([v0[1],v3[1]], [v0[2],v3[2]], arrow=true, linewidth=3, annotations=tuple(v3+[0.3,0.0]...,"[0,3]"), label="v1+v2")
plot!(xlabel="East-West Position", ylabel="North-South Position", aspect_ratio=:equal, legend=:topleft, legendfont=14, guidefont=16, tickfont=14)

PositionVectors

In this example, you start at position [0,0]. You walk 1 block East and 2 blocks North represented by the vector [1,2]. You then walk 1 block West and 1 block North represented by the vector [-1,1]. Your final position is the sum of your individual changes on position [1,2] + [-1,1] = [0,3].

Now let’s say you are at position [0,3] and want to walk 2. What does that mean? 2 in which direction? North? East? Both? [0,3] + 2 = ? You can see from this example that scalar 2 is not enough information for Julia to know what you mean.

On the other hand, [0,3] * 2 pretty clearly means that you want to make that same move twice, so the answer to multiplying by scalar 2 is well defined as [0,3] * 2 = [0,6].

rafael.guerra · October 8, 2021, 4:46pm

If A is 2x2 and A + 2 == A + [2 2; 2 2] then 2 would be equal to [2 2; 2 2]…
Not nice.

Topic		Replies	Views
Understand vectorized dot operators New to Julia broadcast , broadcasting	8	2478	March 21, 2023
Broadcasting across a nested array General Usage broadcasting	47	4070	March 14, 2019
Addition and UnitRange General Usage question	5	407	February 13, 2020
Array + scalar deprecation General Usage	6	830	November 21, 2017
Adding large matrix, why .+ is not the default? Performance question , array	3	508	August 10, 2020

Broadcasting inconsistency between addition and multiplication

Related topics