Why is multiplication A*B*C left-associative (`foldl`) not right-associative (`foldr`)?

question

#1

This behaviour is a bit surprising:

julia> A = randn(50,50); x = randn(50);

julia> @btime A*A*x;
  10.064 μs (3 allocations: 20.17 KiB)

julia> @btime A*(A*x);
  736.336 ns (2 allocations: 1.06 KiB)

julia> @btime (A*A)*x;
  11.066 μs (3 allocations: 20.17 KiB)

I’m curious on the design choice, as matrix * vector is much more common than say vector’ * matrix.


#2

Not only that the current behavior is slower, and it also allocates more memory?


#3

(A*A)*x uses more memory because it creates a temporary matrix, vs A*(A*x) only creating a temporary vector.


#4

#5

If I remember correctly, * is left associative in a lot of languages (eg C/C++). I think that the best strategy is not to depend on precedence tables too much, and explicitly group subexpressions if this could affect performance.


#6

Yes I normally just use parentheses, but I have a use case where there is a need for long sequences of operations, which requiring parentheses is error prone.

I think this is worth revisiting in Julia v2 as in the context of linear operators only right associative makes sense…


#7

No, it’s dimension dependent


#8

I also make this mistake at times. But honestly I’d say it is easier to find an array multiplication performance bug because of the left associativity than a subtile precision bug caused by porting left-associtive numeric code from say c to a right-associative version of julia.


#9

This only applies to matrices. When talking about linear operators, there only exists operator * vector, so left-associative is impossible: (operator * operator) * vector will break while operator * (operator * vector) works fine.

So if one is going to make an arbitrary choice, why not make the arbitrary choice that’s actually more generally useful?


#10

No, there’s also left vector.


#11

No such thing as a “left vector” for linear operators: https://en.wikipedia.org/wiki/Linear_map

There are of course functionals, but they are not as fundamental as operator * vector operations. And these are defined as functional * vector, not functional * operator.


#12

Sorry I mean row vectors. (Left vector comes from what I usually say in quantum mechanics context…)

So,

  1. It’s not an arbitrary choice. It’s the associativity defined in math and it’s what implemented in every other languages that I know of and it’s what people relies on for non-associative multiplications (floating point for example)
  2. Right associative is only marginaly better, i.e. for matrix * column vector. It breaks everyones expectation in basically all other context and it doesn’t make sense to do it this way.
  3. It’s of course useful to have fast matrix multiplication that is done in the right order, as long as you don’t do this by changing the operator associativity, you can do whatever you want to implement it. (packages, or change the implementation of *(args...) in Base for matrices are all acceptable solutions since none of them will affect unrelated cases)

#13

Definitely not: ABCx when A, B, C are linear operators acting on a function space and x is in the function space is defined as, and only makes sense as, A(B(Cx)).

The current behaviour breaks everyone’s expectations who come from a linear algebra / functional analysis background.


#14

It really doesn’t. Nobody in his right mind would ever write multiplication without parentheses when the operation is not associative. Nobody in his right mind would use a symbol resembling * or \circ to refer to a non-associative operation; using infix notation is already highly questionable.

Now that we established that we are dealing with a mathematically associative operation, we need to ask how to implement this. And the most efficient way does indeed depend on dimensions: Left-associative is more efficient for rand(2,4)*rand(4,4)*rand(4,4) (qm: bra-operator-operator) and right-associative is more efficient for rand(4,4)*rand(4,4)*rand(4,2) (qm: operator-operator-ket).

We already have \circ = for a right-associative multiplication with the correct intuition. What about changing the current definition to only dispatch on <:Function and create a new dispatch for AbstractMatrix and AbstractVector? (I’d still like an ascii infix for that, like e.g. :\circ which is currently a parsing error)


#15

Nobody in his right mind would ever write multiplication without parentheses when the operation is not associative

Agreed, but here we are talking about the associative case, what order to apply the operations.

I think this argument is really highlighting a divide between analysis and algebra… In functional analysis, talking about multiplication of operators A*B as objects in their own right only makes sense for operator algebras, which are a relatively specialised topic (that doesn’t include, for example, differential operators). The notation L = A*B is pretty much always shorthand for the operator defined by L*x == A*(B*x).

That is, your examples of “bra-operator-operator”, left vector, row vector, etc., do not make sense in the context of general linear operators.


#16

In other words, A*B can be perfectly well defined and sensible (and this definition leads to the familiar definition for matrix–matrix multiplication!). In the same way, the product d = r*A of a “row vector” (linear functional) with an operator can be perfectly well defined as the linear functional d*x = r*(A*x) (which also defines an “adjoint” linear operator acting “to the left” on r via Riesz).

In a narrower sense, of course, not everything that can be defined is defined, and programming languages are very precise in this sense. If I define a linear-operator object A::MyOp with a method *(::MyOp, ::Vector), then I may have neglected to define a method *(::MyOp, ::MyOp) for A*B, even if I could have sensibly defined it as the composition of the two operators. (One could argue that I should have included the latter definition in order to make * associative.)


#17

Yes, but then L = A*B; L*x becomes equivalent to the right-associative A*(B*x).

My point is that its not “neglect”, but rather then this operation does not need to exist at all.

Though I suppose I’de be happy with keeping the current version of * with the idea that it is for elements in an algebra and having another notation such as for “applying an operator to a function”, where then A*x and A∘x are equivalent.


#18

Yes, but then L = A*B; L*x becomes equivalent to the right-associative A*(B*x) .

That’s the point: * should be defined in such a way as to be associative (up to floating-point errors).

In any case, this discussion seems mostly academic. For reasons that may be lost in the mists of time, Fortran 77 was defined to have left-associative *—perhaps because hand calculations are usually performed left-to-right in the Western world—and nearly all(?) other programming languages have followed suit. Julia adopted this common convention from its beginning, and (since it would be breaking) the associativity cannot be changed until the distant future of Julia 2.0, at which point the associativity will be so ingrained that changing it will be painful and unlikely to be justified by arguments about functional analysis.

(Interestingly, the 1966 ANSI Fortran standard, which was rather short and vague by today’s standards, defines operator precedence but not associativity.)


#19

@dlfivefifty weren’t you working on lazy matrix multiplication in order to automatically call the extended BLAS functions on things like a*A*B+c? It seems that if * is lazy, this and more general matrix-chain-multiply optimizations could be easily applied.


#20

@ChrisRackauckas That’s exactly what brought this up. I had code like this:

L = Mul(A,B) # lazy multiplication
x = randn(5)
L*x

where L*x is implemented via materialize(Mul(A,B,x)) in a style meant to mimic Broadcasted. The only way this makes sense (without completely destroying the point of lazy multiplication) is for L*x to be A*(B*x), but then I realised its inconsistent with Base’s *(A,B,x).

The story gets a bit more complicated because now I also have ContinuumArrays.jl, where A, B are subtypes of AbstractQuasiMatrix, and I’ve implemented *(A,B,C) with this types by calling materialize(Mul(A,B,C)).

Using a different notation for * might be the way to go, though many cases we have things like B'D'D*B for weak Laplacians so it’s nice to be able to write this using shorthand.