Why is multiplication ABC left-associative (`foldl`) not right-associative (`foldr`)?

dlfivefifty · November 15, 2018, 8:52am

This behaviour is a bit surprising:

julia> A = randn(50,50); x = randn(50);

julia> @btime A*A*x;
  10.064 μs (3 allocations: 20.17 KiB)

julia> @btime A*(A*x);
  736.336 ns (2 allocations: 1.06 KiB)

julia> @btime (A*A)*x;
  11.066 μs (3 allocations: 20.17 KiB)

I’m curious on the design choice, as matrix * vector is much more common than say vector’ * matrix.

jling · November 15, 2018, 8:57am

Not only that the current behavior is slower, and it also allocates more memory?

mauro3 · November 15, 2018, 9:10am

(A*A)*x uses more memory because it creates a temporary matrix, vs A*(A*x) only creating a temporary vector.

mohamed82008 · November 15, 2018, 9:13am

https://github.com/AustinPrivett/MatrixChainMultiply.jl

https://github.com/LMescheder/DynMultiply.jl

Tamas_Papp · November 15, 2018, 9:14am

If I remember correctly, * is left associative in a lot of languages (eg C/C++). I think that the best strategy is not to depend on precedence tables too much, and explicitly group subexpressions if this could affect performance.

dlfivefifty · November 15, 2018, 9:58am

Yes I normally just use parentheses, but I have a use case where there is a need for long sequences of operations, which requiring parentheses is error prone.

I think this is worth revisiting in Julia v2 as in the context of linear operators only right associative makes sense…

yuyichao · November 15, 2018, 10:05am

No, it’s dimension dependent

mschauer · November 15, 2018, 10:05am

I also make this mistake at times. But honestly I’d say it is easier to find an array multiplication performance bug because of the left associativity than a subtile precision bug caused by porting left-associtive numeric code from say c to a right-associative version of julia.

dlfivefifty · November 15, 2018, 10:56am

This only applies to matrices. When talking about linear operators, there only exists operator * vector, so left-associative is impossible: (operator * operator) * vector will break while operator * (operator * vector) works fine.

So if one is going to make an arbitrary choice, why not make the arbitrary choice that’s actually more generally useful?

yuyichao · November 15, 2018, 11:22am

No, there’s also left vector.

dlfivefifty · November 15, 2018, 11:31am

No such thing as a “left vector” for linear operators: Linear map - Wikipedia

There are of course functionals, but they are not as fundamental as operator * vector operations. And these are defined as functional * vector, not functional * operator.

yuyichao · November 15, 2018, 11:50am

Sorry I mean row vectors. (Left vector comes from what I usually say in quantum mechanics context…)

So,

It’s not an arbitrary choice. It’s the associativity defined in math and it’s what implemented in every other languages that I know of and it’s what people relies on for non-associative multiplications (floating point for example)
Right associative is only marginaly better, i.e. for matrix * column vector. It breaks everyones expectation in basically all other context and it doesn’t make sense to do it this way.
It’s of course useful to have fast matrix multiplication that is done in the right order, as long as you don’t do this by changing the operator associativity, you can do whatever you want to implement it. (packages, or change the implementation of *(args...) in Base for matrices are all acceptable solutions since none of them will affect unrelated cases)

dlfivefifty · November 15, 2018, 11:59am

Definitely not: ABCx when A, B, C are linear operators acting on a function space and x is in the function space is defined as, and only makes sense as, A(B(Cx)).

The current behaviour breaks everyone’s expectations who come from a linear algebra / functional analysis background.

foobar_lv2 · November 15, 2018, 1:39pm

It really doesn’t. Nobody in his right mind would ever write multiplication without parentheses when the operation is not associative. Nobody in his right mind would use a symbol resembling * or \circ to refer to a non-associative operation; using infix notation is already highly questionable.

Now that we established that we are dealing with a mathematically associative operation, we need to ask how to implement this. And the most efficient way does indeed depend on dimensions: Left-associative is more efficient for rand(2,4)*rand(4,4)*rand(4,4) (qm: bra-operator-operator) and right-associative is more efficient for rand(4,4)*rand(4,4)*rand(4,2) (qm: operator-operator-ket).

We already have \circ = ∘ for a right-associative multiplication with the correct intuition. What about changing the current definition to only dispatch on <:Function and create a new dispatch for AbstractMatrix and AbstractVector? (I’d still like an ascii infix for that, like e.g. :\circ which is currently a parsing error)

dlfivefifty · November 15, 2018, 2:01pm

Nobody in his right mind would ever write multiplication without parentheses when the operation is not associative

Agreed, but here we are talking about the associative case, what order to apply the operations.

I think this argument is really highlighting a divide between analysis and algebra… In functional analysis, talking about multiplication of operators A*B as objects in their own right only makes sense for operator algebras, which are a relatively specialised topic (that doesn’t include, for example, differential operators). The notation L = A*B is pretty much always shorthand for the operator defined by L*x == A*(B*x).

That is, your examples of “bra-operator-operator”, left vector, row vector, etc., do not make sense in the context of general linear operators.

stevengj · November 15, 2018, 2:14pm

In other words, A*B can be perfectly well defined and sensible (and this definition leads to the familiar definition for matrix–matrix multiplication!). In the same way, the product d = r*A of a “row vector” (linear functional) with an operator can be perfectly well defined as the linear functional d*x = r*(A*x) (which also defines an “adjoint” linear operator acting “to the left” on r via Riesz).

In a narrower sense, of course, not everything that can be defined is defined, and programming languages are very precise in this sense. If I define a linear-operator object A::MyOp with a method *(::MyOp, ::Vector), then I may have neglected to define a method *(::MyOp, ::MyOp) for A*B, even if I could have sensibly defined it as the composition of the two operators. (One could argue that I should have included the latter definition in order to make * associative.)

dlfivefifty · November 15, 2018, 2:20pm

Yes, but then L = A*B; L*x becomes equivalent to the right-associative A*(B*x).

My point is that its not “neglect”, but rather then this operation does not need to exist at all.

Though I suppose I’de be happy with keeping the current version of * with the idea that it is for elements in an algebra and having another notation such as ∘ for “applying an operator to a function”, where then A*x and A∘x are equivalent.

stevengj · November 15, 2018, 2:43pm

Yes, but then L = A*B; L*x becomes equivalent to the right-associative A*(B*x) .

That’s the point: * should be defined in such a way as to be associative (up to floating-point errors).

In any case, this discussion seems mostly academic. For reasons that may be lost in the mists of time, Fortran 77 was defined to have left-associative *—perhaps because hand calculations are usually performed left-to-right in the Western world—and nearly all(?) other programming languages have followed suit. Julia adopted this common convention from its beginning, and (since it would be breaking) the associativity cannot be changed until the distant future of Julia 2.0, at which point the associativity will be so ingrained that changing it will be painful and unlikely to be justified by arguments about functional analysis.

(Interestingly, the 1966 ANSI Fortran standard, which was rather short and vague by today’s standards, defines operator precedence but not associativity.)

ChrisRackauckas · November 15, 2018, 3:11pm

@dlfivefifty weren’t you working on lazy matrix multiplication in order to automatically call the extended BLAS functions on things like a*A*B+c? It seems that if * is lazy, this and more general matrix-chain-multiply optimizations could be easily applied.

dlfivefifty · November 15, 2018, 3:19pm

@ChrisRackauckas That’s exactly what brought this up. I had code like this:

L = Mul(A,B) # lazy multiplication
x = randn(5)
L*x

where L*x is implemented via materialize(Mul(A,B,x)) in a style meant to mimic Broadcasted. The only way this makes sense (without completely destroying the point of lazy multiplication) is for L*x to be A*(B*x), but then I realised its inconsistent with Base’s *(A,B,x).

The story gets a bit more complicated because now I also have ContinuumArrays.jl, where A, B are subtypes of AbstractQuasiMatrix, and I’ve implemented *(A,B,C) with this types by calling materialize(Mul(A,B,C)).

Using a different notation for * might be the way to go, though many cases we have things like B'D'D*B for weak Laplacians so it’s nice to be able to write this using shorthand.

Topic		Replies	Views
Compiler matmul optimization prevents constant folding? General Usage constant-propagation , linear-algebra , staticarrays	18	437	December 4, 2023
Vector - Matrix - Vector multiplication Performance	19	4242	March 14, 2021
Order of operations matrix multiplications Performance	5	456	July 5, 2022
Is matrix multiplication not associative? New to Julia question , error , matrix	22	740	March 23, 2025
`foldl` vs `foldr`: matrix multiplication order General Usage question	5	943	November 28, 2017

Why is multiplication A*B*C left-associative (`foldl`) not right-associative (`foldr`)?

Related topics

Why is multiplication ABC left-associative (`foldl`) not right-associative (`foldr`)?