Obviously it depends. A larger problem can expose more parallelism, but you need both that larger problem and a sophisticated implementation to exploit it.
If BLAS has an efficient, standardized batched multiply, then LinearAlgebra (or at least a package) could expose it in Julia as well, and presumably already would. Otherwise I wouldn't necessarily expect a batched multiply to do much more than internalize the obvious loop.
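To be concrete, the "obvious loop" I have in mind is roughly the following (the function names are made up for illustration); the threaded variant is about where a "sophisticated implementation" would start, assuming you pin BLAS to one thread so the two levels of threading don't fight over cores:

```julia
using LinearAlgebra

# Batches stored as 3-D arrays: A is (m, k, nbatch), B is (k, p, nbatch), C is (m, p, nbatch).
# Plain serial version: one mul! per slice into a preallocated output.
function batched_mul_loop!(C, A, B)
    @assert size(A, 3) == size(B, 3) == size(C, 3)
    for i in axes(A, 3)
        mul!(view(C, :, :, i), view(A, :, :, i), view(B, :, :, i))
    end
    return C
end

# Batch-level parallelism for many small matrices: one slice per Julia thread.
# Consider BLAS.set_num_threads(1) first, so BLAS threads don't oversubscribe the cores.
function batched_mul_threaded!(C, A, B)
    @assert size(A, 3) == size(B, 3) == size(C, 3)
    Threads.@threads for i in axes(A, 3)
        mul!(view(C, :, :, i), view(A, :, :, i), view(B, :, :, i))
    end
    return C
end
```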
A GPU is a different story, but then you're already into packages as far as Julia is concerned, and I'm not sure NumPy would use a GPU without additional configuration either. In any case, a GPU implementation needs either data that is already resident on the device or rather large matrices to be efficient.
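For the data-movement point, a sketch of what the GPU route might look like, assuming CUDA.jl for the device arrays and NNlib.jl's `batched_mul` for the 3-D case (check those packages for the actual current API):

```julia
using CUDA, NNlib  # assumed packages; not part of the standard library

A = rand(Float32, 64, 64, 4096)      # a batch of 4096 small matrices
B = rand(Float32, 64, 64, 4096)

A_d, B_d = CuArray(A), CuArray(B)    # the host-to-device copies are the expensive part:
                                     # they only pay off if the data stays on the GPU or
                                     # the problem is large enough to amortize the transfer
C_d = batched_mul(A_d, B_d)          # should hit a batched GPU kernel under the hood
C = Array(C_d)                       # copy back only when actually needed
```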
No idea where any ecosystem stands on this. The nice thing about a batched-multiply frontend is that you can swap in a faster implementation later even if it isn't clever now.
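To illustrate the frontend point: if the entry point is a generic function whose fallback method is just the loop, a faster method can be slotted in later via dispatch without touching any call sites (`my_batched_mul` is a made-up name, not an existing API):

```julia
using LinearAlgebra

# Fallback method: allocate the output and run the obvious loop.
function my_batched_mul(A::AbstractArray{T,3}, B::AbstractArray{S,3}) where {T,S}
    C = similar(A, promote_type(T, S), size(A, 1), size(B, 2), size(A, 3))
    for i in axes(A, 3)
        mul!(view(C, :, :, i), view(A, :, :, i), view(B, :, :, i))
    end
    return C
end

# Later, a cleverer backend is just another method, e.g.
#     my_batched_mul(A::SomeGPUArray{T,3}, B::SomeGPUArray{T,3}) where {T} = ...
# and every existing caller picks it up automatically.
```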