I came across this post and, not being an expert in either NumPy or how it would be done in Julia, was curious if anyone here could weigh in with idiomatic Julia versions of some of these examples for comparison.
Note that the author also promises a follow-up piece in which a “better suggestion” will be revealed.
Here is how I would do the first three examples:
Example 1
Say A is a 5×5 matrix, x is a length-5 vector, and you want to find the vector y such that Ay = x.
A = rand(5,5)
x = rand(5)
y = A\x
Example 2
Say A is a stack of 100 5×5 matrices, given as a 100×5×5 array. And say x is a stack of 100 length-5 vectors, given as a 100×5 array. And say you want to solve Aᵢyᵢ = xᵢ for 1 ≤ i ≤ 100.
A = [rand(5,5) for i in 1:100]
x = [rand(5) for i in 1:100]
y = [A[i]\x[i] for i in 1:100]
Example 3
Suppose: A is a K×L×M array, B is an L×N array, and C is a K×M array. And say that for each k and n, you’d like to compute the mean over the L and M dimensions. That is, you want

D_kn = 1/(LM) × ∑_lm A_klm B_ln C_km.
K = 2
L = 3
M = 4
N = 5
A = rand(K,L,M)
B = rand(L,N)
C = rand(K,M)
D = [1/(L * M) * sum(A[k,l,m] * B[l,n] * C[k,m] for l in 1:L, m in 1:M) for k in 1:K, n in 1:N]  # K×N matrix, matching D_kn
I don’t like this last one much, either.
I was actually curious about the best equivalent for this part, because they say:
But nowadays, everything is GPU and if you’ve got big arrays, you probably don’t want to use loops in any language.
That made me curious if Julia breaks the “loops are bad” mold here too. I’m not very familiar with the KernelAbstractions.jl / CUDA.jl side of things though, so I don’t know how much effort it would be to write this in a GPU-friendly way. (A lot of the complaints in the article are about the ergonomics of doing things, stuff being unnecessarily hard, rather than about being able to do it at all.)
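For what it’s worth, the last example at least can be written without explicit loops, using only broadcasting and a matrix product, which should run unchanged on GPU arrays. This is just my own loop-free rewrite (reusing the A, B, C, K, L, M from the earlier post), not what the article proposes:

E = dropdims(sum(A .* reshape(C, K, 1, M); dims=3); dims=3)  # E[k,l] = ∑_m A[k,l,m] * C[k,m]
D = (E * B) ./ (L * M)                                       # D[k,n] = ∑_l E[k,l] * B[l,n] / (LM)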
They explicitly say “given as a 100×5×5 array”, so a rand(100, 5, 5) would be closer to the spec. (And similarly for x.)
The last example comes out really neat with Tullio.jl or similar, but it’s nice to see that even the native Julia version comes out pretty nice and readable.
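For the curious, a Tullio version of that last example might look roughly like this (a sketch reusing the A, B, C, L, M defined above):

using Tullio
@tullio D[k, n] := A[k, l, m] * B[l, n] * C[k, m]  # sum over the repeated indices l and m
D ./= L * M                                        # turn the sum into a mean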
I guarantee the “better suggestion” is only “better” because it’s less powerful and can be much worse when scrutinized in more ways. Operations on arrays are always going to be less intuitive than explicit iteration, but the complicated stuff is NOT something you ever want to be writing out a loop for anyway. Taking the example of the (10, 2, 3, 40) shape resulting from broadcasting 3-wide and 1×2 indices, I prefer how Julia forces us to acknowledge that they are broadcast together, size(A[:, CartesianIndex.([1 2 3], [1, 2]), :]), because arrays of indices aren’t treated differently from ranges. But let’s be real: do we want to write out the 4-level loop, preallocate the correct array size, and order the indices correctly in either Python or Julia? We almost always don’t need to anyway, and array syntax can intuitively handle the simpler cases just fine. The last example is just a mockery of someone using Einstein summation instead of the “obvious” suboptimal loops.
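A concrete sketch of that shape, with a made-up A whose middle dimensions are chosen to fit those indices:

A = rand(10, 3, 2, 40)                           # hypothetical sizes, picked so the indices are in bounds
size(A[:, CartesianIndex.([1 2 3], [1, 2]), :])  # (10, 2, 3, 40)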
This can be simplified to just y = A .\ x. But as @digital_carver pointed out, I think the real challenge here is to come up with a syntax for batched linsolves that would dispatch to batched GPU kernels.
One option here is to use a BlockDiagonalMatrix
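A rough sketch of that idea using plain SparseArrays (blockdiag) rather than a dedicated block-diagonal type, with the same A and x as in Example 2:

using SparseArrays
Abig = blockdiag(sparse.(A)...)      # 500×500 sparse block-diagonal matrix
xbig = reduce(vcat, x)               # stacked right-hand sides, length 500
ybig = Abig \ xbig                   # one sparse solve for the whole batch
y = [ybig[5i-4:5i] for i in 1:100]   # split back into 100 length-5 solutions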
Actually, taking a quick look at cuBLAS, it looks like an array of pointers to arrays is the expected data structure for batched operations: 1. Introduction — cuBLAS 12.9 documentation. So broadcasting over arrays of arrays may in fact be exactly the right syntax.
That said, I wonder how relevant batching actually is for linsolves. From a quick scroll through the cuSOLVER docs, I don’t see a lot of support for it.
EDIT: There’s the gemvStridedBatched function for strided batches (rand(5, 5, 100) et cetera), and the only batched function for dense matrices in cuSOLVER (syev) also expects strided batches, so the above is probably too optimistic about arrays of arrays and broadcasting. The ideal syntax would have first-class support for strided batches.
Maybe JAX / Reactant.jl are able to optimize this kind of thing?
EDIT: at the moment, it seems Reactant.jl can’t do linear solves:
julia> using Reactant
julia> A = Reactant.to_rarray(rand(10, 10));
julia> b = Reactant.to_rarray(rand(10));
julia> f = @compile \(A, b)
ERROR: Scalar indexing is disallowed.
Invocation of getindex(::TracedRArray, ::Union{Int, TracedRNumber{Int}}) resulted in scalar indexing of a GPU array.
This is typically caused by calling an iterating implementation of a method.
Such implementations *do not* execute on the GPU, but very slowly on the CPU,
and therefore should be avoided.
If you want to allow scalar iteration, use `allowscalar` or `@allowscalar`
to enable scalar iteration globally or for the operations in question.
Stacktrace:
[1] error(s::String)
@ Base ./error.jl:35
[2] errorscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:151
[3] _assertscalar(op::String, behavior::GPUArraysCore.ScalarIndexing)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:124
[4] assertscalar(op::String)
@ GPUArraysCore ~/.julia/packages/GPUArraysCore/aNaXo/src/GPUArraysCore.jl:112
[5] getindex(a::Reactant.TracedRArray{Float64, 2}, index::Int64)
@ Reactant.TracedRArrayOverrides ~/.julia/packages/Reactant/cTiTU/src/TracedRArray.jl:114
[6] getindex
@ ./subarray.jl:343 [inlined]
[7] _mapreduce(f::typeof(iszero), op::typeof(&), ::IndexLinear, A::SubArray{…})
@ Base ./reduce.jl:431
[8] _mapreduce_dim(f::Function, op::Function, ::Base._InitialValue, A::SubArray{…}, ::Colon)
@ Base ./reducedim.jl:337
[9] mapreduce
@ ./reducedim.jl:329 [inlined]
[10] overloaded_all
@ ~/.julia/packages/Reactant/cTiTU/src/TracedRArray.jl:821 [inlined]
[11] _all(f::Function, x::SubArray{Reactant.TracedRNumber{…}, 1, Reactant.TracedRArray{…}, Tuple{…}, true}, dims::Function)
@ Reactant ~/.julia/packages/Reactant/cTiTU/src/Overlay.jl:168
[12] all
@ ./reducedim.jl:995 [inlined]
[13] _istril
@ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1362 [inlined]
[14] istril
@ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1355 [inlined]
[15] istril
@ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1354 [inlined]
[16] \
@ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1122 [inlined]
[17] \(none::Reactant.TracedRArray{Float64, 2}, none::Reactant.TracedRArray{Float64, 1})
@ Reactant ./<missing>:0
[18] getproperty
@ ./Base.jl:49 [inlined]
[19] size
@ ~/.julia/packages/Reactant/cTiTU/src/TracedRArray.jl:425 [inlined]
[20] \
@ ~/.julia/juliaup/julia-1.11.5+0.aarch64.apple.darwin14/share/julia/stdlib/v1.11/LinearAlgebra/src/generic.jl:1120 [inlined]
[21] call_with_reactant(::typeof(\), ::Reactant.TracedRArray{Float64, 2}, ::Reactant.TracedRArray{Float64, 1})
@ Reactant ~/.julia/packages/Reactant/cTiTU/src/utils.jl:0
[22] make_mlir_fn(f::typeof(\), args::Tuple{…}, kwargs::@NamedTuple{}, name::String, concretein::Bool; toscalar::Bool, return_dialect::Symbol, args_in_result::Symbol, construct_function_without_args::Bool, do_transpose::Bool, input_shardings::Nothing, output_shardings::Nothing, runtime::Val{…}, verify_arg_names::Nothing, argprefix::Symbol, resprefix::Symbol, resargprefix::Symbol, num_replicas::Int64, optimize_then_pad::Bool)
@ Reactant.TracedUtils ~/.julia/packages/Reactant/cTiTU/src/TracedUtils.jl:307
[23] make_mlir_fn
@ ~/.julia/packages/Reactant/cTiTU/src/TracedUtils.jl:237 [inlined]
[24] compile_mlir!(mod::Reactant.MLIR.IR.Module, f::typeof(\), args::Tuple{…}, callcache::Dict{…}, sdycache::Dict{…}; fn_kwargs::@NamedTuple{}, optimize::Bool, cudnn_hlo_optimize::Bool, shardy_passes::Symbol, no_nan::Bool, transpose_propagate::Symbol, reshape_propagate::Symbol, optimize_communications::Bool, assert_nonallocating::Bool, backend::String, raise::Bool, raise_first::Bool, donated_args::Symbol, optimize_then_pad::Bool, runtime::Val{…}, kwargs::@Kwargs{})
@ Reactant.Compiler ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:1156
[25] compile_mlir!
@ ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:1110 [inlined]
[26] compile_xla(f::Function, args::Tuple{…}; client::Nothing, serializable::Bool, kwargs::@Kwargs{…})
@ Reactant.Compiler ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:2929
[27] compile_xla
@ ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:2911 [inlined]
[28] compile(f::Function, args::Tuple{…}; sync::Bool, kwargs::@Kwargs{…})
@ Reactant.Compiler ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:2982
[29] top-level scope
@ ~/.julia/packages/Reactant/cTiTU/src/Compiler.jl:2135
Some type information was truncated. Use `show(err)` to see complete types.
I assume Tullio.jl (GitHub - mcabbott/Tullio.jl: ⅀) would address the issues mentioned in the blog post.
Finch.jl also comes to mind; like Tullio.jl, it has an @einsum macro.
Einsum syntax only addresses forward BLAS-type operations. The blog post opens with a gripe about the lack of a sane interface for batched linsolve, which is an inverse, LAPACK-type operation. Granted, this is quite academic because batched linear solvers don’t seem very common (as mentioned above, the only batched dense solver in cuSOLVER is the symmetric/hermitian eigensolver), but it’s what the author seems to want.
Given the array shape the author uses here, I would probably write this using eachslice if I wanted to avoid loops:
julia> let A = rand(5, 5, 100)
           x = rand(5, 100)
           y = similar(x)
           Ab = eachslice(A, dims=3)
           xb = eachslice(x, dims=2)
           yb = eachslice(y, dims=2)
           yb .= Ab .\ xb
           y
       end
5×100 Matrix{Float64}:
1.26329 0.846035 … 2.02272
0.408383 -1.11563 2.82224
0.180778 0.364129 4.49043
0.239752 0.345266 -4.85347
-0.749327 0.263876 -0.945626
For completeness, since you’re preallocating, the in-place version would be
ldiv!.(yb, lu!.(Ab), xb) # explicit factorization required
Could this not simply be a classic case of “numpy has no way of distinguishing vectors of arrays from higher-dimensional arrays”? You cannot express this in numpy, so it must be 3d.
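In Julia, at least, the two notions are distinct types, e.g.:

Avec = [rand(5, 5) for _ in 1:100]   # a vector of 100 separate 5×5 matrices
Aarr = rand(100, 5, 5)               # a single 100×5×5 array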
We landed it on the EnzymeJax side just 2 days back. I am working on adding it to Reactant: feat: factorizations by avik-pal · Pull Request #1234 · EnzymeAD/Reactant.jl · GitHub.
But once the frontend code lands, it will support all forms of fancy batching. For GPU and TPU we already use nice batched solvers. On CPU we currently use a loop, but eventually the plan is to lower to MKL, which has batched CPU solvers.