ANN: JJ.jl -- library for J-like verb rank in Julia

Recently, I started to learn some APL and J. In particular, I really liked the idea of verb rank. This package JJ.jl attempts to enable something similar in Julia.

In a nutshell, verb rank is related to the split-apply-combine strategy and allows a function (called a verb in APL/J) to be applied to sub-arrays by specifying the desired number of dimensions the function should act on. As an example, consider a 2×3×4 array

arr = reshape(1:24, 2, 3, 4)

and a function providing some information about a value:

info(x) = "Some scalar $x"
info(x::AbstractArray) = "Array of shape $(size(x))"

Then, we can obtain information about each (scalar) element of an array via broadcasting, i.e.,

julia> info.(arr)
2×3×4 Array{String, 3}:
[:, :, 1] =
 "Some scalar 1"  "Some scalar 3"  "Some scalar 5"
 "Some scalar 2"  "Some scalar 4"  "Some scalar 6"

[:, :, 2] =
 "Some scalar 7"  "Some scalar 9"   "Some scalar 11"
 "Some scalar 8"  "Some scalar 10"  "Some scalar 12"

[:, :, 3] =
 "Some scalar 13"  "Some scalar 15"  "Some scalar 17"
 "Some scalar 14"  "Some scalar 16"  "Some scalar 18"

[:, :, 4] =
 "Some scalar 19"  "Some scalar 21"  "Some scalar 23"
 "Some scalar 20"  "Some scalar 22"  "Some scalar 24"

Alternatively, the info function can be “ranked” and applied to sub-arrays in a systematic fashion:

  • rank"info 0"(arr) is the same as info.(arr) and applies the function to each scalar element of rank 0.

  • rank"info 2"(arr) now applies the function to all sub-arrays of rank 2, i.e., with ndims = 2 in Julia

julia> rank"info 2"(arr)
4-element Vector{String}:
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"

Using this technique, many algorithms on arrays can be expressed quite easily. As a teaser, batched matrix multiplication is straightforward with this approach:

using Flux

A = randn(2, 3, 5)  # a batch of 2x3 matrices
B = randn(3, 4, 5)  # another batch

A ⊠ B  # special batched matrix multiplication

rank"2 * 2"(A, B)  # just rank the standard one!

As slightly bigger examples, I have implemented K-means clustering and a very basic Transformer layer.

The library tries to be fast (building on JuliennedArrays.jl), but it has some overhead, and the syntax could still be improved. I hope you find it interesting or useful nonetheless!


Cool idea! I’ve only dabbled in J, but used kdb+ for many years and often miss the ease with which one could express complex array operations.

This is an interesting idea. Is there a strong reason for this to be a macro? Why not spell it rank(*, 2, 2, A, B)? (Also, I'm not convinced by the name rank.)


No strong reason, and there is indeed a function version, i.e., ranked(*, Val(2), Val(2))(A, B) would also work. It is maybe a little less nice due to the Val types.
The reason that rank"2 * 2" constructs a callable struct is that the higher-order functions insert and table can dispatch on it to respect the rank of their function argument as well:

A = reshape(1:6, 2, 3)
B = reshape(1:12, 3, 4)
dot(x, y) = sum(x .* y)
# Now, construct a table, i.e., outer product, taking the dot product between each row of A and column of B
table(rank"1 dot 1", A', B)  # here table needs to know the rank of its argument function

Currently, all names are based on the ones from J, i.e., rank, insert and table. I don't have a strong preference for these either. What would you call rank?

The examples in the first post can easily be done with existing packages, and (imho) in a cleaner way than with specialized macros.

julia> using SplitApplyCombine

julia> info.(splitdims(arr, 3))
4-element Vector{String}:
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"
 "Array of shape (2, 3)"

julia> splitdims(A, 3) .* splitdims(B, 3)
5-element Vector{Matrix{Float64}}:
...

I have never used J or APL, but it seems like Julia already has all the necessary facilities to make such operations easy.
It also follows the principle of orthogonal design: splitting arrays by dimension(s) is separated from applying a function.

The SplitApplyCombine solution also makes it easy to apply a function over another dimension, or over multiple dims:

julia> info.(splitdims(arr, 2))
3-element Vector{String}:
 "Array of shape (2, 4)"
 "Array of shape (2, 4)"
 "Array of shape (2, 4)"

julia> info.(splitdims(arr, (1, 3)))
2×4 Matrix{String}:
 "Array of shape (3,)"  "Array of shape (3,)"  "Array of shape (3,)"  "Array of shape (3,)"
 "Array of shape (3,)"  "Array of shape (3,)"  "Array of shape (3,)"  "Array of shape (3,)"

Right, SplitApplyCombine as well as JuliennedArrays provide similar functionality, and my implementation is actually pretty much as you suggest (using JuliennedArrays). The analog for the batched multiplication would be combinedims(splitdims(A, 3) .* splitdims(B, 3)), though.

Overall, I view these libraries as slightly lower level (and more general, for that matter). The verb rank consistently splits off the starting dimensions of an array. To me, this consistency appears advantageous in larger examples, and things get more interesting with functions of two arguments, i.e., using different left and right ranks (see the small sketch below), and with higher-order functions respecting these ranks as well (currently only insert and table are implemented here). Also, combining the results automatically again makes successive operations on arrays more composable. But feel free to disagree if you prefer a more explicit approach.
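As a small sketch of different left and right ranks (the exact call is my illustration; shapes follow the leading-dimension splitting rule):

M = reshape(1.0:6.0, 2, 3)  # rank-1 cells: the three length-2 columns
s = [1.0, 2.0, 4.0]         # rank-0 cells: one scalar per column
rank"1 / 0"(M, s)           # pairs each column with its scalar, like M ./ s'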

but there are things which are tricky in Julia, for example treating “arrays of arrays as arrays” in broadcast isn't straightforward, and ranked verbs can help exactly with that.

I'm not sure if/how it would help with nested arrays. Instead, it allows non-nested arrays to be treated as if they were nested, on an as-needed basis. Results are then recombined into a non-nested array, so that you never need to work with nested arrays explicitly.

One reason that some combined operations are their own functions is that these can be more efficient than making slices. You can see this with your batched_mul example, where the fused one saves allocations, and (IIRC) will also parallelise better on larger cases, and call special CUDA routines.

julia> using NNlib, BenchmarkTools

julia> C = @btime $A ⊠ $B;  # special batched matrix multiplication
  309.363 ns (1 allocation: 400 bytes)  # fused, 1 allocation of final Array

julia> using JJ

julia> C ≈ @btime rank"2 * 2"($A, $B)  # just rank the standard one!
  803.831 ns (7 allocations: 768 bytes)  # allocates slices, returns lazy JuliennedArrays.Align
true

Or a simpler example:

julia> M = transpose(rand(10^3, 10^3));

julia> V = @btime vec(sum($M, dims=1));
  158.417 μs (3 allocations: 8.02 KiB)

julia> V ≈ @btime rank"sum 1"($M)  # this is less cache-friendly than sum
  908.500 μs (1 allocation: 7.94 KiB)
true
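The access pattern becomes clearer when the ranked sum is spelled out: it makes one pass per column of M, i.e., per row of the parent array, striding through memory (this loop illustrates the pattern, it is not JJ's actual implementation):

[sum(@view M[:, j]) for j in axes(M, 2)]  # each slice walks the parent with stride 10^3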

You might like this earlier vmap discussion about trying to make such transformations automatically.

Such concerns aside, some other ways to handle slices besides SplitApplyCombine & JuliennedArrays (mentioned above) include the following. They are certainly more verbose than rank:

julia> C ≈ @btime stack(*, eachslice($A, dims=3), eachslice($B, dims=3))  # should work with PR43334
  872.121 ns (14 allocations: 1.27 KiB)  # allocates slices, returns Array
true

julia> using TensorCast  # (my package)

julia> C ≈ @cast C2[i,j,n] := (A[:,:,n] * B[:,:,n])[i,j]
true

Many thanks for this detailed analysis. I had seen the vmap discussion before, and it appears that many optimizations require a more global analysis of the computation, i.e., when to push loops down into sub-functions and when to copy data in order to enable cache-friendly processing. Your second example is especially interesting, as I see no difference when M is not transposed. Indeed, there is a special method on Base.mapreducedim! which seems to be hit by sum, as far as I can tell from reading the code (is there any tool that would show the actually executed methods?).
Overall, I guess rank is most useful if you have a natural and consistent ordering of your dimensions and build up functions following that order, e.g., as in a transformer with dimensions embedding × seq_position × head × batch and operations acting on all batches, within all heads, or at each sequence position. When more elaborate reshapings are required, other options are probably more useful. I will certainly have a look at TensorCast (I had found Tullio, but hadn't seen yours before).
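PS: one partial answer to my own tooling question above: InteractiveUtils.@which shows the method that dispatch selects at the top level, though not the internal call chain (for stepping into callees, something like Cthulhu.jl's @descend should work, but I haven't tried it on this example):

using InteractiveUtils  # loaded by default in the REPL
@which sum(M, dims=1)   # prints which sum method handles this call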