(This is kind of a follow-up discussion of my JuliaCon talk. So the context probably is not super clear if you haven’t watched it.)
Writing Transducers.jl (see also the latest release announcement) made me think that fold is a great approach for writing a large class of loops. So, it naturally made me want to try stretching the boundary of fold-based computation. That’s why I tried to tackle Julep: solving tricky iteration problems. You can read the Julep on GitHub (I also posted a different version of this post there) but a short summary is: can we, and if so how do we, support multi-dimensional iteration like the one in matrix-matrix multiplication in the iterator API? As you have multiple arrays with different sets of indices, this is a hairy problem to solve. You have to find an index that corresponds to the fastest axis of every array using that index. Once you find the best (or good enough) order of indices to iterate, another problem is how to execute such loops.
Actually, it’s even more challenging for fold because these kinds of problems often require mutating multi-dimensional arrays. Fold is really good at reduction, like summing things up. But how can we use it for mutation? It turns out there is a simple trick we can use. The idea is to create an array of references to the elements of the original array to be mutated.
julia> ys = zeros(3);
julia> ys_ref = [Ref(ys, i) for i in eachindex(ys)];
julia> y1 = ys_ref[1];
julia> y1[] = 123
123
julia> ys
3-element Array{Float64,1}:
123.0
0.0
0.0
julia> foreach(zip(ys_ref, ones(3))) do (y, x)
           y[] = x
       end
Of course, the above code allocates a new array ys_ref, which is very inefficient. But this can be fixed by a very simple lazy array type, so I created a package called Referenceables.jl to do exactly that. (You can install it with pkg> add Referenceables.)
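To illustrate the idea, here is a minimal sketch of such a lazy wrapper (this is not Referenceables.jl's actual implementation; the type name is made up for illustration):

```julia
# A lazy "referenceable" array: indexing it returns a Ref into the parent
# array instead of the element itself, so no intermediate array of
# references has to be allocated.
struct Referenceable{T,N,A<:AbstractArray{T,N}} <: AbstractArray{Base.RefArray{T,A,Nothing},N}
    parent::A
end

Base.size(r::Referenceable) = size(r.parent)
Base.IndexStyle(::Type{<:Referenceable}) = IndexLinear()
Base.getindex(r::Referenceable, i::Int) = Ref(r.parent, i)  # a Ref, created on demand

ys = zeros(3)
for (y, x) in zip(Referenceable(ys), ones(3))
    y[] = x  # writes through to `ys`; nothing else is allocated
end
```

After the loop, every element of ys is 1.0, just like in the eager ys_ref version above.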
Using this trick, we “only” need a custom fold that runs over the indices. It is not so difficult to automatically schedule the loop in such a way that the inner-most loop uses the fastest axis of all the arrays involved in the loop (if such a choice exists). Putting these pieces together, I came up with the following API and implemented it in NDReducibles.jl:
using NDReducibles: ndreducible
using Referenceables: referenceable
foreach(
    ndreducible(
        referenceable(C) => (:i, :j),  # mark that we want to mutate C
        A => (:i, :k),
        B => (:k, :j),
    ),
) do (c, a, b)
    c[] += a * b
end
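For dense column-major arrays, this effectively runs the familiar j-k-i loop order. The following hand-written equivalent is my sketch of what the scheduled fold does (it is not code taken from NDReducibles.jl, and I am assuming :i ends up as the inner-most index):

```julia
# Hand-written equivalent of the scheduled loop for dense column-major
# arrays: i is the first (fastest) axis of both C and A, so it goes innermost.
function matmul_jki!(C, A, B)
    for j in axes(B, 2), k in axes(A, 2), i in axes(A, 1)
        C[i, j] += A[i, k] * B[k, j]
    end
    return C
end

A = rand(3, 4); B = rand(4, 2); C = zeros(3, 2)
matmul_jki!(C, A, B)
C ≈ A * B  # true
```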
foreach is defined in terms of foldl [1], so the above code internally calls foldl on the ndreducible. This foldl does the aforementioned loop scheduling.
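The scheduling heuristic can be sketched in a few lines. This toy function is my illustration of the idea, not NDReducibles.jl's actual algorithm, and its name is made up:

```julia
# Toy sketch of loop scheduling: pick as the inner-most loop index one that
# is the first (fastest, in column-major order) axis of every array using it.
function innermost_index(specs::Pair...)
    # specs are `array => (index names...)` pairs, as in the API above
    candidates = union((idxs for (_, idxs) in specs)...)
    for i in candidates
        # `i` qualifies if every array mentioning it has it on its first axis
        if all(first(idxs) == i for (_, idxs) in specs if i in idxs)
            return i
        end
    end
    return nothing  # no index is fastest for all arrays that use it
end

# :i is the first axis of both C and A, and B does not use it:
innermost_index(:C => (:i, :j), :A => (:i, :k), :B => (:k, :j))  # returns :i
```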
I find this API very interesting (even though I wrote it myself). There is very little the foreach user can do: specify the indices and the loop body. Everything else, even indexing (getindex), is handled automatically (hence the title). This gives fold implementers a huge opportunity to implement various optimizations. Re-ordering the loop to improve locality is a pretty simple one, but in principle automatic parallelization, cache-oblivious access patterns, etc. are also possible.
NDReducibles.jl is not registered yet, as at this point I don’t think it has a big advantage over already established packages like TensorOperations.jl. To make it more practical, I think something like BroadcastStyle is required for flexibly negotiating the indexing order when, e.g., some non-dense arrays are involved.
[1] foldl is formally defined as the left-to-right application of a binary function, so it is not very clean to use it for shuffling loop order. But since Transducers.jl’s machinery is built around foldl, it is very handy to use here. For a more serious implementation, it is probably a better idea to have a separate “commutative fold” function.