Bend: a new GPU-native language

Well, Julia v1.0 was also the 4th iteration in some sense: going back to Star-P in 2004, then Julia's initial release, then the experimental Distributed and Multithreading support, and finally Tapir-based parallelism and the constructs now going beyond that. So we're going on 20 years :sweat_smile:. I don't think there's much point to such counting, but this ain't the first rodeo around here.

Julia code can compile down to custom GPU kernels via CUDA.jl, and there are abstractions written on top of that with tools like KernelAbstractions.jl.
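For context, a raw CUDA.jl kernel is just ordinary Julia with explicit thread indexing (a minimal sketch, assuming CUDA.jl and an NVIDIA GPU; `vadd!` is an illustrative name, not anything from a package):

```julia
using CUDA

# A hand-written elementwise-add kernel: each GPU thread handles one index.
function vadd!(c, a, b)
    i = threadIdx().x + (blockIdx().x - 1) * blockDim().x
    if i <= length(c)
        @inbounds c[i] = a[i] + b[i]
    end
    return nothing
end

a = CUDA.rand(Float32, 1024)
b = CUDA.rand(Float32, 1024)
c = similar(a)

# Launch with enough 256-thread blocks to cover the array.
@cuda threads=256 blocks=cld(length(c), 256) vadd!(c, a, b)
```

This is the level KernelAbstractions.jl sits on top of: it takes over the index bookkeeping and the launch configuration.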

For example, the matmul kernel is effectively just Julia code with an added piece for describing the global index:

@kernel function matmul_kernel!(output, a, b)
    i, j = @index(Global, NTuple)

    # accumulate the dot product of row i of a and column j of b
    tmp_sum = zero(eltype(output))
    for k in 1:size(a, 2)
        tmp_sum += a[i, k] * b[k, j]
    end

    output[i, j] = tmp_sum
end
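Launching such a kernel is backend-agnostic: you specialize it for a backend and pass an `ndrange` (a sketch assuming KernelAbstractions.jl; on a GPU array, `get_backend` would return the corresponding GPU backend instead of `CPU()`):

```julia
using KernelAbstractions

a = rand(Float32, 64, 32)
b = rand(Float32, 32, 16)
output = zeros(Float32, 64, 16)

backend = get_backend(output)      # CPU() here; a CuArray would give a CUDA backend
kernel! = matmul_kernel!(backend)  # specialize the @kernel function for this backend
kernel!(output, a, b; ndrange = size(output))
KernelAbstractions.synchronize(backend)
```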

With Bend, if I'm not mistaken, you still have to figure out how to represent the code using bend and fold constructs. Note this is pretty close to what I was linking to before with the Tapir extensions, where Taka's proposed Tapir extensions were coupled with transducer-type parallelism approaches.
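To make the contrast concrete, here's the same matmul written fold-style in plain Julia, where each output element is a reduction over `k` (a sequential sketch; a bend/fold-style system would be the thing responsible for parallelizing these reductions):

```julia
# Fold-style matmul: each element of the result is a fold (a sum) over k,
# rather than an explicitly indexed kernel body.
function matmul_fold(a, b)
    size(a, 2) == size(b, 1) || throw(DimensionMismatch("inner dimensions must match"))
    [sum(a[i, k] * b[k, j] for k in 1:size(a, 2))
     for i in 1:size(a, 1), j in 1:size(b, 2)]
end
```

The reduction structure is explicit here, which is exactly what these systems exploit, and exactly what you have to contort some codes into.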

You can see a demonstration of it here:

https://juliafolds.github.io/data-parallelism/tutorials/quick-introduction/

And there's FoldsCUDA.jl for a version with CUDA support. The idea was to integrate the DAG construction into the compiler (that's the PR) and provide a macro so users can opt in easily (as opposed to making everything parallel, so that the general scheduling problem doesn't have to be solved).
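You can mimic the opt-in flavor with Base threading today: the same reduction, with parallelism as an annotation rather than a rewrite (a stand-in sketch using `Threads.@threads`; the Folds.jl approach instead passes an executor through functions like `Folds.sum`):

```julia
# Sequential baseline: a plain fold.
sumsq(xs) = sum(x -> x^2, xs)

# Opted-in parallel version: same logic, with statically scheduled
# per-thread partial sums combined at the end.
function sumsq_threaded(xs)
    partials = zeros(eltype(xs), Threads.nthreads())
    Threads.@threads :static for i in eachindex(xs)
        partials[Threads.threadid()] += xs[i]^2
    end
    sum(partials)
end
```

The point of the compiler-integrated version is that the user writes only the first form and annotates it, rather than maintaining both.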

But there are some issues with the bend/fold approach. It's not new, and Guy Steele is probably the person who has said the most on this. His early parallel programming language Fortress is one of the ones that heavily influenced Julia's design. He has a nice talk on the limits of such an approach:

And he gave a keynote at an early JuliaCon:

This is a particular project I would be interested in reviving if I ever find the right person and grant funding for it (though I'm not the right person to do the day-to-day reviews on it :sweat_smile:). I think integrating bend/fold with opt-in may-happen parallelism (i.e., opting in to letting the compiler choose whether and how to parallelize) would be a nice thing to have. That said, I'm personally skeptical about how many of my codes it could match my hand-written parallelism on, so I tend to keep this stuff off the critical path for now.
