[pre-ANN]: CuTensorOperations.jl

CuTensorOperations.jl adds support for tensor contractions and related operations on CuArray objects, i.e. it makes TensorOperations.jl compatible with CuArrays.

For now, it merely dispatches the necessary operations to cuTENSOR, a new experimental library from NVIDIA (thanks to the NVIDIA developers involved), and does not provide a native Julia/CUDAnative implementation. The basic functionality of cuTENSOR is wrapped in the custom, experimental branch ksh/tensor of CuArrays.jl (thanks to @maleadt and @kslimes).

Given its current state, this is a pre-announcement, and CuTensorOperations.jl is not registered yet. More likely, it will eventually become part of TensorOperations.jl.

Check out the README for more information, current known limitations, and installation instructions. Experiment, and report any issues!


Nicely done! This is a great thing to see added to the ecosystem!

Does it need to be a different macro? It would be better if it were the same macro and just worked via dispatch. That way it could work in generic code.

Also, do you plan on adding differentiation rule overloads for ChainRules.jl? It would be nice for these operations to be compatible with AD.

The same macro works for CuArrays. The new macro @cutensor takes arrays that live on the host (CPU) and transfers them to the GPU for you. Let me know if the explanation is not clear on this point.


Nope, I just missed that. Awesome! Hope to see some good chain rules on this :slight_smile:

So do I. Other people, e.g. @mcabbott, have done some work on AD with TensorOperations.jl. I'm not sure whether that is still alive, fully functional, integrated somewhere, or compatible with ChainRules.jl.

Nice to see, haven’t tried it yet but I will!

Yes, I had a very naive approach up and running but have not revisited it. It would probably be easy to package up if there is interest. I was fiddling with a smarter way, but it doesn't work yet.

I believe @under-Peter is working hard on a new package that includes AD, although it runs parallel to TensorOperations.jl rather than being built on top of it.

It is partially built on top: when possible I dispatch to TensorOperations, but for the GPU we currently have a custom kernel.

As far as AD is concerned, my project works by naively switching arguments as described here. I'm not sure there are much better ways, although it's possible that the optimal contraction order for the backward pass is usually not the reverse of the forward pass, and one could probably cache intermediate results smartly. If anyone has resources on smarter AD for these operations, I'd be interested.
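To illustrate the "switching arguments" idea, here is a minimal sketch in Python, with NumPy's `einsum` standing in for a TensorOperations.jl contraction. Neither is what the project above uses, and `contract`, `contract_vjp`, `loss` and `max_grad_error` are hypothetical names chosen for illustration. For a contraction C = A·B over shared indices, the gradient with respect to A is the contraction of the output gradient dC with B (and vice versa), so the backward pass consists of more contractions of the same kind. The finite-difference check only confirms that the swapped contractions are correct.

```python
import numpy as np

# Forward pass: C[i,k] = sum_{j,l} A[i,j,l] * B[j,l,k]
# (einsum stands in here for a TensorOperations-style contraction).
def contract(A, B):
    return np.einsum("ijl,jlk->ik", A, B)

# Backward pass (vector-Jacobian product): given dC = dL/dC, each gradient is
# again a contraction, obtained by swapping the output with one input:
#   dA[i,j,l] = sum_k dC[i,k] * B[j,l,k]
#   dB[j,l,k] = sum_i A[i,j,l] * dC[i,k]
def contract_vjp(A, B, dC):
    dA = np.einsum("ik,jlk->ijl", dC, B)
    dB = np.einsum("ijl,ik->jlk", A, dC)
    return dA, dB

# Hypothetical scalar test loss L = sum(W * C), so that dL/dC = W.
def loss(A, B, W):
    return np.sum(W * contract(A, B))

# Central finite difference of f with respect to the single entry X[idx].
def finite_diff(f, X, idx, eps=1e-6):
    Xp = X.copy()
    Xp[idx] += eps
    Xm = X.copy()
    Xm[idx] -= eps
    return (f(Xp) - f(Xm)) / (2 * eps)

# Largest absolute difference between the switched-argument gradients
# and the finite-difference estimates, over every entry of A and B.
def max_grad_error(A, B, W):
    dA, dB = contract_vjp(A, B, W)
    err = 0.0
    for idx in np.ndindex(A.shape):
        fd = finite_diff(lambda X: loss(X, B, W), A, idx)
        err = max(err, abs(dA[idx] - fd))
    for idx in np.ndindex(B.shape):
        fd = finite_diff(lambda X: loss(A, X, W), B, idx)
        err = max(err, abs(dB[idx] - fd))
    return err

rng = np.random.default_rng(0)
A = rng.standard_normal((2, 3, 4))
B = rng.standard_normal((3, 4, 5))
W = rng.standard_normal((2, 5))
print(max_grad_error(A, B, W))  # a small number: only finite-difference round-off
```

Because the backward pass reuses the same contraction primitive, whatever backend handles the forward contraction (CPU, or a GPU library such as cuTENSOR) could in principle serve the gradients too. This sketch does not address the contraction-order or caching questions raised above.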

Sorry about getting the details wrong!

I have just made some quick packages for the old and new gradient code. These are a bit rough, but perhaps they display some different kinds of naivety. OP pointed me to this paper, https://arxiv.org/format/1310.8023, which has MATLAB code for what sounds like a very similar problem, but I have not read it closely.