CuTensorOperations.jl provides support for tensor contractions and related operations with CuArray objects, i.e. it makes TensorOperations.jl compatible with CuArrays.jl.
For now, it merely dispatches the necessary operations to cuTENSOR, a new experimental library from NVIDIA (thanks to the NVIDIA developers involved), and does not provide a Julia/CUDAnative implementation. The basic functionality of cuTENSOR is wrapped in the custom/experimental branch ksh/tensor of CuArrays.jl (thanks to @maleadt and @kslimes).
Given the current state, this is a pre-announcement, and CuTensorOperations.jl is not registered yet. It is more likely to become part of TensorOperations.jl in time.
Check out the README for more info, known (current) limitations, and installation instructions. Experiment, and report any issues!
Nicely done! This is a great thing to see added to the ecosystem!
Does it need to be a different macro? It would be better if it were the same macro and just worked via dispatch, so that it could work in generic code.
Also, do you plan on adding differentiation rule overloads for ChainRules.jl? It would be nice for these operations to be compatible with AD.
The same macro works for CuArrays. The new macro @cutensor will take host Arrays and transfer them to the GPU for you. Let me know if the explanation is not clear on this point.
Nope, I just missed that. Awesome! Hope to see some good chain rules on this
So do I. There was some work on AD with TensorOperations.jl by other people, e.g. @mcabbott. Not sure if that is still alive, fully functional, integrated somewhere, or compatible with CuArrays.
Nice to see, haven’t tried it yet but I will!
Yes, I had a very naive approach up and running, but have not revisited it. It would probably be easy to package up if there is interest. I was fiddling with a smarter way, but it doesn’t work yet.
I believe @under-Peter is working hard on a new package which includes AD, although it is parallel to TensorOperations.jl, not built on top.
Partially built on top, since when possible I dispatch to TensorOperations, but for the GPU we currently have a custom kernel.
As far as AD is concerned, my project works by naively switching arguments, as described here. I’m not sure if there are much better ways: the optimal contraction order for the backward pass is probably not simply the reverse of the forward pass, and one could likely cache intermediate results in a smart way. If anyone has resources on smarter AD for these operations, I’d be interested.
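To make the “switching arguments” idea concrete, here is a minimal sketch of the reverse-mode rule for a pairwise contraction, written in NumPy rather than with @tensor (this is an illustration of the math only, not code from any of the packages mentioned; all names here are made up):

```python
import numpy as np

# For C[i,j] = sum_k A[i,k] * B[k,j], reverse-mode AD gives
#   dA[i,k] = sum_j dC[i,j] * B[k,j]   (swap A for the cotangent dC)
#   dB[k,j] = sum_i A[i,k] * dC[i,j]   (swap B for the cotangent dC)
# i.e. the gradient w.r.t. one argument is the same contraction with
# that argument replaced by the cotangent and the indices relabelled.

rng = np.random.default_rng(0)
A = rng.standard_normal((3, 4))
B = rng.standard_normal((4, 5))

C = np.einsum("ik,kj->ij", A, B)

# Take the scalar loss L = sum(C), so the cotangent dC is all ones.
dC = np.ones_like(C)

dA = np.einsum("ij,kj->ik", dC, B)  # A's slot now holds dC
dB = np.einsum("ik,ij->kj", A, dC)  # B's slot now holds dC

# Sanity check one entry of dA against a finite difference.
eps = 1e-6
Ap = A.copy()
Ap[1, 2] += eps
numeric = (np.einsum("ik,kj->ij", Ap, B).sum() - C.sum()) / eps
assert abs(numeric - dA[1, 2]) < 1e-4
```

Note that each backward contraction is a fresh einsum, so nothing forces its contraction order to be the reverse of the forward one, which is where the caching question above comes from.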
Sorry about getting the details wrong!
I have just made some quick packages for the old and new gradient code. These are a bit rough, but perhaps they display some different kinds of naivety. The OP pointed me to this paper https://arxiv.org/format/1310.8023, which has MATLAB code for what sounds like a very similar problem, but I have not read it closely.