CUDA on slices

Thanks Jaratur.
I tried a few einsum packages. When they do work, they tend to be slower than mapslices on the CPU.

I’ve reposted the question here: Mapslices very slow

I’d rather not write a custom kernel, so I’m looking for alternatives.
