ANN: TensorSlice.jl → TensorCast.jl

Tags: package, announcement, broadcast

#1

The Tens-Or-Slice™ is a little gadget which aims to make all kinds of slicing, dicing, squeezing and splicing look easy on TV. For example, this is how you slice a 3-tensor into 3×3 SMatrix pieces:

@shape B[k]{i,j} == A[i,j,k]  i:3, j:3
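For the curious, here is my guess at a plain-Julia sketch of what the view (`==`) version amounts to once hooked up to StaticArrays — assuming `Float64` eltype, and with names invented for the example:

```julia
using StaticArrays

A = rand(3, 3, 4)

# Roughly what `@shape B[k]{i,j} == A[i,j,k]  i:3, j:3` means:
# reinterpret the flat data as one 3×3 SMatrix per slice — a view, not a copy.
B = reinterpret(SMatrix{3,3,Float64,9}, vec(A))

B[2] == A[:, :, 2]   # each element is a static 3×3 matrix
```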

And this glues them together again, using reduce(cat,...) as if they were ordinary matrices, and then reshapes & transposes to get an N×3 matrix:

@shape C[(j,k),i] := B[k][i,j]

This macro doesn’t really do any of the work itself; it just calls standard Julia functions, and can be hooked up to StaticArrays.jl, JuliennedArrays.jl, and Strided.jl. The notation is meant to agree with @tensor and @einsum (they overlap on permutedims), with the addition of == to mean a view rather than :=, which is a copy.
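To make that concrete, here is a hand-written sketch of roughly what the copying version `@shape C[(j,k),i] := B[k][i,j]` must lower to (my reconstruction, not the macro's actual output):

```julia
B = [rand(3, 3) for k in 1:4]                      # four 3×3 matrices

# Glue along a new third axis, permute so j comes first, fuse (j,k):
glued = reduce((x, y) -> cat(x, y; dims=3), B)     # 3×3×4 array, glued[i,j,k]
C = reshape(permutedims(glued, (2, 3, 1)), 12, 3)  # C[(j,k),i], j varies fastest

C[1 + 3*(2-1), 1] == B[2][1, 1]                    # j=1, k=2, i=1
```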

This was basically a holiday project (once the advent-of-code puzzles got too long) to teach myself a little macrology, which suffered from a little scope creep. But perhaps it will be useful to some of you.

It’s not registered, as there are probably still a few bugs to catch — awkward test cases would be welcome. Installation:

] add https://github.com/mcabbott/TensorSlice.jl

#2

I would like to see something like einops in Julia for tensor manipulations. I opened an issue in Flux.jl: https://github.com/FluxML/Flux.jl/issues/534

You could compare the functionality in your package with the einops project, that would also be very helpful. Thanks for the contribution!


#3

Thanks, I saw this advertised & was trying yesterday to find the link again… but all I could find was a million np.einsum-for-dummies blog posts! Now that I have the link I will take another look at what it does, a bit later.

In Flux there is also this PR for an einsum macro, which, like mine, tries to generate basic Julia commands but, unlike mine, also handles * etc. (It’s about 10× more compact! But it also gave me some wrong answers, and I couldn’t see why.) I haven’t actually tried @shape with Flux, but I see no reason for it not to work; I will check later.

I also hope that TensorOperations will soon understand derivatives. I have an implementation that is 90% done; see this issue if curious. But roughly speaking, once you tell it how to differentiate a trace and a contraction, it can handle arbitrary (strict Einstein) expressions.


#4

After reading the einops tutorial:

  • Almost everything their rearrange does can be done with @shape. The exception is what they call 0 axes and I’d call constant indices, which would not be hard to add – it is on the readme wishlist. This lets them output arrays with, say, size(A) == (20, 1, 100), which might be useful for playing well with broadcasting.

  • Many of their reduce examples could be done with @einsum; the simple ones are just sum(A, dims=...), at which Einsum is actually pretty efficient, although not Flux-differentiable. But something like “In [21]: max-pooling” could not be. It’s a reshape (in something quite similar to @shape's notation) followed by maximum(A, dims=(3,5)) and dropdims.
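For the record, a plain-Julia sketch of that 2×2 max-pooling (block sizes invented for the example):

```julia
img = rand(6, 8)                       # a toy "image"

# Split each axis into (block, #blocks) — column-major puts the block axis first —
# then reduce over the two block axes and drop them:
blocks = reshape(img, 2, 3, 2, 4)                              # blocks[a, i, b, j]
pooled = dropdims(maximum(blocks, dims=(1, 3)); dims=(1, 3))   # 3×4

pooled[1, 1] == maximum(img[1:2, 1:2])
```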

That seems to be about all! I don’t see much more in their tutorial part 2, really. Apparently @shape can claim to implement “shufflenet”, whatever that is. At “In [23]” they do “y = convolve_2d(y)” by reshaping before and afterwards. Note that einops doesn’t implement contractions, only reshape/reduce-type things. This issue explains that this is because the back-end functions they would like to call are a dog show.

In fact it will not be hard to add reduction to my package. This needs far more checking… but I just pushed a change which will make this work, for any function which acts like sum(B, dims=...):

B = rand(2,3,5);
@reduce A[i,j] := sum(k) B[i,j,k]
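For the sum case in particular, I believe this boils down to nothing more than:

```julia
B = rand(2, 3, 5)

# Plain-Julia equivalent of @reduce A[i,j] := sum(k) B[i,j,k]:
# sum along the third axis, then drop the now-trivial dimension.
A = dropdims(sum(B, dims=3); dims=3)   # size (2, 3)
```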

Then I believe that the only thing missing is @shape A[a,1,b,1] := B[a,b], to have all of einops’s functionality. (Making a video like theirs would, on the other hand, take me longer than the entire package…)
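In plain Julia those constant indices would just be size-1 axes from reshape, so a sketch of the missing piece (shapes invented for the example):

```julia
B = rand(4, 7)

# What @shape A[a,1,b,1] := B[a,b] would mean: insert singleton axes.
A = reshape(B, 4, 1, 7, 1)     # size (4, 1, 7, 1)

# The point is to play well with broadcasting along the singleton axes:
size(A .+ zeros(1, 5))         # (4, 5, 7, 1)
```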


#5

This package is now registered under the name TensorCast.jl, because it suffered further scope creep to include arbitrary broadcasting.

You can do things like this; the word lazy means it uses LazyArrays.jl to avoid materialising the whole 3-array on the right-hand side:

using Statistics
x = randn(100); m = rand(1:99, 100);

@reduce y[k] := mean(i,j)  m[k] * exp(-(x[i]-x[j])^2)  lazy
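Without the lazy flag, the macro materialises something like this 100×100×100 array first — here is a hand-written eager version for comparison:

```julia
using Statistics
x = randn(100); m = rand(1:99, 100);

# The eager version: build the whole 3-array, then average over i and j.
big = [m[k] * exp(-(x[i] - x[j])^2) for i in 1:100, j in 1:100, k in 1:100]
y = dropdims(mean(big, dims=(1, 2)); dims=(1, 2))

# Since k only enters through m[k], this agrees with the broadcast one-liner:
y ≈ m .* mean(exp.(-(x .- x') .^ 2))
```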

And à la xiaodai’s question, here’s an efficient way to stack 60k matrices, broadcast an operation, and re-slice them into 60 batches of 1000:

using Flux
imgs = Flux.Data.MNIST.images(); # 60_000 matrices, each 28×28

@cast batches[b][i,j,n] |= 1 - float(imgs[n\b][i,j])  n:1000;

The stacking operation uses the optimised reduce(hcat,...), plus appropriate reshaping. Using |= instead of := means (by another abuse of notation) that the sliced batches are copied Arrays, not views.
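That stack-then-reslice pattern, written out by hand on stand-in data (6 matrices and batches of 3 here, instead of 60_000 and 1000):

```julia
slices = [rand(28, 28) for _ in 1:6]          # stand-ins for the MNIST matrices

# Glue with the optimised reduce(hcat, ...), then reshape into a 3-array:
stacked = reshape(reduce(hcat, slices), 28, 28, 6)

# Re-slice into batches; indexing with ranges copies, like |= above:
batches = [stacked[:, :, 3b-2:3b] for b in 1:2]

stacked[:, :, 5] == slices[5]
```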