Tensor regression models in Julia


#1

I’m trying to create a tensor regression model in Julia. I first tried PyTorch, but it was becoming a hassle, so I thought I’d give Julia a try, and it’s looking like Julia can’t do it at all (at least not in any efficient way).

A tensor regression is just a series of tensor contractions (“generalized matrix multiplications”), where the tensors (multi-dimensional arrays) act as trainable parameters. I want to train these parameter tensors with gradient descent, as I would a normal neural network (which is a series of matrix-matrix multiplications with interspersed non-linear functions). The problem is that TensorOperations.jl, the major library that supports tensor contractions, does not work with either of Julia’s main ML libraries, Knet.jl and Flux.jl. Both of those libraries silently wrap your Arrays in special tracked types so they can record gradients, and TensorOperations.jl can’t handle those types.

I tried Einsum.jl, and it technically works with Flux.jl, but only with outrageously bad performance. It took literally 2 minutes to do a very small tensor contraction with Flux’s tracked arrays (vs. 1.3 seconds with plain arrays).

I’m picking up Julia again after trying it back at v0.3 so I may just be missing something here. Any help appreciated.


#2

How complicated are the contractions you need? If you can rewrite them with reshape and permutedims, then Flux (and probably other options) should work well. Here’s A_{i,j,k} B_k = C_{i,j}:

using Flux
A = param(rand(2,2,7))
B = param(rand(7))
# fuse (i,j) into one axis, multiply, then split it back out
reshape(A, 4,7) * B  |>  C -> reshape(C, 2,2)

#3

After reading the readme: TensorOperations seems to reduce everything to three functions, add!, trace!, and contract!.

It seems like it would not be very hard to write fallback versions of these functions in terms of reshape etc., as above. Or, in fact, to just work out a gradient for each of these and provide it to Flux.back.
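For what it’s worth, here is roughly what a hand-derived gradient for one such contraction could look like; a minimal sketch in plain Julia (the names contract3, ∇A, ∇B are made up for illustration), which one could then register with Flux’s tracker:

```julia
# Sketch: hand-derived gradients for the contraction C[i,j] = Σ_k A[i,j,k] * B[k],
# i.e. the reshape-based version from the post above.
contract3(A, B) = reshape(reshape(A, :, length(B)) * B, size(A, 1), size(A, 2))

# Given Δ = ∂L/∂C, the pullbacks are:
#   ∂L/∂A[i,j,k] = Δ[i,j] * B[k]          (outer product of vec(Δ) with B)
#   ∂L/∂B[k]     = Σ_{i,j} Δ[i,j] * A[i,j,k]  (matricized A applied to vec(Δ))
∇A(Δ, A, B) = reshape(vec(Δ) * B', size(A))
∇B(Δ, A, B) = reshape(A, :, length(B))' * vec(Δ)
```

Both pullbacks stay as ordinary BLAS calls, so there shouldn’t be any of the tracked-array slowdown you saw with Einsum.jl.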


#4

I’m trying to do this contraction:

C_{s,k} = X_{s,h}A_{h,i,j}B_{i,j,k}

where X is the input data tensor and A and B are the trainable parameter tensors. In particular I’m trying to do a tensor regression on MNIST, so X is [batch size, 784] and the result is [batch size, 10], one score per digit class.
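For what it’s worth, that contraction factors into two ordinary matrix multiplications, since A and B share both contracted indices i and j: first W_{h,k} = Σ_{i,j} A_{h,i,j} B_{i,j,k}, then C = X W. A sketch with illustrative sizes (h = 784 pixels and k = 10 classes are from the post; the inner sizes i = j = 16 are made up):

```julia
# Sketch of C[s,k] = X[s,h] A[h,i,j] B[i,j,k] as two plain matmuls.
s, h, i, j, k = 32, 784, 16, 16, 10

X = rand(s, h)      # input batch
A = rand(h, i, j)   # trainable
B = rand(i, j, k)   # trainable

# Julia is column-major, so reshape fuses the (i, j) axes consistently
# in both A and B, and no permutedims is needed here.
W = reshape(A, h, i*j) * reshape(B, i*j, k)  # W[h,k] = Σ_{i,j} A[h,i,j] B[i,j,k]
C = X * W                                    # C[s,k]
```

Since the contracted indices are already adjacent and in matching order, the reshapes are free and the whole thing is two BLAS gemm calls, which Flux’s tracked arrays handle natively.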

I’ll try what you suggested.


#5

I agree with @improbable22 that these gradients should definitely be added as methods to Flux.back.


#6

Slightly off topic, but it reminds me: I wonder whether the work described in this article could be efficiently developed in Julia…