If I read your code correctly, your d=6 is small enough that you could see benefits from StaticArrays. TensorOperations, as mentioned above, is also worth pursuing, since your tight loop is currently allocating tons of memory (every slice x[:,:,i] allocates a new matrix).
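To illustrate the allocation point, here is a minimal sketch using only Base Julia. The functions `sum_traces_copy` and `sum_traces_view` are hypothetical stand-ins for your loop (I am assuming it does some work on each d×d slice of a 3-dimensional array); the only difference between them is whether the slice is copied or taken as a view.

```julia
# Hypothetical stand-in for the original loop: sum the traces of all d×d slices.
function sum_traces_copy(x::Array{Float64,3})
    s = 0.0
    for i in axes(x, 3)
        m = x[:, :, i]          # slicing copies: a fresh d×d matrix per iteration
        for k in axes(m, 1)
            s += m[k, k]
        end
    end
    return s
end

# Same computation, but the slice is a view into x rather than a copy.
function sum_traces_view(x::Array{Float64,3})
    s = 0.0
    for i in axes(x, 3)
        m = @view x[:, :, i]    # no copy of the underlying data
        for k in axes(m, 1)
            s += m[k, k]
        end
    end
    return s
end

d, n = 6, 1_000                 # d = 6 as in the question; n is an assumed slice count
x = rand(d, d, n)

# Call each function once first so compilation is not counted.
sum_traces_copy(x); sum_traces_view(x)

bytes_copy = @allocated sum_traces_copy(x)
bytes_view = @allocated sum_traces_view(x)
println("copying slices: ", bytes_copy, " bytes; views: ", bytes_view, " bytes")
```

On a recent Julia version the view variant should allocate far less than the copying one. Views are usually the cheapest first fix; StaticArrays or TensorOperations may then help further, but I have not benchmarked them on your code.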