I have a tensor with dimensions (I, J, K, L, M, N), which I am modifying in a nested loop. I am trying to match Fortran’s speed in Julia: with (I, J, K, L, M, N) = (100, 100, 100, 100, 10, 10), the Julia version takes about 7 times as long as the Fortran one. In Julia, I am doing:

```
using Parameters
using BenchmarkTools

# Allocate the (uninitialized) tensor and bundle it with its dimensions.
function params()
    I, J, K, L, M, N = 100, 100, 100, 100, 10, 10
    A = Array{Float32}(undef, I, J, K, L, M, N)
    return (I=I, J=J, K=K, L=L, M=M, N=N, A=A)
end

# Fill A in column-major order (the first index varies fastest).
function test!(A, params)
    @unpack I, J, K, L, M, N = params
    @inbounds for n in 1:N
        for m in 1:M
            for l in 1:L
                for k in 1:K
                    for j in 1:J
                        for i in 1:I
                            A[i, j, k, l, m, n] = i + j + k + l + m + n
                        end
                    end
                end
            end
        end
    end
end

p = params();
@unpack A = p;
# Interpolate the globals with $ so that global-variable access is not benchmarked.
@btime test!($A, $p);
```

- Is it possible to match Fortran’s speed in an exercise like this? I experimented with heap allocations, StaticArrays, and LoopVectorization, but none of them improved performance. At this stage, I would like to improve performance without parallelizing.
- In Julia, creating the tensor A uses approximately 40 GB of memory. In Fortran, the task manager shows no significant increase in memory usage, either when A is created or when the loop modifies its values. i) Is it possible to emulate this in Julia? ii) What are good practices for reducing memory usage when working with tensors this large?
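
For reference, here is a quick sketch of where the 40 GB figure comes from, assuming A is stored as a dense `Float32` array with no extra overhead (the names `dims`, `n_elements`, and `n_bytes` are only for illustration and are not part of the code above):

```julia
# Dimensions taken from the example above.
dims = (100, 100, 100, 100, 10, 10)

# Element count and raw storage for a dense Float32 array
# (assumption: 64-bit Julia, so the product fits in an Int).
n_elements = prod(dims)                  # 10^10 elements
n_bytes = n_elements * sizeof(Float32)   # 4 bytes per element

println("elements: ", n_elements)
println("size in GB  (10^9 bytes): ", n_bytes / 10^9)
println("size in GiB (2^30 bytes): ", round(n_bytes / 2^30, digits = 2))
```

So the array itself accounts for essentially all of the memory I see being used in Julia.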

Thanks.