Performance and memory in large tensor: Julia vs Fortran

I have a tensor with dimensions (I, J, K, L, M, N) that I fill in a nested loop. I am trying to match Fortran’s speed in Julia: with (I, J, K, L, M, N) = (100, 100, 100, 100, 10, 10), the Julia version takes about 7 times as long as the Fortran one. In Julia, I am doing:

using Parameters
using BenchmarkTools

function params()
    I, J, K, L, M, N = 100, 100, 100, 100, 10, 10
    A = Array{Float32}(undef, I, J, K, L, M, N)
    return (I=I, J=J, K=K, L=L, M=M, N=N, A=A)
end

function test!(A, params)
    @unpack I, J, K, L, M, N = params
    @inbounds for n in 1:N
        for m in 1:M
            for l in 1:L
                for k in 1:K
                    for j in 1:J
                        for i in 1:I
                            A[i, j, k, l, m, n] = i + j + k + l + m + n
                        end
                    end
                end
            end
        end
    end
end

p = params();
@unpack A = p;
# Interpolate the globals with `$` so that @btime times only the loop itself.
@btime test!($A, $p);

  1. In an exercise of this type, is it possible to achieve Fortran’s speed? I experimented with heap allocations, StaticArrays, and LoopVectorization, but could not improve performance. For now, I am trying to improve performance without parallelizing.
  2. In Julia, creating the tensor A uses approximately 40 GB of memory. In Fortran, the task manager shows no significant increase in memory, either when A is created or when the loop modifies its values. i) Is it possible to emulate this in Julia? ii) What are good practices for reducing memory usage when working with tensors this large?

Thanks.

That’s a red flag to me. Are you sure your Fortran function is actually doing anything?

An array of Float32 values of size (100, 100, 100, 100, 10, 10) has 10^10 elements, which is literally 40,000,000,000 bytes (4 bytes per Float32), or 40 GB. So if your Fortran program isn’t using 40 GB of memory, then that suggests the array never existed in the first place.
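
The arithmetic can be checked directly in Julia. This is only a small sketch; the variable names are mine, and it assumes a 64-bit machine so that the element count fits in an `Int`.

```julia
# Dimensions from the original post.
dims = (100, 100, 100, 100, 10, 10)

nelems = prod(dims)                 # total number of elements (10^10)
nbytes = nelems * sizeof(Float32)   # sizeof(Float32) is 4 bytes

println("elements: ", nelems)
println("bytes:    ", nbytes)
println("GB:       ", nbytes / 10^9)
```

Nothing here allocates the array, so it is safe to run on any machine before deciding whether the full tensor will fit in RAM.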

Maybe your compiler optimized away the creation of the array or something?
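
One way to rule that out is to make the program report something that depends on the array's contents, so the work cannot be silently skipped. Below is a hedged sketch of that kind of sanity check on the Julia side, using deliberately tiny dimensions; `fill_index_sum!` is a hypothetical helper that mirrors the loop in the question, not code from the original post. The same idea (print an element or a checksum after the loop) should carry over to the Fortran program.

```julia
# Hypothetical helper mirroring the loop in the question: each entry is
# set to the sum of its indices. The first range is the outermost loop,
# so `i` varies fastest, matching Julia's column-major layout.
function fill_index_sum!(A)
    I, J, K, L, M, N = size(A)
    @inbounds for n in 1:N, m in 1:M, l in 1:L, k in 1:K, j in 1:J, i in 1:I
        A[i, j, k, l, m, n] = i + j + k + l + m + n
    end
    return A
end

# Small dimensions (an assumption for illustration) so this runs instantly.
dims = (4, 3, 2, 2, 2, 2)
A = Array{Float32}(undef, dims...)
fill_index_sum!(A)

# Closed form: each index d contributes its mean (d + 1) / 2 to every element.
expected_total = prod(dims) * sum((d + 1) / 2 for d in dims)
checksum = sum(Float64, A)   # accumulate in Float64 to avoid rounding

println("last element = ", A[end], " (expected ", sum(dims), ")")
println("checksum     = ", checksum, " (expected ", expected_total, ")")
```

If the printed values match, the loop really wrote every element, and timings or memory readings taken around it are measuring actual work.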

You were absolutely right: the Fortran code had a problem. Fixing it resolved both of the doubts I raised in the post: 1) I now get Fortran’s performance in Julia, and 2) Fortran uses a similar amount of memory, so I had misinterpreted the problem.
Thank you for the response.

Happy to help!
