Performance and memory in large tensor: Julia vs Fortran

fordonez · December 6, 2023, 1:59pm

I have a tensor with dimensions (I, J, K, L, M, N) that I modify in a loop. I am trying to match Fortran’s speed in Julia: with (I, J, K, L, M, N) = (100, 100, 100, 100, 10, 10), the Julia code takes 7 times as long as the Fortran code. In Julia, I am doing:

using Parameters
using BenchmarkTools

function params()
    I, J, K, L, M, N = 100, 100, 100, 100, 10, 10
    A = Array{Float32}(undef, I, J, K, L, M, N)
    return (I=I, J=J, K=K, L=L, M=M, N=N, A=A)
end

function test!(A, params)
    @unpack I, J, K, L, M, N = params
    @inbounds for n in 1:N
        for m in 1:M
            for l in 1:L
                for k in 1:K
                    for j in 1:J
                        for i in 1:I
                            A[i, j, k, l, m, n] = i + j + k + l + m + n
                        end
                    end
                end
            end
        end
    end
end
p = params();
@unpack A = p;
@btime test!($A, $p);  # interpolate the globals so @btime times only the call

In an exercise of this type, is it possible to match Fortran’s speed? I experimented with heap allocations, StaticArrays, and LoopVectorization, but I couldn’t improve performance. At this stage, I am trying to improve performance without parallelizing.
In Julia, creating the tensor A uses approximately 40 GB of memory. In Fortran, the task manager shows no significant increase in memory, either when A is created or when the loop modifies its values. i) Is it possible to emulate this in Julia? ii) What are good practices for reducing memory usage when working with tensors this large?

Thanks.

Mason · December 6, 2023, 2:08pm

Fortran showing no significant increase in memory is a red flag to me. Are you sure your Fortran function is actually doing anything?

An array of Float32 values of size 100, 100, 100, 100, 10, 10 is literally 40000000000 bytes (4 bytes per Float32). So if your Fortran program isn’t using 40GB of memory, then that suggests the array never existed in the first place.
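For reference, the arithmetic above can be checked in a few lines of Julia without allocating anything. This is only a sketch: the dimensions and element type are the ones from the original post, the variable names are made up for illustration, and it assumes a 64-bit machine so the element count fits in an Int.

```julia
# Dimensions and element type taken from the original post.
dims = (100, 100, 100, 100, 10, 10)
T = Float32

n_elements = prod(dims)             # 10^10 elements
n_bytes = n_elements * sizeof(T)    # sizeof(Float32) is 4 bytes

println("elements: ", n_elements)
println("bytes:    ", n_bytes)
println("GB:       ", n_bytes / 1e9)

# Compare against the physical RAM of this machine (Sys.total_memory returns bytes).
println("fits in RAM: ", n_bytes <= Sys.total_memory())
```

Doing this before calling `Array{Float32}(undef, dims...)` gives a quick idea of whether the array can live in memory at all.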

Maybe your compiler optimized away the creation of the array or something?
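One way to rule that out is to read the results back after the loop, so the writes have an observable effect. Below is a minimal Julia sketch of that check on a scaled-down tensor. The function name `fill_sum!` and the small sizes are hypothetical, chosen only so it runs quickly; the same idea carries over to the Fortran program by printing an element or a sum of the array after the loop.

```julia
# Scaled-down version of the loop from the original post.
# In a multi-range `for`, the last range (i) is the innermost loop,
# which matches Julia's column-major layout.
function fill_sum!(A)
    I, J, K, L, M, N = size(A)
    @inbounds for n in 1:N, m in 1:M, l in 1:L, k in 1:K, j in 1:J, i in 1:I
        A[i, j, k, l, m, n] = i + j + k + l + m + n
    end
    return A
end

dims = (4, 4, 4, 4, 2, 2)    # small, hypothetical sizes for a quick check
A = fill_sum!(Array{Float32}(undef, dims))

# Reading values back confirms that the loop really wrote to the array.
@assert A[1, 1, 1, 1, 1, 1] == 6
@assert A[end, end, end, end, end, end] == sum(dims)
println("checksum: ", sum(A))
```

If the checks pass on the small case and memory use still does not grow on the full-size run, the array is probably never being materialized.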

fordonez · December 6, 2023, 3:29pm

You were absolutely right, the Fortran code had a problem. Once I fixed it, the doubts I raised in the post were resolved: 1) I matched Fortran’s performance in Julia, and 2) Fortran uses a similar amount of memory, so I had misinterpreted the problem.
Thank you for the response.

Mason · December 6, 2023, 3:35pm

Happy to help!
