I just decided to migrate from Python+Fortran to Julia as Julia was faster in my test

Welcome @martin.d.maas! Julia can be quite the rabbit hole. I mostly came for the speed, but at this point I’d probably stay even if it were slow because the language has so many other interesting aspects to it.

A couple of performance notes:

  • The matrix A doesn’t need to be initialized with zeros; instead, you can grab it as an uninitialized chunk of memory by writing A = Matrix{ComplexF64}(undef, N, N). This saves the program from having to set every entry to zero first (see the quick sketch after this list).
  • In your loop, all your array accesses are in-bounds by construction, so you can annotate the loop with the @inbounds macro to stop Julia from doing any bounds checks (it may already have figured out on its own that it can drop them).
  • As mentioned above by @DNF, cis(...) is more efficient than exp(im * ...).
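
A quick sketch of the first and third points (the undef matrix holds arbitrary garbage until written, so every entry must be assigned before it is read):

# `undef` skips the zero-fill pass that `zeros` performs;
# the contents are whatever happened to be in that memory.
A = Matrix{ComplexF64}(undef, 4, 4)

# cis(x) is equivalent to exp(im*x) but is computed directly from sincos(x)
x = 1.2345
cis(x) ≈ exp(im * x)   # true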

Hence, I’d write

function eval_exp_tweaked(N)
    a = range(0, stop=2*pi, length=N)
    A = Matrix{ComplexF64}(undef, N, N)
    @inbounds Threads.@threads for i in 1:N
        for j in 1:N
            A[i,j] = cis(100*(a[i]^2 + a[j]^2))
        end
    end
    return A
end

and for benchmarking, I strongly suggest checking out the @btime or @benchmark macros from BenchmarkTools.jl; they take many samples to get statistically significant results and omit compilation time.
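
For example, a minimal usage sketch (the $ interpolates the argument so the cost of accessing an untyped global isn’t part of what gets measured):

using BenchmarkTools

N = 1_000
@btime eval_exp_tweaked($N)       # one summary line: minimum time and allocations
@benchmark eval_exp_tweaked($N)   # full sample statistics (min/median/mean/max)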

Running my tweaked version, I see we can squeeze out quite a bit of extra performance for the smaller sizes:

using BenchmarkTools

function eval_exp(N)
    a = range(0, stop=2*pi, length=N)
    A = zeros(Complex{Float64},N,N)
    Threads.@threads for i in 1:N
        for j in 1:N
            A[i,j] = exp(100*im*(a[i]^2 + a[j]^2))
        end
    end
    return A
end

function eval_exp_tweaked(N)
    a = range(0, stop=2*pi, length=N)
    A = Matrix{ComplexF64}(undef, N, N)
    @inbounds Threads.@threads for i in 1:N
        for j in 1:N
            A[i,j] = cis(100*(a[i]^2 + a[j]^2))
        end
    end
    return A
end

println("running loop on ", Threads.nthreads(), " threads")
for N in 1_000:1_000:10_000
    @show N
    A = @btime eval_exp($N)
    B = @btime eval_exp_tweaked($N)
    @assert A ≈ B
    println()
end

This prints:

running loop on 6 threads
N = 1000
  5.909 ms (33 allocations: 15.26 MiB)
  3.877 ms (33 allocations: 15.26 MiB)

N = 2000
  45.875 ms (33 allocations: 61.04 MiB)
  23.908 ms (33 allocations: 61.04 MiB)

N = 3000
  96.361 ms (35 allocations: 137.33 MiB)
  55.392 ms (33 allocations: 137.33 MiB)

N = 4000
  173.731 ms (36 allocations: 244.14 MiB)
  92.259 ms (35 allocations: 244.14 MiB)

N = 5000
  259.352 ms (36 allocations: 381.47 MiB)
  157.808 ms (35 allocations: 381.47 MiB)

N = 6000
  409.806 ms (35 allocations: 549.32 MiB)
  258.681 ms (35 allocations: 549.32 MiB)

N = 7000
  536.165 ms (36 allocations: 747.68 MiB)
  363.541 ms (35 allocations: 747.68 MiB)

N = 8000
  985.929 ms (37 allocations: 976.57 MiB)
  836.564 ms (36 allocations: 976.57 MiB)

N = 9000
  1.008 s (38 allocations: 1.21 GiB)
  668.207 ms (37 allocations: 1.21 GiB)

N = 10000
  1.446 s (43 allocations: 1.49 GiB)
  995.583 ms (41 allocations: 1.49 GiB)

Tullio.jl is one of my favourite packages, though, and it’s definitely worth looking into for this kind of thing.
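
As a sketch of what that might look like here (untested; @tullio with := allocates the result array and multithreads on its own when Julia is started with threads):

using Tullio

function eval_exp_tullio(N)
    a = range(0, stop=2*pi, length=N)
    # `:=` creates A; the ranges of i and j are inferred from `a`
    @tullio A[i,j] := cis(100 * (a[i]^2 + a[j]^2))
    return A
end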
