Julia gets mentioned in an article about FORTRAN

certik · May 7, 2021, 8:38pm

Let’s first do just single core performance. Here is how I would write it in Fortran:

program avx
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: t1, t2, r

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

print *, "Time", t2-t1
print *, r

contains

    real(dp) function f(N) result(r)
    integer, intent(in) :: N
    integer :: i
    r = 0
    do i = 1, N
        r = r + sin(real(i,dp))
    end do
    end function

end program

Compile and run (gfortran 9.3.0):

$ gfortran -Ofast avx.f90
$ ./a.out
 Time   1.4622860000000000     
   1.7136493465705178

Then compare to pure Julia (1.6.1) first:

function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end

@time r = f(100000000)
println(r)

Compile and run:

$ julia f.jl
  2.784782 seconds (1 allocation: 16 bytes)
1.7136493465703402

So the Fortran code executes 1.9x faster than Julia. I checked the assembly and neither Julia nor gfortran generates AVX instructions for some reason (both are using the xmm* registers). So the comparison should be fair. Why cannot Julia generate AVX instructions by default? I don’t know why gfortran does not.

Also note that the speed of compilation+run for N=10 for the Fortran version is about 0.162s:

$ time (gfortran -Ofast avx.f90 && ./a.out)
 Time   9.9999999999969905E-007
   1.4111883712180107     
( gfortran -Ofast avx.f90 && ./a.out; )  0.08s user 0.04s system 73% cpu 0.162 total

While for Julia it is 0.484s:

$ time julia f.jl
  0.000004 seconds (1 allocation: 16 bytes)
1.4111883712180104
julia f.jl  1.03s user 0.20s system 253% cpu 0.484 total

So Julia is 3x slower to compile. I assume it is the slow startup time or something. But this is the other aspect of tooling and user experience.

Now let’s use the @avxt macro.

using LoopVectorization

function f_avx(N)
    s = 0.0
    @avxt for i in 1:N
        s += sin(i)
    end
    s
end

@time r = f_avx(100000000)
println(r)

Compile and run:

$ julia avx.jl
  0.185562 seconds (1 allocation: 16 bytes)
1.713649346570267

Things got 15x faster than the pure Julia version and about 7.9x faster than the Fortran version.

@Elrod do you know if there is a reason why both Julia and Fortran couldn’t generate the fast AVX version by default? As a user that is what I would want.

My Julia version:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)

Topic		Replies	Views
Keeping scientific objectivity and details in benchmark reports Community question	56	4791	June 28, 2021
Julia vs Fortran complaint General Usage fortran	25	14612	July 20, 2017
Are these criticisms of Julia true? General Usage	3	5673	June 24, 2022
Future directions of Julia Community	149	13095	December 18, 2020
Julia vs Fortran speed Performance fortran	5	1926	October 13, 2021

Julia gets mentioned in an article about FORTRAN

Related topics