Julia gets mentioned in an article about FORTRAN

Let’s first look at single-core performance only. Here is how I would write it in Fortran:

program avx
implicit none
integer, parameter :: dp = kind(0.d0)
real(dp) :: t1, t2, r

call cpu_time(t1)
r = f(100000000)
call cpu_time(t2)

print *, "Time", t2-t1
print *, r

contains

    ! Sum sin(i) for i = 1..N in double precision.
    real(dp) function f(N) result(r)
        integer, intent(in) :: N
        integer :: i
        r = 0
        do i = 1, N
            r = r + sin(real(i,dp))
        end do
    end function

end program

Compile and run (gfortran 9.3.0):

$ gfortran -Ofast avx.f90
$ ./a.out
 Time   1.4622860000000000     
   1.7136493465705178     

Then compare with pure Julia (1.6.1):

function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end

@time r = f(100000000)
println(r)

Compile and run:

$ julia f.jl
  2.784782 seconds (1 allocation: 16 bytes)
1.7136493465703402

So the Fortran code runs 1.9x faster than the Julia code. I checked the assembly, and for some reason neither Julia nor gfortran generates AVX instructions (both use the xmm* registers), so the comparison should be fair. Why can't Julia generate AVX instructions by default? I don’t know why gfortran does not either.
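For anyone who wants to repeat the assembly check on the Julia side, here is a minimal sketch. It assumes an x86-64 machine, where xmm registers are 128 bits wide and ymm registers are the 256-bit AVX ones. The string search is only a rough heuristic, and the result depends on the CPU and the Julia version, so no expected output is shown.

```julia
using InteractiveUtils  # provides code_native outside the REPL

function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end

# Capture the native code generated for f(::Int) as a string.
asm = sprint(code_native, f, (Int,))

# Rough heuristic: look for register names in the assembly listing.
println("uses xmm registers: ", occursin("xmm", asm))
println("uses ymm registers: ", occursin("ymm", asm))
```

Printing `asm` itself shows the full listing if you want to read the loop body directly.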

Also note that compiling and running the Fortran version for N=10 takes about 0.162s in total:

$ time (gfortran -Ofast avx.f90 && ./a.out)
 Time   9.9999999999969905E-007
   1.4111883712180107     
( gfortran -Ofast avx.f90 && ./a.out; )  0.08s user 0.04s system 73% cpu 0.162 total

For Julia it is 0.484s:

$ time julia f.jl
  0.000004 seconds (1 allocation: 16 bytes)
1.4111883712180104
julia f.jl  1.03s user 0.20s system 253% cpu 0.484 total

So Julia is about 3x slower end to end (compile + run). I assume the difference is mostly Julia's startup time rather than the computation itself. But this is a separate aspect of tooling and user experience.

Now let’s use the @avxt macro from LoopVectorization:

using LoopVectorization

function f_avx(N)
    s = 0.0
    @avxt for i in 1:N
        s += sin(i)
    end
    s
end

@time r = f_avx(100000000)
println(r)

Compile and run:

$ julia avx.jl
  0.185562 seconds (1 allocation: 16 bytes)
1.713649346570267
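A caveat on reading this as a single-core result: as far as I know, @avxt is the threaded variant of @avx, so whether only one core was used depends on how many threads Julia was started with. A quick check, to be run in the same session or script:

```julia
# Number of threads this Julia session was started with.
# This is 1 unless Julia was launched with `-t N` / `--threads N`
# or with the JULIA_NUM_THREADS environment variable set.
println("Julia threads: ", Threads.nthreads())
```

If this prints 1, the timing is a single-threaded one.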

This is 15x faster than the pure Julia version and about 7.9x faster than the Fortran version.
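The ratios quoted in this post can be reproduced from the timings reported above. This is just the arithmetic, with the numbers copied by hand from the program output. The variable names and the `speedup` helper are made up for illustration.

```julia
# Timings in seconds, copied from the runs above (N = 100000000).
t_fortran   = 1.462286   # gfortran -Ofast
t_julia     = 2.784782   # plain Julia loop
t_julia_avx = 0.185562   # Julia loop with @avxt

# Compile + run wall-clock totals for N = 10.
total_fortran = 0.162
total_julia   = 0.484

# Hypothetical helper: how many times faster `fast` is than `slow`.
speedup(slow, fast) = round(slow / fast; digits = 1)

println("Fortran vs. plain Julia: ", speedup(t_julia, t_fortran), "x")        # 1.9x
println("@avxt vs. plain Julia:   ", speedup(t_julia, t_julia_avx), "x")      # 15.0x
println("@avxt vs. Fortran:       ", speedup(t_fortran, t_julia_avx), "x")    # 7.9x
println("Compile + run, N = 10:   ", speedup(total_julia, total_fortran), "x") # 3.0x
```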

@Elrod do you know if there is a reason why neither Julia nor gfortran generates the fast AVX version by default? As a user, that is what I would want.

My Julia version:

julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)