Let’s first look at single-core performance only. Here is how I would write it in Fortran:
program avx
    implicit none
    integer, parameter :: dp = kind(0.d0)
    real(dp) :: t1, t2, r
    call cpu_time(t1)
    r = f(100000000)
    call cpu_time(t2)
    print *, "Time", t2-t1
    print *, r
contains
    real(dp) function f(N) result(r)
        integer, intent(in) :: N
        integer :: i
        r = 0
        do i = 1, N
            r = r + sin(real(i,dp))
        end do
    end function
end program
Compile and run (gfortran 9.3.0):
$ gfortran -Ofast avx.f90
$ ./a.out
Time 1.4622860000000000
1.7136493465705178
Then compare to pure Julia (1.6.1) first:
function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end
@time r = f(100000000)
println(r)
Compile and run:
$ julia f.jl
2.784782 seconds (1 allocation: 16 bytes)
1.7136493465703402
So the Fortran code executes 1.9x faster than Julia. I checked the assembly, and for some reason neither Julia nor gfortran generates AVX instructions (both use only the xmm* registers), so the comparison should be fair. Why can’t Julia generate AVX instructions by default? I don’t know why gfortran doesn’t either.
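For anyone who wants to repeat the assembly check on the Julia side, here is a minimal sketch. It assumes an x86-64 machine and uses `code_native` from the InteractiveUtils standard library. Searching the output for `ymm`/`zmm` is only a rough heuristic of mine for spotting wide SIMD registers, not a rigorous test, and the name `uses_wide` is just for illustration.

```julia
using InteractiveUtils  # provides code_native (loaded automatically in the REPL)

function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end

# Capture the native code for f(::Int) as a string so it can be searched.
asm = sprint(io -> code_native(io, f, (Int,); debuginfo=:none))
println(asm)

# Heuristic (assumption): on x86-64, ymm/zmm registers indicate 256/512-bit
# vector code, while seeing only xmm registers suggests scalar or 128-bit code.
uses_wide = occursin("ymm", asm) || occursin("zmm", asm)
println("wide SIMD registers found: ", uses_wide)
```

In the REPL, `@code_native debuginfo=:none f(10)` prints the same listing directly.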
Also note that the total compile-plus-run time for N=10 is about 0.162s for the Fortran version:
$ time (gfortran -Ofast avx.f90 && ./a.out)
Time 9.9999999999969905E-007
1.4111883712180107
( gfortran -Ofast avx.f90 && ./a.out; ) 0.08s user 0.04s system 73% cpu 0.162 total
While for Julia it is 0.484s:
$ time julia f.jl
0.000004 seconds (1 allocation: 16 bytes)
1.4111883712180104
julia f.jl 1.03s user 0.20s system 253% cpu 0.484 total
So Julia is 3x slower end to end for compile plus run. I assume it is the slow startup time or something similar, but that is a separate aspect of tooling and user experience.
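To get a rough idea of how much of that time is JIT compilation of `f` rather than Julia startup, one can time the first and second calls separately. This is only a sketch: it measures compilation of this one function and says nothing about process startup, which would have to be timed from the shell. The names `t_first` and `t_second` are mine.

```julia
function f(N)
    s = 0.0
    for i in 1:N
        s += sin(i)
    end
    s
end

# The first call includes JIT compilation of f for an Int argument.
t_first = @elapsed f(10)
# The second call reuses the compiled code, so it measures run time only.
t_second = @elapsed f(10)

println("first call (compile + run): ", t_first, " s")
println("second call (run only):     ", t_second, " s")
```

If the first call turns out to be a small fraction of the 0.484s total, most of the overhead is in starting Julia itself rather than in compiling `f`.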
Now let’s use the @avxt macro from LoopVectorization:
using LoopVectorization
function f_avx(N)
    s = 0.0
    @avxt for i in 1:N
        s += sin(i)
    end
    s
end
@time r = f_avx(100000000)
println(r)
Compile and run:
$ julia avx.jl
0.185562 seconds (1 allocation: 16 bytes)
1.713649346570267
Things got 15x faster than the pure Julia version and about 7.9x faster than the Fortran version.
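For reference, the ratios quoted in this post come straight from the timings printed above. Here is the arithmetic as a small sketch (the variable names are mine):

```julia
# Run times in seconds for N = 100000000, copied from the outputs above.
t_fortran = 1.462286   # gfortran -Ofast
t_julia   = 2.784782   # plain Julia loop
t_avxt    = 0.185562   # Julia with @avxt

# Compile-plus-run wall times in seconds for N = 10.
total_fortran = 0.162
total_julia   = 0.484

println("Julia vs Fortran:     ", round(t_julia / t_fortran, digits=1), "x")
println("plain Julia vs @avxt: ", round(t_julia / t_avxt, digits=1), "x")
println("Fortran vs @avxt:     ", round(t_fortran / t_avxt, digits=1), "x")
println("compile + run, N=10:  ", round(total_julia / total_fortran, digits=1), "x")
```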
@Elrod do you know if there is a reason why both Julia and Fortran couldn’t generate the fast AVX version by default? As a user that is what I would want.
My Julia version:
julia> versioninfo()
Julia Version 1.6.1
Commit 6aaedecc44 (2021-04-23 05:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin18.7.0)
  CPU: Intel(R) Core(TM) i9-9980HK CPU @ 2.40GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)