EDIT: I get exactly the same performance from Julia and gfortran, both with
and without fast-math (and -march=native doesn’t seem to matter).
Compiling with
gfortran -O3 -march=native rootloop.f90 -shared -fPIC -o libloop.so
gfortran -Ofast -march=native rootloop.f90 -shared -fPIC -o libloopfast.so
Yields:
julia> using BenchmarkTools
julia> function loop(x,n)
           s = 0.
           for i in 1:n
               r = sqrt(i)
               s = s + x*r
           end
           s
       end
loop (generic function with 1 method)
julia> function loopfast(x,n)
           s = 0.
           @fastmath for i in 1:n
               r = sqrt(i)
               s = s + x*r
           end
           s
       end
loopfast (generic function with 1 method)
julia> floop(x,n) = ccall((:loop_,"libloop.so"),Float64,(Ref{Float64},Ref{Int64}),x,n)
floop (generic function with 1 method)
julia> floopfast(x,n) = ccall((:loop_,"libloopfast.so"),Float64,(Ref{Float64},Ref{Int64}),x,n)
floopfast (generic function with 1 method)
julia> x = rand() ; n = 10_000_000;
julia> @btime loop($x, $n)
13.054 ms (0 allocations: 0 bytes)
7.112252395049944e9
julia> @btime floop($x, $n)
13.054 ms (0 allocations: 0 bytes)
7.112252395049944e9
julia> @btime loopfast($x, $n)
6.982 ms (0 allocations: 0 bytes)
7.112252395049599e9
julia> @btime floopfast($x, $n)
6.982 ms (0 allocations: 0 bytes)
7.1122523950496235e9
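
Note that the last digits differ between the strict and the fast-math results. I have not inspected the generated code, so this is only a likely explanation: @fastmath and -Ofast let the compiler reassociate the floating-point sum (for example to vectorize it), and floating-point addition is not associative. Here is a minimal sketch of that effect. The helper names `sum_sequential` and `sum_in_lanes` are made up for illustration and are not part of the benchmark above; the "lanes" version only roughly imitates what a vectorized reduction does.

```julia
# Strict left-to-right accumulation, same order as `loop` above.
function sum_sequential(x, n)
    s = 0.0
    for i in 1:n
        s += x * sqrt(i)
    end
    return s
end

# Hypothetical illustration: keep `lanes` independent partial sums
# (roughly what a SIMD reduction does) and combine them at the end.
function sum_in_lanes(x, n; lanes = 4)
    partial = zeros(lanes)
    for i in 1:n
        partial[(i - 1) % lanes + 1] += x * sqrt(i)
    end
    return sum(partial)
end

x, n = 0.5, 1_000_000
a = sum_sequential(x, n)
b = sum_in_lanes(x, n)
println(a)
println(b)
# The two orders agree to many digits, but need not match bit for bit.
println(isapprox(a, b; rtol = 1e-10))
```

So a mismatch in the trailing digits is expected when the summation order changes, and by itself does not mean either version is wrong.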