Here is one example (not sure how representative it is):
Fortran:
Code
PROGRAM Parallel_Hello_World
USE OMP_LIB
REAL*8 :: partial_Sum, total_Sum

!$OMP PARALLEL PRIVATE(partial_Sum) SHARED(total_Sum)
    partial_Sum = 0.d0
    total_Sum = 0.d0

    !$OMP DO
    DO i = 1, 1000000000
        partial_Sum = partial_Sum + dsin(dble(i))
    END DO
    !$OMP END DO

    !$OMP CRITICAL
    total_Sum = total_Sum + partial_Sum
    !$OMP END CRITICAL
!$OMP END PARALLEL

PRINT *, "Total Sum: ", total_Sum
END
Result:
leandro@pitico:~/Drive/Work/JuliaPlay% gfortran -O3 omp.f95 -o omp
leandro@pitico:~/Drive/Work/JuliaPlay% time ./omp
Total Sum: 0.42129448674541342
real 0m54,495s
user 0m54,476s
sys 0m0,004s
leandro@pitico:~/Drive/Work/JuliaPlay% gfortran -O3 -fopenmp omp.f95 -o omp
leandro@pitico:~/Drive/Work/JuliaPlay% time ./omp
Total Sum: 0.42129448674645509
real 0m12,538s
user 1m29,020s
sys 0m0,016s
Julia using FLoops:
Code
using FLoops

function f()
    @floop for i in 1:1000000000
        @reduce(total_Sum += sin(i))
    end
    total_Sum
end

println("Total sum: ", f())
Result:
leandro@pitico:~/Drive/Work/JuliaPlay% time julia floop.jl
Total sum: 0.4212944867466973
real 0m40,729s
user 0m41,039s
sys 0m0,608s
leandro@pitico:~/Drive/Work/JuliaPlay% time julia -t8 floop.jl
Total sum: 0.4212944867465145
real 0m14,743s
user 1m4,074s
sys 0m0,751s
Julia using @threads:
Code
using Base.Threads

function f()
    total_Sum = zeros(nthreads())       # one accumulator slot per thread
    @threads for i in 1:1000000000
        total_Sum[threadid()] += sin(i) # each thread updates only its own slot
    end
    sum(total_Sum)
end

println("Total sum: ", f())
Result:
leandro@pitico:~/Drive/Work/JuliaPlay% time julia -t8 threads.jl
Total sum: 0.4212944867465146
real 0m10,514s
user 1m11,253s
sys 0m0,544s
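As an aside, indexing by threadid() works here, but a version that gives each task its own contiguous chunk of the range avoids relying on thread IDs at all. Something like the following (just a sketch, I have not timed it):

using Base.Threads

function f_chunks(n = 1000000000)
    nchunks = nthreads()
    chunk_sums = zeros(nchunks)   # one partial sum per chunk
    len, rem = divrem(n, nchunks)
    @threads for c in 1:nchunks
        lo = (c - 1) * len + 1
        hi = c * len + (c == nchunks ? rem : 0)
        s = 0.0
        for i in lo:hi
            s += sin(i)
        end
        chunk_sums[c] = s         # indexed by chunk, not by threadid()
    end
    return sum(chunk_sums)
end

println("Total sum: ", f_chunks())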
Note that Julia is faster than Fortran with a single thread (no idea why).
I split this off into a new thread. I guess I don't really understand what is going on when using atomic operations, so I have now removed the initial test, where I used atomic_add!, because it didn't make sense.
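For reference, the kind of atomics-based version I mean updates a single shared accumulator with atomic_add! on every iteration, roughly like the sketch below (a minimal sketch, not the exact code I removed). Every update serializes on that one variable, which presumably hurts scaling for a reduction like this.

using Base.Threads

function f_atomic()
    total_Sum = Atomic{Float64}(0.0)   # single shared accumulator
    @threads for i in 1:1000000000
        atomic_add!(total_Sum, sin(i)) # every iteration contends on the same variable
    end
    total_Sum[]
end

println("Total sum: ", f_atomic())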