I am a new Julia user. When doing matrix operations, I am confused about the performance of my code.
Format 1
C=0
A=[2 4 1; 4 1 4; 4 2 3]
@time for i=1:10000000
    global A=A/i
    A=A+A'
    global C=C+sum(A)
end
C
3.792766 seconds (60.00 M allocations: 3.576 GiB, 3.94% gc time)
It takes 3.79s.
Format 2
Following the documentation, I avoided global variables by moving the loop into a function:
A=[2.0 4 1; 4 1 4; 4 2 3]
function loop_over_global(A::Array{Float64,2})
    c=0.0
    for i=1:10000000
        A=A/i
        A=A.+A'
        c+=sum(A)
    end
    return c
end
@time loop_over_global(A)
1.594670 seconds (20.04 M allocations: 2.982 GiB, 10.59% gc time)
It takes 1.59s.
Format 3
Replacing the built-in matrix operations with explicit loops:
A=[2.0 4 1; 4 1 4; 4 2 3]
function loop_over_global(A::Array{Float64,2})
    c=0.0
    for i=1:10^7
        for k=1:3
            for j=1:3
                A[j,k]=A[j,k]/i
            end
        end
        for k=1:3
            for j=k:3
                A[j,k]=A[j,k]+A'[j,k]
            end
        end
        for k=1:3
            for j=1:k-1
                A[j,k]=A[k,j]
            end
        end
        c+=sum(A)
    end
    return c
end
@time loop_over_global(A)
0.442143 seconds (82.71 k allocations: 4.293 MiB)
It takes 0.44s.
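A possible middle ground between Format 2 and Format 3, as a rough sketch that I have not benchmarked: in Format 2 every `A/i` and `A.+A'` builds a new 3x3 array, which is likely where the 20 M allocations come from, while Format 3 writes into the existing array. The variant below keeps the matrix notation but writes results into preallocated storage with dotted (broadcast) assignments. The names `loop_in_place`, `B`, and `n` are my own placeholders, not anything from the documentation.

```julia
# Hypothetical variant: same arithmetic as Format 2, but results are written
# into existing arrays instead of allocating a new one on every iteration.
function loop_in_place(A0::Matrix{Float64}, n::Int)
    A = copy(A0)      # work on a copy so the caller's matrix is not modified
    B = similar(A)    # scratch buffer that receives A + A'
    c = 0.0
    for i in 1:n
        A ./= i       # in-place elementwise division
        B .= A .+ A'  # B is a separate array, so the output does not alias the inputs
        A, B = B, A   # swap the two bindings; no data is copied
        c += sum(A)
    end
    return c
end

A = [2.0 4 1; 4 1 4; 4 2 3]
@time loop_in_place(A, 10^7)
```

The first `@time` call also includes compilation, so timing a second call should give a fairer number. Writing `A .= A .+ A'` directly is avoided here because the output would overlap with the transposed input, which is why the separate buffer `B` is used.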
Fortran code
program main
    integer(kind=8):: ip,i,j
    real(kind=8):: A(3,3),c,t1,t2
    c=0.0
    A(1,1) = 2; A(1,2) = 4; A(1,3) = 1;
    A(2,1) = 4; A(2,2) = 1; A(2,3) = 4;
    A(3,1) = 4; A(3,2) = 2; A(3,3) = 3;
    call cpu_time(t1)
    do ip=1,10000000
        A=A/ip
        A=A+transpose(A)
        do i=1,3
            do j=1,3
                c=c+A(j,i)
            enddo
        enddo
    enddo
    call cpu_time(t2)
    write(*,*) c,t2-t1
end
Using IVF2011, it takes 0.25s.
I am very confused about the performance difference between Format 2 and Format 3. Can the Julia code be improved further?