Hey all, I’ve been dabbling with Julia since the 0.x days. Up to this point I haven’t had any real performance issues, but I now need to read in some fairly large files (think finite element / mesh data with millions of nodes/elements).
So I decided to benchmark reading integer data into an array with Julia 1.7.2 and compare it against Fortran, compiled with GNU Fortran (Homebrew GCC 11.2.0_3) 11.2.0.
I first created an ASCII file with a header line holding the number of data lines, followed by 2,000,000 lines of integer data, each containing the integers 1 through 19 separated by spaces. This was done with the following Julia script:
n = 2_000_000
line = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"
f = open("input.txt", "w")
write(f, "$n\n")
for i = 1:n
    write(f, line)
end
close(f)
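For testing the readers at small sizes, the same file can be produced by a function that takes the line count as a parameter. This is a minimal sketch; the name write_test_file is hypothetical, and the open(...) do block is used so the file is closed even if a write fails.

```julia
# Hypothetical helper: writes the same layout as the script above
# (header line with the count, then n identical lines of 1..19).
function write_test_file(path::AbstractString, n::Integer)
    line = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"
    open(path, "w") do f          # the do-block closes f even on error
        write(f, "$n\n")          # header: number of data lines
        for _ in 1:n
            write(f, line)
        end
    end
    return path
end
```

Calling write_test_file("input.txt", 2_000_000) recreates the benchmark file; a call with n = 3 gives a 4-line file that is quick to inspect.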
I tested reading this file with the following script:
using DelimitedFiles
function readdlm_test()
    f = open("input.txt", "r")
    readline(f) # skip the header line; readdlm doesn't need the line count
    A = readdlm(f, Int, comments=false)
    close(f)
    SumValues = sum(A)
    println("Sum of values = $SumValues")
    return A
end
@time A = readdlm_test()
This function takes ~15.43 seconds on the first execution and ~15.1 seconds on subsequent calls of @time A = readdlm_test().
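When comparing against the Fortran code, it may help to note the shape and element type that readdlm produces. The sketch below runs the same calls on a small in-memory stand-in for input.txt (the IOBuffer contents are an assumption made for illustration). readdlm returns one row per file line, so the full file gives a 2,000,000 × 19 Matrix{Int}, where Int is 64-bit on 64-bit platforms; the Fortran program stores a 19 × 2,000,000 array of default INTEGER, which gfortran makes 4 bytes by default.

```julia
using DelimitedFiles

# Header line plus two data lines, mimicking input.txt at a tiny scale.
data = "2\n" * "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"^2
io = IOBuffer(data)
readline(io)                          # skip the header, as in readdlm_test
A = readdlm(io, Int, comments=false)  # one row per line, 19 columns

println(size(A))
println(sum(A))
```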
To test Fortran reading I used the following code:
PROGRAM fortran_vs_julia
  IMPLICIT NONE
  ! Parameters
  INTEGER, PARAMETER :: dbl=SELECTED_REAL_KIND(p=14,r=99)
  CHARACTER(9), PARAMETER :: in_file="input.txt"
  INTEGER, PARAMETER :: num_rows=19
  ! Integers
  INTEGER :: iost=0, j=0, num_cols=0, sum_values=0
  ! Reals
  REAL(KIND=dbl) :: begin_time, finish_time
  ! Arrays
  INTEGER, ALLOCATABLE, DIMENSION(:,:) :: values
  ! Store time at start of program
  CALL CPU_TIME(begin_time)
  ! Open in_file for reading
  OPEN (UNIT=1, FILE=in_file, STATUS='old', ACTION='read', IOSTAT=iost)
  ! Read line 1, containing the number of lines remaining to be read
  READ(1,*,IOSTAT=iost) num_cols
  ! Allocate array to store values (one column per line of the file)
  ALLOCATE(values(1:num_rows,1:num_cols))
  ! Read in rest of file
  read_loop: DO j = 1,num_cols
    READ(1,*,IOSTAT=iost) values(1:num_rows,j)
  END DO read_loop
  CLOSE(1)
  ! Sum the values in the array and print result
  sum_values = SUM(values)
  WRITE(*,*) "Sum of Values = ", sum_values
  ! Compute and print total CPU time
  CALL CPU_TIME(finish_time)
  WRITE(*,*) "CPU Time =", (finish_time - begin_time), " sec"
END PROGRAM
The Fortran program runs in ~5.44 seconds. To my surprise, that’s about 3 times faster!!!
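For an apples-to-apples comparison, the Fortran program's structure can be mirrored in Julia: read the header count, preallocate a 19 × n column-major array, then parse each file line into one column. This is a minimal sketch, not a tuned implementation; the function name read_like_fortran and its num_rows keyword are hypothetical, and whether it actually beats readdlm on this file would need to be measured.

```julia
# Hypothetical function mirroring the Fortran program: header count,
# preallocated num_rows x num_cols matrix, one file line per column.
function read_like_fortran(path::AbstractString; num_rows::Int = 19)
    open(path, "r") do f
        num_cols = parse(Int, readline(f))       # line 1: number of data lines
        vals = Matrix{Int}(undef, num_rows, num_cols)
        for j in 1:num_cols
            fields = split(readline(f))          # split on whitespace
            length(fields) == num_rows ||
                error("line $(j + 1): expected $num_rows values, got $(length(fields))")
            for i in 1:num_rows
                vals[i, j] = parse(Int, fields[i])
            end
        end
        return vals                              # returned from open(...) do
    end
end
```

Usage would be @time A = read_like_fortran("input.txt") followed by sum(A). Filling columns rather than rows matches Julia's column-major layout, just as values(1:num_rows,j) does in Fortran.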
So is this just something Julia can’t compete with Fortran on? Or am I going about this completely the wrong way? If the Julia code above is terrible, then I’d suggest that the current I/O documentation needs some additional pointers on how to fix or improve this.
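One commonly suggested way to stay in pure Julia is to skip string handling altogether: read the whole file as bytes and parse the digits by hand into a preallocated matrix. The sketch below assumes whitespace-separated, optionally negative decimal integers and treats newlines like spaces, so it does not check that each line holds exactly 19 values; it also does no overflow checking. The names isspace_byte, parse_int and read_ints_bytes are hypothetical, and read(path) loads the entire file (about 96 MB here) into memory.

```julia
# Hypothetical helpers for a byte-level integer parser.
isspace_byte(b::UInt8) = b == 0x20 || b == 0x09 || b == 0x0a || b == 0x0d

# Advance past spaces, tabs, CR and LF; return the first other index.
@inline function skip_space(bytes::Vector{UInt8}, i::Int)
    while i <= length(bytes) && isspace_byte(bytes[i])
        i += 1
    end
    return i
end

# Parse an optionally signed decimal integer starting at index i.
# Returns (value, index just past the last digit). No overflow check.
@inline function parse_int(bytes::Vector{UInt8}, i::Int)
    neg = i <= length(bytes) && bytes[i] == UInt8('-')
    neg && (i += 1)
    start = i
    v = 0
    while i <= length(bytes) && UInt8('0') <= bytes[i] <= UInt8('9')
        v = 10v + (bytes[i] - UInt8('0'))
        i += 1
    end
    i == start && error("expected a digit at byte $i")
    return (neg ? -v : v), i
end

function read_ints_bytes(path::AbstractString; num_rows::Int = 19)
    bytes = read(path)                     # whole file as Vector{UInt8}
    i = skip_space(bytes, 1)
    num_cols, i = parse_int(bytes, i)      # header: number of data lines
    vals = Matrix{Int}(undef, num_rows, num_cols)
    for j in 1:num_cols, r in 1:num_rows   # r varies fastest: column-major fill
        i = skip_space(bytes, i)
        v, i = parse_int(bytes, i)
        vals[r, j] = v
    end
    return vals
end
```

The design choice is to avoid allocating a String or SubString per value, which is where generic line-splitting readers spend much of their time; the trade-off is the weaker validation noted above.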
I’m a huge fan of both languages but could definitely see myself moving more heavily to Julia if there’s a way to get comparable performance natively in Julia. So any suggestions are much appreciated.
Thanks.
P.S.
You may notice I included a sum of the values in both codes. This was just to confirm that the data was read correctly. Omitting it changes the execution times very little, since they’re obviously dominated by the reading.
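The checksum can also be predicted in advance: every data line holds 1 through 19, which sums to 190, so the expected total is 190 × 2,000,000 = 380,000,000. That value is below typemax(Int32) = 2,147,483,647, so the Fortran sum_values does not overflow even assuming a 4-byte default INTEGER. The function name expected_sum below is hypothetical.

```julia
# Each data line contributes sum(1:19) = 190 to the total.
expected_sum(n::Integer) = n * sum(1:19)

n = 2_000_000
println(expected_sum(n))                      # 380000000
println(expected_sum(n) <= typemax(Int32))    # true: fits in a 32-bit INTEGER
```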