Hey all, I’ve been dabbling with Julia since the 0. days. Up to this point I haven’t had any real performance issues, but I now need to read in some fairly large files (think finite element / mesh data with millions of nodes/elements).
So I decided to benchmark reading integer data into an array with Julia 1.7.2 and compare it against Fortran, compiled with GNU Fortran (Homebrew GCC 11.2.0_3) 11.2.0.
I first created an ASCII file with 2,000,000 lines of integer data, each line containing 19 integers separated by spaces, preceded by a single header line holding the line count. This was done with the following Julia script:
    n = 2_000_000
    line = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"
    f = open("input.txt", "w")
    write(f, "$n\n")
    for i = 1:n
        write(f, line)
    end
    close(f)
I tested reading this file with the following script:
    using DelimitedFiles

    function readdlm_test()
        f = open("input.txt", "r")
        readline(f) # skip this line => not needed for Julia
        A = readdlm(f, Int, comments=false)
        SumValues = sum(A)
        println("Sum of values = $SumValues")
        return A
    end

    @time A = readdlm_test()
This function takes ~15.43 seconds on first execution and ~15.1 seconds on subsequent calls.
To test Fortran reading I used the following code:
    PROGRAM fortran_vs_julia
      IMPLICIT NONE

      ! Parameters
      INTEGER, PARAMETER :: dbl=SELECTED_REAL_KIND(p=14,r=99)
      CHARACTER(9), PARAMETER :: in_file="input.txt"
      INTEGER, PARAMETER :: num_rows=19

      ! Integers
      INTEGER :: iost=0, j=0, num_cols=0, sum_values=0

      ! Reals
      REAL(KIND=dbl) :: begin_time, finish_time

      ! Arrays
      INTEGER, ALLOCATABLE, DIMENSION(:,:) :: values

      ! Store time at start of program
      CALL CPU_TIME(begin_time)

      ! Open and read in_file
      OPEN (UNIT=1, FILE=in_file, STATUS='old', ACTION='read', IOSTAT=iost)

      ! Read line 1 containing number of lines remaining to be read
      READ(1,*,IOSTAT=iost) num_cols

      ! Allocate array to store values
      ALLOCATE(values(1:num_rows,1:num_cols))

      ! Read in rest of file
      read_loop: DO j = 1,num_cols
        READ(1,*,IOSTAT=iost) values(1:19,j)
      END DO read_loop
      CLOSE(1)

      ! Sum the values in the array and print result
      sum_values = SUM(values)
      WRITE(*,*) "Sum of Values = ", sum_values

      ! Compute and print total CPU time
      CALL CPU_TIME(finish_time)
      WRITE(*,*) "CPU Time =", (finish_time - begin_time), " sec"
    END PROGRAM
The Fortran program runs in ~5.44 seconds. To my surprise, that’s nearly 3 times faster (15.1 / 5.44 ≈ 2.8)!
So is this just not something Julia is able to compete with Fortran on? Or am I doing this completely the wrong way? If the Julia code above is terrible then I’d suggest that the current documentation on I/O needs some additional pointers for how to fix/improve this.
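For comparison, here is a minimal sketch of what a hand-written Julia reader mirroring the Fortran structure (read the count, preallocate, parse line by line) might look like. The name `read_int_table` and its `vals_per_line` argument are hypothetical, not part of DelimitedFiles. The sketch is untimed, so it makes no claim about performance, and it runs on a small temporary file rather than the full `input.txt`.

```julia
# Hypothetical helper (not part of DelimitedFiles): reads the count line,
# preallocates the matrix, then parses each data line into one column.
function read_int_table(path::AbstractString, vals_per_line::Integer)
    open(path, "r") do f
        n = parse(Int, readline(f))              # first line: number of data lines
        A = Matrix{Int}(undef, vals_per_line, n) # column-major, like the Fortran array
        for j in 1:n
            fields = split(readline(f))          # split on whitespace
            length(fields) == vals_per_line ||
                error("line $(j + 1): expected $vals_per_line values")
            for i in 1:vals_per_line
                A[i, j] = parse(Int, fields[i])
            end
        end
        return A
    end
end

# Small demo on a temporary file with the same layout as input.txt.
path, io = mktemp()
write(io, "3\n")
for _ in 1:3
    write(io, "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n")
end
close(io)

A = read_int_table(path, 19)
println("size = ", size(A))
println("sum  = ", sum(A))
```

Note that this returns a 19×n matrix (one file line per column, as in the Fortran code), which is the transpose of the n×19 layout that `readdlm` produces.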
I’m a huge fan of both languages but could definitely see moving more heavily to Julia if there’s a way to get comparable performance natively inside Julia. So any suggestions are most appreciated.
You may notice I included a sum of the values in both codes. This was just to offer assurance that the data was read correctly. Omitting it changes the execution times very little as it’s obviously dominated by the reading.
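As a sanity check on that sum, the expected value follows directly from the generator script: every data line holds the integers 1 through 19, and the header line is skipped by both programs. A short sketch of the arithmetic (the variable names here are illustrative only):

```julia
# Expected sum for the generated input.txt: n identical lines of 1..19.
n = 2_000_000
line_sum = sum(1:19)      # sum of one data line
expected = n * line_sum   # sum over all data lines
println("Expected sum = $expected")

# The total also fits in a 32-bit integer, which matters for the
# default-kind INTEGER accumulator in the Fortran program.
@assert expected <= typemax(Int32)
```

Both programs should print this same value if the file was read correctly.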