Julia 3 times slower than Fortran reading integer data from ASCII file

Hey all, I’ve been dabbling with Julia since the 0.x days. Up to this point I haven’t had any real performance issues, but I now need to read in some fairly large files (think finite element / mesh data with millions of nodes/elements).

So I decided to benchmark reading integer data into an array with Julia 1.7.2 and compare it to Fortran, compiled with GNU Fortran (Homebrew GCC 11.2.0_3) 11.2.0.

I first created an ASCII file with a header line holding the line count, followed by 2,000,000 lines of integer data, each line containing 19 integers separated by spaces. This was done with the following Julia script:

n = 2_000_000
line = "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19\n"

f = open("input.txt", "w")
write(f, "$n\n")
for i = 1:n
  write(f, line)
end
close(f)

I tested reading this file with the following script:

using DelimitedFiles

function readdlm_test()
  f = open("input.txt", "r")
  readline(f) # skip the line count => readdlm doesn't need it
  A = readdlm(f, Int, comments=false)
  close(f)
  SumValues = sum(A)
  println("Sum of values = $SumValues")
  return A
end

@time A = readdlm_test()

This function takes ~15.43 seconds on the first execution and ~15.1 seconds on subsequent calls of @time A = readdlm_test().

To test Fortran reading I used the following code:

PROGRAM fortran_vs_julia
  IMPLICIT NONE

  ! Parameters
  INTEGER, PARAMETER :: dbl=SELECTED_REAL_KIND(p=14,r=99)
  CHARACTER(9), PARAMETER :: in_file="input.txt"
  INTEGER, PARAMETER :: num_rows=19
  ! Integers
  INTEGER :: iost=0, j=0, num_cols=0, sum_values=0
  ! Reals
  REAL(KIND=dbl) :: begin_time, finish_time
  ! Arrays
  INTEGER, ALLOCATABLE, DIMENSION(:,:) :: values

  ! Store time at start of program
  CALL CPU_TIME(begin_time)

  ! Open and read in_file
  OPEN (UNIT=1, FILE=in_file, STATUS='old', ACTION='read', IOSTAT=iost)
    ! Read line 1 containing number of lines remaining to be read
    READ(1,*,IOSTAT=iost) num_cols
    ! Allocate array to store values
    ALLOCATE(values(1:num_rows,1:num_cols))
    ! Read in rest of file
    read_loop: DO j = 1,num_cols
      READ(1,*,IOSTAT=iost) values(1:19,j)
    END DO read_loop
  CLOSE(1)

  ! Sum the values in the array and print result
  sum_values = SUM(values)
  WRITE(*,*) "Sum of Values = ", sum_values

  ! Compute and print total CPU time
  CALL CPU_TIME(finish_time)
  WRITE(*,*) "CPU Time =", (finish_time - begin_time), " sec"

END PROGRAM

The Fortran program runs in ~5.44 seconds. To my surprise that’s about 3 times faster!!!

So is this just not something Julia is able to compete with Fortran on? Or am I doing this completely the wrong way? If the Julia code above is terrible then I’d suggest that the current documentation on I/O needs some additional pointers for how to fix/improve this.

I’m a huge fan of both languages but could definitely see moving more heavily to Julia if there’s a way to get comparable performance natively inside Julia. So any suggestions are most appreciated.

Thanks.

P.S.
You may notice I included a sum of the values in both codes. This was just to offer assurance that the data was read correctly. Omitting it changes the execution times very little as it’s obviously dominated by the reading.

It’s likely that the difference is that your Fortran code preallocates space, while readdlm must grow its output array as it reads. Try providing dims to readdlm.

Specifying dims as a tuple of the expected rows and columns (including header, if any) may speed up reading of large files.
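
For this particular file, that might look something like the following (a sketch, assuming n is first parsed from the header line):

n = parse(Int, readline(f))
A = readdlm(f, Int, dims=(n, 19), comments=false)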

I tried writing

n = 2_000_000
A = [j for i in 1:n, j in 1:19]
global f = open("input2.txt", "w")
write(f, "$n\n")
write(f, A)
close(f)

and reading

function read_test()
    f = open("input2.txt", "r")
    readline(f) # skip the line count
    A = read(f) # reads the remaining bytes as a Vector{UInt8}
    close(f)
    SumValues = sum(A)
    println("Sum of values = $SumValues")
    return A
end
  
@btime A = read_test()

yielding

135.437 ms (29 allocations: 289.92 MiB)
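
Note that read(f) in the snippet above returns the remaining bytes as a Vector{UInt8}, so the printed sum is a sum of bytes, not of the original integers. To actually recover the matrix from the binary file, something along these lines should work (a sketch; the function name is just illustrative, and it assumes the file was written by the script above with 64-bit Int):

function read_binary_matrix()
    f = open("input2.txt", "r")
    n = parse(Int, readline(f))   # first line holds the number of rows
    A = Matrix{Int}(undef, n, 19) # same shape as the array that was written
    read!(f, A)                   # fill A from the raw binary data
    close(f)
    println("Sum of values = $(sum(A))")
    return A
end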

BTW: was the original benchmark run on an HDD or an SSD?

readdlm has the advantage of being included with Julia (in the DelimitedFiles standard library), but what’s in there isn’t necessarily the fastest or most featureful option.

My understanding is that CSV.jl is the replacement, and for larger files it’s the fastest such library compared to what’s out there in other languages, at least the ones it has been benchmarked against.

CSV.jl has some startup cost (which can be avoided, but that’s a long story). Once past that initial overhead it’s the fastest, and whether you amortize that overhead depends on the size of the file. I’m not sure where the crossover point is (it might be data-dependent rather than fixed), but I would at least consider that package.
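
For reference, reading the same text file with CSV.jl might look roughly like this (a sketch; the keyword arguments reflect my understanding of the CSV.jl API, so double-check them against the CSV.jl docs):

using CSV, Tables

function csv_read_test()
    # header=false: the file has no column names; skipto=2 skips the line-count line
    tbl = CSV.File("input.txt"; header=false, skipto=2, delim=' ', types=Int)
    A = Tables.matrix(tbl) # materialize the parsed columns into a Matrix{Int}
    println("Sum of values = $(sum(A))")
    return A
end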

DelimitedFiles is, I suppose, good to have for (much) smaller files.

Another thing: Julia strings assume UTF-8, which is fine since ASCII is a subset of it, but if you know you only have ASCII there are other datatypes that can help. That seems like premature optimization to look into here, though.

That’s interesting @goerch - I’ll give that a try and see what I get. For this, I was running on a Mac Mini with SSD, 64 GB DDR4, and 6-core i7.

FWIW, I tried the native ARM version on an M1 Max and got ~3 s (compared to ~16 s with x86 1.7.1 via Rosetta).

Thanks for the info. I’m asking because @btime on the original benchmark gives

  5.455 s (114065956 allocations: 3.49 GiB)

for me on

julia> versioninfo()
Julia Version 1.9.0-DEV.167
Commit f5d15571b3 (2022-03-11 17:10 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 5 on 12 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 5

That looks like quite a difference compared to your Mac.

@goerch So yeah, that method is MUCH faster… it ran in 0.155 seconds on my Mac mini.

However, the input2.txt file is no longer an ASCII file (or at least it doesn’t show as one in vi on my Mac)… so that’s not totally apples to apples, and a lot of the mesh files I deal with still come in an ASCII format… though it’s definitely worth considering making some changes to our workflow there as well.

Is there a tool in Julia to convert an entire file to/from ASCII text?

@LaurentPlagne your results are also very interesting with the M1 … I’ve got a MacBook Air with the M1 but haven’t done much benchmarking with it… may have to test it out

@goerch I edited my script to use the benchmark tools so now it looks like:

using DelimitedFiles
using BenchmarkTools

function readdlm_test()
  f = open("input.txt", "r")
  readline(f) # skip the line count => readdlm doesn't need it
  A = readdlm(f, Int, comments=false)
  close(f)
  SumValues = sum(A)
  println("Sum of values = $SumValues")
  return A
end

@btime A = readdlm_test()

When I run this, I get the following:

% julia julia_vs_Fortran.jl
Sum of values = 380000000
Sum of values = 380000000
Sum of values = 380000000
Sum of values = 380000000
  14.122 s (402065446 allocations: 10.64 GiB)

Here’s my version info:

Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin19.5.0)
  CPU: Intel(R) Core(TM) i7-8700B CPU @ 3.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

The @btime result is slightly better but still much slower than the Fortran code. I see you’re running a 1.9-dev version, but I wouldn’t have thought that should make much of a difference for this.

I would agree, but I tested it:

  20.497 s (402065446 allocations: 10.64 GiB)
julia> versioninfo()
Julia Version 1.7.2
Commit bf53498635 (2022-02-06 15:21 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 5

The MWE allocates quite a bit more on 1.7.2, so this is either a performance bug in 1.7.2 or an impressive improvement in 1.9.0.

More data points from my computer:

  6.080 s (114065963 allocations: 3.49 GiB)
Julia Version 1.6.5
Commit 9058264a69 (2021-12-19 12:30 UTC)        
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-11.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 5

and

  6.367 s (114065956 allocations: 3.49 GiB)
Julia Version 1.8.0-beta1
Commit 7b711ce699 (2022-02-23 15:09 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 12 × Intel(R) Core(TM) i7-10710U CPU @ 1.10GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-13.0.1 (ORCJIT, skylake)
  Threads: 5 on 12 virtual cores
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS = 5

Very nice! Ironically, I only just upgraded to 1.7 this week! LOL

Sorry @Jeff_Emanuel, I missed your suggestion earlier. But to follow up for the record, yes, specifying dims does help. I added the following lines to the script, which shaved a few seconds off the 1.7 read times.

n = parse(Int, readline(f))
A = readdlm(f, Int, dims=(n,19), comments=false)
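
For clarity, the full read function with those lines in place looks roughly like this (a sketch; the function name is just illustrative):

using DelimitedFiles

function readdlm_dims_test()
  f = open("input.txt", "r")
  n = parse(Int, readline(f)) # first line holds the number of data rows
  A = readdlm(f, Int, dims=(n, 19), comments=false)
  close(f)
  println("Sum of values = $(sum(A))")
  return A
end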

@goerch I’d be curious what you get when you set dims on 1.6 and 1.9?

  2.567 s (66 allocations: 381.47 MiB)
Julia Version 1.6.5
  16.629 s (287999555 allocations: 7.53 GiB)
Julia Version 1.7.2
  2.414 s (65 allocations: 381.47 MiB)
Julia Version 1.8.0-beta1
  2.322 s (65 allocations: 381.47 MiB)
Julia Version 1.9.0-DEV.167

I’m not sure it’s worth the effort to file an issue, since 1.8.0 will be released soon and the LTS looks fine?

Yeah, I’m fine either way. And I’m very excited to see speeds exceeding Fortran’s read! :smile:
