Abysmal performance when reading block of data from disk with Julia

See the code below.

const count = 10000

function create()
    v = rand(Int64, count)
    open("data.bin", "w") do ofile
        write(ofile, v)
    end
end

function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, len)
        v = read!(ifile, v)
        return v
    end
end

This is a very simple, arguably trivial, thing to want to do. Other languages cope perfectly fine with this. For example, the runtime performance of the same code written in C++ is over 10,000 times faster.

  • C++ runtime: few nanoseconds
  • Julia runtime: nearly a whole millisecond

I can’t believe how bad it is. It’s a joke. It’s a totally trivial operation.

  1. Read block of data using OS call
  2. Perform O(1) reinterpret of returned data so that the runtime treats it as a Vector{Int64} instead of Vector{UInt8}. There is no need for this to do any type-safety or runtime type-checking nonsense.

Since the OS is doing most of the work here, there is no excuse for this.
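For reference, that two-step recipe is expressible directly in Julia; here is a minimal sketch of what I'd expect to be cheap (read the whole file as bytes, then take a zero-copy reinterpreted view):

function test_reinterpret()
    bytes = read("data.bin")          # one buffered read of the entire file
    return reinterpret(Int64, bytes)  # O(1) view over the bytes; no copy
end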

This code doesn’t run as written. Assuming you meant to write,

function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, div(count,8))
        v = read!(ifile, v)
        return v
    end
end

I’m seeing a time of 7.2 microseconds. Do you have the C++ code to show the few-nanosecond timing? I’m surprised by this, given that NVMe drives have a few microseconds of latency.


Turn your GC on

GC is on. This doesn’t allocate anywhere near enough for the GC to run every time.

julia> @benchmark test()
BenchmarkTools.Trial: 10000 samples with 4 evaluations per sample.
 Range (min … max):  6.963 μs …  2.388 ms  ┊ GC (min … max): 0.00% … 91.97%
 Time  (median):     7.928 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.896 μs ± 35.216 μs  ┊ GC (mean ± σ):  8.37% ±  2.67%

     ▅██▆▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁       ▁▂▂▁▁▁ ▁▁▁ ▁                 ▂
  ▅▄▆█████████████████████████████████████████▆█▆▇▇▆▇▇▇▅▆▄▅▅ █
  6.96 μs      Histogram: log(frequency) by time     16.9 μs <

 Memory estimate: 10.57 KiB, allocs estimate: 16.
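If you want to rule GC out completely, you can disable it around the measurement; a quick sketch using Base's GC controls:

using BenchmarkTools
GC.enable(false)        # returns the previous state; collections are now off
b = @benchmark test()
GC.enable(true)

If the timings don't move with GC disabled, GC isn't the story here.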

I strongly suspect that your C++ compiler optimized away all of your I/O code.


You’re the one asking for assistance in the first place; not being paid for that is the norm everywhere. We all participate for free and try to get along here. As for the work itself, you’re not providing enough for reproducibility or profiling. Post all of the code, including the benchmarking harness.
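For example, a self-contained script that anyone can copy and run, along these lines (a sketch, assuming BenchmarkTools for the timing):

using BenchmarkTools

const count = 10_000

function create()
    open("data.bin", "w") do ofile
        write(ofile, rand(Int64, count))
    end
end

function test()
    open("data.bin") do ifile
        read!(ifile, Vector{Int64}(undef, count))
    end
end

create()                    # write the 80 KB test file once
display(@benchmark test())  # report the full timing distribution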


Can you post the C++ code that you’re benchmarking against? I’m really interested to see how C++ is doing this in nanoseconds.


10,000 64-bit integers (80 KB) read in a few nanoseconds. Depending on what ‘a few nanoseconds’ means, that’s fast:

  • If it means 1000 nanoseconds, it is 80GB/sec
  • If it means 100 nanoseconds, it is 800GB/sec
  • If it means 10 nanoseconds, it is 8TB/sec

My own SSD can maybe do a few GB/sec at most.
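The arithmetic behind those bullets, spelled out as a throwaway Julia snippet (10,000 × 8 bytes = 80 KB):

bytes = 10_000 * sizeof(Int64)   # 80_000 bytes
for t_ns in (1000, 100, 10)
    println(t_ns, " ns  ->  ", bytes / (t_ns * 1e-9) / 1e9, " GB/sec")
end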


Running this code:

function readtest!(v, filename)
    open(filename, "r") do fid
        read!(fid, v)
    end
    return v
end

with a pre-allocated v, I get a runtime of 43 μs, which translates to 1.7 GB/sec, reasonably close to what I believe to be my disk’s read speed.
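A pre-allocated benchmark of this sort can be set up along these lines (a sketch; the $ interpolation keeps the global v out of the measured time):

using BenchmarkTools
v = Vector{Int64}(undef, 10_000)
@btime readtest!($v, "data.bin")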

Note that caching is likely to play some role here; I’m not sure how to benchmark this in a bulletproof way, but I would guess something like that is a problem in your C++ measurement as well.

Specifically, dual-channel DDR5 RAM is only ~100 GB/s, so copying 80 KB takes about 800 ns even straight out of the page cache; it’s somewhat unbelievable to me that this could be finishing in less than ~800 nanoseconds.


nanoseconds?

yes.