Abysmal performance when reading block of data from disk with Julia

See the code below.

const count = 10000

function create()
    v = rand(Int64, count)
    open("data.bin", "w") do ofile
        write(ofile, v)
    end
end

function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, len)
        v = read!(ifile, v)
        return v
    end
end

This is a very simple, arguably trivial, thing to want to do. Other languages cope perfectly fine with this. For example, the runtime performance of the same code written in C++ is over 10,000 times faster.

  • C++ runtime: few nanoseconds
  • Julia runtime: nearly a whole millisecond

I can’t believe how bad it is. It’s a joke. It’s a totally trivial operation.

  1. Read block of data using OS call
  2. Perform O(1) reinterpret of returned data so that the runtime treats it as a Vector{Int64} instead of Vector{UInt8}. There is no need for this to do any type-safety or runtime type-checking nonsense.

Since the OS is doing most of the work here, there is no excuse for this.
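For reference, that two-step recipe is expressible directly in Julia; here is a minimal sketch of what I'd expect to be cheap (read the whole file as bytes, then take a zero-copy reinterpreted view):

function test_reinterpret()
    bytes = read("data.bin")          # one buffered read of the entire file
    return reinterpret(Int64, bytes)  # O(1) view over the bytes; no copy
end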

This code doesn’t run as written. Assuming you meant to write,

function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, div(count,8))
        v = read!(ifile, v)
        return v
    end
end

I’m seeing a time of 7.2 microseconds. Do you have the C++ code to show the few-nanosecond timing? I’m surprised by this, given that NVMe drives have a few microseconds of latency.


Turn your GC on

GC is on. This doesn’t allocate anywhere near enough for the GC to run every time.

julia> @benchmark test()
BenchmarkTools.Trial: 10000 samples with 4 evaluations per sample.
 Range (min … max):  6.963 μs …  2.388 ms  ┊ GC (min … max): 0.00% … 91.97%
 Time  (median):     7.928 μs              ┊ GC (median):    0.00%
 Time  (mean ± σ):   9.896 μs ± 35.216 μs  ┊ GC (mean ± σ):  8.37% ±  2.67%

     ▅██▆▄▃▃▃▃▃▃▂▂▂▂▂▂▁▁▁       ▁▂▂▁▁▁ ▁▁▁ ▁                 ▂
  ▅▄▆█████████████████████████████████████████▆█▆▇▇▆▇▇▇▅▆▄▅▅ █
  6.96 μs      Histogram: log(frequency) by time     16.9 μs <

 Memory estimate: 10.57 KiB, allocs estimate: 16.
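If you want to rule GC out completely, you can disable it around the measurement; a quick sketch using Base's GC controls:

using BenchmarkTools
GC.enable(false)        # returns the previous state; collections are now off
b = @benchmark test()
GC.enable(true)

If the timings don't move with GC disabled, GC isn't the story here.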

I strongly suspect that your C++ compiler optimized away all of your I/O code.


You’re the one asking for assistance in the first place; not being paid for that is the norm everywhere. We all participate for free and try to get along here. As for the work itself, you’re not providing enough for reproducibility or profiling. Post all of the code, including the benchmarking harness.
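For example, a self-contained script that anyone can copy and run, along these lines (a sketch, assuming BenchmarkTools for the timing):

using BenchmarkTools

const count = 10_000

function create()
    open("data.bin", "w") do ofile
        write(ofile, rand(Int64, count))
    end
end

function test()
    open("data.bin") do ifile
        read!(ifile, Vector{Int64}(undef, count))
    end
end

create()                    # write the 80 KB test file once
display(@benchmark test())  # report the full timing distribution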


Can you post the C++ code that you’re benchmarking against? I’m really interested to see how C++ is doing this in nanoseconds.


10,000 64-bit integers (80 KB) read in a few nanoseconds. Depending on what ‘a few nanoseconds’ means, that’s fast:

  • If it means 1000 nanoseconds, it is 80GB/sec
  • If it means 100 nanoseconds, it is 800GB/sec
  • If it means 10 nanoseconds, it is 8TB/sec

My own SSD can maybe do a few GB/sec at most.
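The arithmetic behind those bullets, spelled out as a throwaway Julia snippet (10,000 × 8 bytes = 80 KB):

bytes = 10_000 * sizeof(Int64)   # 80_000 bytes
for t_ns in (1000, 100, 10)
    println(t_ns, " ns  ->  ", bytes / (t_ns * 1e-9) / 1e9, " GB/sec")
end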


Running this code:

function readtest!(v, filename)
    open(filename, "r") do fid
        read!(fid, v)
    end
    return v
end

with a pre-allocated v, I get a runtime of 43 μs, which translates to 1.7 GB/sec, reasonably close to what I believe to be my disk’s read speed.
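A pre-allocated benchmark of this sort can be set up along these lines (a sketch; the $ interpolation keeps the global v out of the measured time):

using BenchmarkTools
v = Vector{Int64}(undef, 10_000)
@btime readtest!($v, "data.bin")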

Note that caching is likely to play some role here; I’m not sure how to benchmark this in a bulletproof way, but I would guess something like that is a problem in your C++ measurement as well.

Specifically, dual-channel DDR5 RAM is only ~100 GB/s, so copying 80 KB takes about 800 ns even straight out of the page cache; it’s somewhat unbelievable to me that this could be finishing in less than ~800 nanoseconds.


nanoseconds?

yes.