const count = 10000

function create()
    v = rand(Int64, count)
    open("data.bin", "w") do ofile
        write(ofile, v)
    end
end
function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, len)
        v = read!(ifile, v)
        return v
    end
end
This is a very simple, arguably trivial, thing to want to do. Other languages cope perfectly fine with it. For example, the same code written in C++ runs over 10,000 times faster:
C++ runtime: few nanoseconds
Julia runtime: nearly a whole millisecond
I can’t believe how bad it is. It’s a joke. It’s a totally trivial operation.
1. Read the block of data using an OS call.
2. Perform an O(1) reinterpret of the returned data so that the runtime treats it as a Vector{Int64} instead of a Vector{UInt8}. There is no need for this to do any type-safety or runtime type-checking nonsense.
Since the OS is doing most of the work here, there is no excuse for this.
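The two steps described above can be sketched in Julia roughly like this (a hypothetical sketch, not the original poster's code; it assumes the file layout produced by create() above):

```julia
# Hypothetical sketch of the two steps above, assuming the file was
# written by create() and holds `count` native-endian Int64 values.
function test_reinterpret(filename, count)
    bytes = open(filename) do ifile
        read(ifile, count * sizeof(Int64))  # one block read via the OS
    end
    return reinterpret(Int64, bytes)        # O(1): no copy, no per-element conversion
end
```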
This code doesn’t run as written. Assuming you meant to write:

function test()
    open("data.bin") do ifile
        v = Vector{Int64}(undef, div(count, 8))
        v = read!(ifile, v)
        return v
    end
end
I’m seeing a time of 7.2 microseconds. Do you have the C++ code that shows the few-nanosecond timing? I’m surprised by this, given that NVMe drives have a latency of a few microseconds.
You’re the one asking for assistance in the first place, and not being paid for that is the norm everywhere: we all participate here for free and try to get along. As for the work itself, you’re not doing enough for reproducibility or profiling. Provide all the code, including the benchmarking.
function readtest!(v, filename)
    open(filename, "r") do fid
        read!(fid, v)
    end
    return v
end
With a pre-allocated v, I get a runtime of 43 µs, which translates to 1.7 GB/s, reasonably close to what I believe my disk's read speed to be.
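For reference, one self-contained way to take such a measurement using only Base (a sketch; the file name, size, and helper are assumptions, and BenchmarkTools.jl's @btime would give more robust statistics than a single @elapsed sample):

```julia
# Self-contained timing sketch using only Base. File name and size are
# assumptions; BenchmarkTools.jl's @btime gives more robust numbers.
n = 10_000
write("data.bin", rand(Int64, n))

v = Vector{Int64}(undef, n)
readtest!(v, filename) = open(fid -> read!(fid, v), filename)

readtest!(v, "data.bin")           # warm-up: compile first, then time
t = @elapsed readtest!(v, "data.bin")
println("read $(sizeof(v)) bytes in $(round(t * 1e6, digits = 1)) µs")
```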
Note that caching is likely to play some role here; I'm not sure how to benchmark this in a bulletproof way, but something like that appears to be a problem in your C++ code as well, I would guess.
I strongly suspect that your C++ compiler optimized away all of your I/O code.
Thinking about this for a bit, wouldn’t this be a very bold move and circular reasoning on the part of the compiler (even if the program removes the file afterwards)?
In any case, I think seeing the C++ program that is being compiled and benchmarked would shed some light onto what’s going on.
I don’t quite follow. My comment was also a bit short and imprecise. Trying to be a bit more precise: I think if the compiler realizes that the result of the reading operation is completely unused it might just optimize the actual read away. Opening the file might remain (because it could throw) and that could explain why the call still takes “a few nanoseconds”.
But as you say: Without knowing the code that was actually timed, it is impossible to say what is going on.
Something that can complete in nanoseconds is simply mmapping the file, especially because it’s O(1); that's how you get to “reading” a gigabyte file in mere microseconds.
Is this mayhaps what you’re comparing to @world-peace ?
You can use mmap the same way in Julia.
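A minimal sketch of that route, using the Mmap standard library (it assumes the data.bin layout from earlier in the thread):

```julia
# Sketch: memory-map the file instead of reading it. Assumes "data.bin"
# holds `count` Int64 values, as written earlier in the thread.
using Mmap

function mmap_test(filename, count)
    open(filename) do f
        Mmap.mmap(f, Vector{Int64}, count)  # O(1): pages fault in lazily on access
    end
end
```

The mapping stays valid after the do-block closes the file; the cost of touching the data is only paid when elements are actually accessed.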
What you would need to look at for that is both the C/C++ code and strace (or similar) output.
However, while mapping is often an appropriate alternative to reading a file, it’s very inappropriate to conflate the two in benchmarks: if you e.g. modify the array, copy-on-write-on-pagefault is likely to be slower than actually reading the data in the first place.