I am creating a function to write binary data (a data array) to a file.
First, I just reversed the reading function (which reads the data essentially one byte at a time) and performed a sequential write, but noticed that it took surprisingly long to finish.
Then I tried bulk writing the whole array in one call.
This made it orders of magnitude faster. However, for larger files (>100 MB) it also slowed down.
So in the third iteration I experimented with batched writes of different sizes.
The results of benchmarking are below:
You can see that above 100 MB, the batch approach seems to be better by almost an order of magnitude.
This pattern repeats on different types of drives, and it doesn't seem to me that it could simply be an effect of drive caching.
I assume that the difference between sequential and bulk writes stems from the overhead of the write function calls, but I was wondering whether this batch speed-up is typical for Julia / computers in general, or whether I am doing it in a suboptimal way.
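One diagnostic I can think of (a sketch, not part of the benchmarks below; the function names are mine) is to write to devnull instead of a file. That takes the drive out of the picture entirely, so any remaining gap between the sequential and bulk versions should be pure per-call overhead:

function write_seq_devnull(array)
    # one write call per byte, but no actual I/O
    for i in array
        write(devnull, i)
    end
end

# a single call, likewise no actual I/O
write_bulk_devnull(array) = write(devnull, array)

If the sequential version is still much slower here, the gap is call overhead rather than anything the drive does.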
Maybe someone here has some ideas or suggestions?
Tests were done on Julia 1.7.2 and Windows 11.
The code for the MWE is below:
using BenchmarkTools

# Sequential write: one write call per element.
function write_seq(array)
    fid = open("./test.dat", "w")
    for i in array
        write(fid, i)
    end
    close(fid)
end

# Bulk write: the whole array in a single call.
function write_bulk(array)
    fid = open("./test2.dat", "w")
    write(fid, array)
    close(fid)
end

# Batched write: batch elements per call (assumes length(array)
# is a multiple of batch, which holds for the sizes used below).
function write_bulk_batch(array, batch)
    fid = open("./test3.dat", "w")
    iter = div(length(array), batch)
    for i = 1:iter
        write(fid, array[1+batch*(i-1):batch*i])
    end
    close(fid)
end

sizes = [1_000, 10_000, 100_000, 1_000_000]
batches = [1_000, 10_000, 100_000]

seq = []
bulk = []
bulkb = [[], [], []]

for asize in sizes
    array = rand(UInt8, asize)
    s = @benchmark write_seq($array) seconds=10
    push!(seq, median(s).time)
    b = @benchmark write_bulk($array) seconds=10
    push!(bulk, median(b).time)
    for (i, batch) in enumerate(batches)
        if batch >= asize
            # batching is meaningless if the batch covers the whole array
            push!(bulkb[i], 0)
        else
            bb = @benchmark write_bulk_batch($array, $batch) seconds=10
            push!(bulkb[i], median(bb).time)
        end
    end
end
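One more caveat about the batched version: the slice array[1+batch*(i-1):batch*i] allocates a copy of every batch. A variant using a view (a sketch; write_bulk_batch_view is just an illustrative name) would avoid those per-batch allocations, in case part of the measured time is copying rather than I/O:

function write_bulk_batch_view(array, batch)
    open("./test3.dat", "w") do fid
        # same batching as above, but @view avoids copying each slice;
        # still assumes length(array) is a multiple of batch
        for i = 1:div(length(array), batch)
            write(fid, @view(array[1+batch*(i-1):batch*i]))
        end
    end
end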