Yes, if you save a CSV file with FileIO.jl, it will use CSVFiles.jl under the hood. TextParse.jl is actually not involved in that case; it only handles the reading of files. I rolled the writing part of CSVFiles.jl myself.
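For anyone who hasn't used that path: a minimal sketch of what it looks like (assumes CSVFiles.jl is installed so that FileIO can dispatch on the `.csv` extension, and that `df` is any table-like value such as a DataFrame):

```julia
using FileIO, CSVFiles, DataFrames

df = DataFrame(a = [1, 2], b = ["x", "y"])
save("out.csv", df)   # FileIO picks CSVFiles.jl based on the file extension
```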
Do I read the chart there correctly that the CSV writing code in CSVFiles.jl is currently the fastest way to write CSV files in Julia? Yay! I’m actually quite surprised that it is not way, way slower than the various binary options like Feather.jl, JLD.jl etc. (yes, it is slower, but not by orders of magnitude).
Now, the fwrite performance of course is crazy… How many cores do you have on your machine? I read the blog post about how they do it, and I don’t think we could implement that kind of strategy with current Julia; we would really need much stronger threading support…
I haven’t run the benchmarks. But fwrite makes use of your cores, so the more you have, the faster things should get. And yet, 4 cores is not that many, so it just seems really well done…
Another interesting test would be fwrite with the nThread=1 option. That would switch off the use of multiple cores and would give us an idea of how far we are from a really fast serial implementation.
Of course Julia’s threading isn’t as well developed, but from what I can see, it feels like Julia could implement some of it using IOBuffer. I have never done any of this, but here is what I have in mind:
The blog post mentions writing N independent buffers and then, once all buffers have been filled, writing them out to disk sequentially.
I think this can be simulated in Julia with the pseudo-code below (I don’t claim every bit of syntax is right):
vio = Vector{IOBuffer}(nthreads())
# break "work" into chunks so that each chunk contains `nthreads()` pieces of work
work_chunks = breakup(work)
csvfile = open("path/to/out.csv", "w")
for wc in work_chunks
    @threads for i = 1:nthreads()
        local_io_buffer = IOBuffer()
        # each thread writes into its own private buffer until that buffer is full
        write_to_buffer!(local_io_buffer, wc[threadid()])
        vio[threadid()] = local_io_buffer
    end
    # by here each thread should have done one piece of work; it could happen that
    # one thread did two pieces, but that should be extremely rare
    write2csv(csvfile, vio)
end
close(csvfile)
Each thread takes care of writing to its own buffer, and a serial part then writes everything out in order. This seems to be the approach mentioned in the post.
If the above were turned into proper Julia code, it might work. There is no obvious reason why it shouldn’t, I think; now it’s up to someone to spend the time to try…
I’d be surprised if that construct gave the same performance characteristics that are described in the blog post. The OpenMP ordered construct is quite different from what you suggest above, and I believe quite a bit more efficient. My understanding is that the @threads macro is really best used with loops that have far more elements than you have cores; it then distributes those iterations over the cores. I’d be surprised if @threads performed well with loops that have only as many elements as you have threads.
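To illustrate the usual pattern (a minimal sketch, not taken from any of the benchmarks above): @threads statically splits the iteration range across the available threads, so it pays off when the loop has many more iterations than there are threads:

```julia
using Base.Threads

xs = zeros(10^6)
# the 10^6 iterations are chunked across nthreads() threads
@threads for i in eachindex(xs)
    xs[i] = sqrt(i)
end
```

With only nthreads() iterations, as in the pseudo-code earlier in the thread, each thread gets exactly one iteration and the per-loop scheduling overhead is paid for very little work.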
I don’t think we need a thread safe IO system for that algorithm; the key point is that in that example OpenMP makes sure all IO is serialized. But we would need a richer threading story that supports more of the advanced OpenMP-style constructs.
I think the more worrisome part for writing out CSV data is that converting numbers to strings is not thread safe. The grisu code has a couple of globals that would need to be locked, or have per-thread copies:

    const DIGITS = Vector{UInt8}(uninitialized, 309 + 17)

and

    const BIGNUMS = [Bignums.Bignum(), Bignums.Bignum(), Bignums.Bignum(), Bignums.Bignum()]
The grisu DIGITS buffer also seems to be reused in the Base printf code.
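If locking turned out to be too costly, one way to get per-thread copies would be an array of scratch buffers indexed by threadid(). This is just a sketch of the idea, not how Base actually does it; the 309+17 size is copied from the DIGITS definition above:

```julia
using Base.Threads

# one scratch buffer per thread, allocated up front
const DIGITS_TLS = [Vector{UInt8}(uninitialized, 309 + 17) for _ in 1:nthreads()]

# each thread only ever touches its own buffer
digits_buffer() = DIGITS_TLS[threadid()]
```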