CSV.jl's CSV write seems slow

Yes, if you save as a CSV file with FileIO.jl, it will use CSVFiles.jl under the hood. TextParse.jl is actually not involved in that case; it only handles reading files. I rolled the writing part of CSVFiles.jl myself.

Do I read the chart there correctly that the CSV writing stuff in CSVFiles.jl is the fastest way to write CSV files in julia right now? Yay :slight_smile: I’m actually quite surprised that it is not way, way slower than the various binary options like Feather.jl, JLD.jl etc (yes, it is slower, but not orders of magnitude).

Now, the fwrite performance of course is crazy… How many cores do you have on your machine? I read the blog post about how they do it, and I don’t think we could implement that kind of strategy with the current julia; we would really need much stronger threading support…

I have 4 cores, hyperthreaded. It’s a high-end laptop i7 CPU.

If you run the benchmark do you see the fwrite speeds that I quoted?

I haven’t run the benchmarks. But fwrite makes use of your cores, so the more you have, the faster things should get. And yet, 4 cores is not that many, so it just seems really well done…

Another interesting test would be fwrite with the nThread=1 option. That would switch off the use of multiple cores and would give us an idea how far we are away from a really fast serial implementation.

Actually, Julia’s Feather read and write are also slow. If given a choice, I would prefer to make those fast first!


I think you are referring to this blog post?

Of course Julia’s threading isn’t as well developed, but from what I can see, it feels like Julia could implement some of this using IOBuffer? I have never done any of this, but here is what I have in mind:

The blog post mentions writing to N independent buffers in parallel and then, once all buffers have been filled, writing them out to disk sequentially

  • I think this can be simulated in Julia using this pseudo-code. I actually don’t know the right Julia syntax here
vio = Vector{IOBuffer}(uninitialized, nthreads())
# break the work into chunks so that each chunk contains `nthreads()` pieces of work
work_chunks = breakup(work)
csvfile = open("path/to/out.csv", "w")
for wc in work_chunks
    @threads for i in 1:nthreads()
        local_io_buffer = IOBuffer()
        # write this thread's piece of the chunk into its local buffer
        write_to_buffer!(local_io_buffer, wc[threadid()])
        vio[threadid()] = local_io_buffer
    end
    # by here each thread has done some work; it could be that one thread
    # did two pieces of work, but that should be extremely rare
    write2csv(csvfile, vio)
end
close(csvfile)

Each thread takes care of writing to its own buffer, and there is a serial part that writes it all out in order. This seems to be the approach mentioned in the post.

If the above was turned into proper Julia code, it might work. There is no obvious reason why it shouldn’t, I think; now it’s up to someone to spend the time to try…
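For what it’s worth, here is one way the pseudo-code above might look as runnable Julia (current syntax; `threaded_csv_write` and the per-round chunking are my own made-up sketch, not from any package):

```julia
using Base.Threads

# Sketch: each round, every thread formats its own row into a private
# IOBuffer; afterwards one task writes the buffers to the file in order,
# so the actual file IO stays serial.
function threaded_csv_write(path, rows::Vector{String})
    nt = nthreads()
    open(path, "w") do csvfile
        for chunk_start in 1:nt:length(rows)
            chunk = rows[chunk_start:min(chunk_start + nt - 1, end)]
            vio = Vector{IOBuffer}(undef, length(chunk))
            @threads for i in 1:length(chunk)
                buf = IOBuffer()
                print(buf, chunk[i], '\n')   # per-thread formatting work
                vio[i] = buf
            end
            for buf in vio                   # serial, ordered write-out
                write(csvfile, take!(buf))
            end
        end
    end
end
```

Whether this actually beats a plain serial loop would depend on how expensive the formatting is relative to the IO.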


I’d be surprised if that construct gave the same performance characteristics that are described in the blog post. The OpenMP ordered structure is quite different from what you suggest above, and I believe quite a bit more efficient. My understanding is that the @threads macro really is best used with loops that have way more elements than you have cores, and then it distributes those loops over the cores. I’d be surprised if @threads performed well if you use it with loops that have as many elements as you have threads.
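To illustrate what I mean, a minimal sketch of the pattern `@threads` is designed for: a loop with far more iterations than threads, statically partitioned into contiguous blocks per thread.

```julia
using Base.Threads

# @threads statically splits 1:n across the available threads, so it works
# best when n is much larger than nthreads(); each thread gets a big block.
n = 1_000
squares = zeros(Int, n)
@threads for i in 1:n
    squares[i] = i * i
end
```

A loop with exactly `nthreads()` iterations gives the scheduler nothing to amortize the startup overhead over.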

I believe that we’ll be able to do something similar to the strategy described in the blog once we have something like WIP: parallel task runtime by kpamnany · Pull Request #22631 · JuliaLang/julia · GitHub in julia.

I’m pretty sure IO is not thread safe in julia.

So it’s not possible to achieve fwrite’s multithreaded speed then

I don’t think we need a thread safe IO system for that algorithm; the key point is that OpenMP in that example makes sure all IO is serialized. But we would need a richer threading story that supports more of the advanced OpenMP-style features.
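One way to get that "all IO is serialized" property on current Julia might be to have threads format into private buffers and hand them to a single consumer that writes them out strictly in chunk order. A sketch, with made-up names (this is not what fwrite or any package does, just an illustration of serializing the IO on one task):

```julia
using Base.Threads

# Threads do the formatting in parallel; only one task ever touches `io`,
# and it emits chunks strictly in index order.
function write_chunks_serialized(io::IO, chunks::Vector{String})
    n = length(chunks)
    ch = Channel{Tuple{Int,Vector{UInt8}}}(n)
    @threads for i in 1:n
        buf = IOBuffer()
        print(buf, chunks[i])          # per-thread formatting work
        put!(ch, (i, take!(buf)))      # hand off; Channel put! is thread safe
    end
    close(ch)
    pending = Dict{Int,Vector{UInt8}}()
    next = 1
    for (i, data) in ch                # drains the buffered, closed channel
        pending[i] = data
        while haskey(pending, next)    # flush in order as chunks arrive
            write(io, pop!(pending, next))
            next += 1
        end
    end
end
```

The `pending` dictionary plays the role of OpenMP’s `ordered` clause here: out-of-order chunks wait until all their predecessors have been written.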

I think the more worrisome part for writing out CSV data is that converting numbers to strings is not thread safe. The grisu code has a couple of things that would need to be locked, or have per-thread copies:
const DIGITS = Vector{UInt8}(uninitialized, 309+17)
and
const BIGNUMS = [Bignums.Bignum(),Bignums.Bignum(),Bignums.Bignum(),Bignums.Bignum()]
The grisu DIGITS buffer also seems to be reused in the Base printf code.

This is an open issue: Grisu (floating point printing) not thread-safe · Issue #25727 · JuliaLang/julia · GitHub

I have tried the OP example and this is what I get:

julia> using CSV
julia> @btime CSV.write("df.csv", df);
  22.298 s (200000047 allocations: 4.47 GiB)

julia> using CSVFiles
julia> @btime save("df2.csv", df)
  120.250 s (400000103 allocations: 17.88 GiB)

It’s really slow.
