CSV.read very slow when the number of threads is changed

I want to read in a CSV file with ~5000 lines, which is usually very fast. But when I start Julia with --threads=10 or --threads=auto, the waiting times for CSV.read go through the roof.
So, how can I have both: fast CSV.read and multithreading that deserves the name?

Could you clarify what “going through the roof” means?

For me on 1.10beta2 with one thread:

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(5_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.700708 seconds (359.52 k allocations: 25.033 MiB, 3.32% gc time, 97.74% compilation time: 95% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.017480 seconds (11.53 k allocations: 1.492 MiB)

and with --threads=auto (which means 8 for my machine):

julia> using CSV, DataFrames

julia> @time CSV.read("test.csv", DataFrame);
  2.103722 seconds (3.27 M allocations: 227.861 MiB, 3.96% gc time, 682.92% compilation time: 6% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.008714 seconds (17.77 k allocations: 1.722 MiB)

so 0.7 → 2.1 seconds for the first run (more compilation work with threads enabled), while the second run drops from ~17 ms to ~9 ms, roughly twice as fast (I guess the file is too small to benefit more from 8 threads).
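If you want to keep a multithreaded session but avoid the multithreaded parsing path for small files, CSV.jl's ntasks keyword (documented for CSV.File/CSV.read) controls how many tasks the parser spawns. A minimal sketch, reusing the test file from above:

```julia
using CSV, DataFrames

# Recreate the test file from the earlier post: 5000 rows, 30 columns.
CSV.write("test.csv", DataFrame(rand(5_000, 30), :auto))

# ntasks=1 forces single-threaded parsing for this call only,
# even if Julia was started with --threads=10; the rest of the
# session still has all threads available.
df = CSV.read("test.csv", DataFrame; ntasks=1)
@show size(df)  # (5000, 30)
```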


Hm, I’ve repeated your inputs and got quite similar results when calling Julia in a separate terminal.
I now suspect the problem lies with Pumas for Desktop, because these long loading times only occur when I’m using Julia and CSV.read in VSCode linked to Pumas-2.4.1.app. Maybe I need to ask over at discourse.pumas.ai.
