I want to read in a CSV file with ~5000 lines, which is usually very fast. But when I start Julia with --threads=10 or --threads=auto, the waiting times for CSV.read go through the roof.
So, how can I have both: a fast CSV.read and multithreading that deserves the name?
Could you clarify what “going through the roof” means?
For me on 1.10beta2 with one thread:
julia> using CSV, DataFrames
julia> CSV.write("test.csv", DataFrame(rand(5_000, 30), :auto));
julia> @time CSV.read("test.csv", DataFrame);
0.700708 seconds (359.52 k allocations: 25.033 MiB, 3.32% gc time, 97.74% compilation time: 95% of which was recompilation)
julia> @time CSV.read("test.csv", DataFrame);
0.017480 seconds (11.53 k allocations: 1.492 MiB)
and with --threads=auto (which means 8 for my machine):
julia> using CSV, DataFrames
julia> @time CSV.read("test.csv", DataFrame);
2.103722 seconds (3.27 M allocations: 227.861 MiB, 3.96% gc time, 682.92% compilation time: 6% of which was recompilation)
julia> @time CSV.read("test.csv", DataFrame);
0.008714 seconds (17.77 k allocations: 1.722 MiB)
so 0.7 → 2.1 seconds for the first run, and about a 50% speedup for the second (I guess the file is too small to benefit more from 8 threads).
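To check whether CSV.jl's own multithreaded parsing is what costs time, a rough sketch like the following compares the default behaviour with a single parsing task, after warming up so that compilation is not counted. It relies on the `ntasks` keyword of CSV.jl; the temporary file location and the variable names are only illustrative, and the timings will differ from machine to machine.

```julia
using CSV, DataFrames

# Write the same kind of synthetic 5_000 x 30 table as above,
# but into a temporary directory (illustrative choice).
path = joinpath(mktempdir(), "test.csv")
CSV.write(path, DataFrame(rand(5_000, 30), :auto))

# Warm-up calls so the timings below exclude compilation.
CSV.read(path, DataFrame)
CSV.read(path, DataFrame; ntasks = 1)

println("Julia threads: ", Threads.nthreads())

# Default: CSV.jl may split parsing across several tasks
# when Julia was started with more than one thread.
t_default = @elapsed df_default = CSV.read(path, DataFrame)

# ntasks = 1 asks CSV.jl to parse the file in a single task,
# while the rest of the session keeps all its threads.
t_single = @elapsed df_single = CSV.read(path, DataFrame; ntasks = 1)

println("default ntasks: ", t_default, " s")
println("ntasks = 1:     ", t_single, " s")
```

If `ntasks = 1` turns out to be consistently faster for files of this size, passing it to CSV.read would be one way to keep small reads quick while still starting Julia with many threads for the rest of the work.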
Hm, I’ve repeated your inputs and got quite similar results when calling Julia in a separate terminal.
I suspect the problem now lies with Pumas for Desktop, because these long loading times only occur when I’m using Julia and CSV.read in VSCode, which is linked to Pumas-2.4.1.app. Maybe I need to ask over at discourse.pumas.ai.
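Before asking there, it might help to run something like the snippet below once in the plain terminal and once in the VSCode session and compare the output. It is only a sketch that prints a few facts about the running session, using standard Julia functions; which of them (if any) differs between the two setups is an open question, and the `session` name is just illustrative.

```julia
# Collect a few facts about the running Julia session.
session = (
    version = VERSION,
    binary  = joinpath(Sys.BINDIR, Base.julia_exename()),
    threads = Threads.nthreads(),
    env_threads = get(ENV, "JULIA_NUM_THREADS", "(not set)"),
    project = Base.active_project(),
    depots  = copy(DEPOT_PATH),
)

# Print one line per entry so the two sessions are easy to compare.
for (name, value) in pairs(session)
    println(rpad(name, 12), " = ", value)
end
```

A different binary path, project, or depot in the VSCode session would point to a different Julia installation or package environment being used there.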