CSV.read very slow when the number of threads is changed

I want to read in a CSV file with ~5000 lines, which is usually very fast. But when I start Julia with --threads=10 or --threads=auto, the waiting times for CSV.read go through the roof.
So, how can I have both: fast CSV.read and multithreading that deserves the name?

Could you clarify what “going through the roof” means?

For me on 1.10beta2 with one thread:

julia> using CSV, DataFrames

julia> CSV.write("test.csv", DataFrame(rand(5_000, 30), :auto));

julia> @time CSV.read("test.csv", DataFrame);
  0.700708 seconds (359.52 k allocations: 25.033 MiB, 3.32% gc time, 97.74% compilation time: 95% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.017480 seconds (11.53 k allocations: 1.492 MiB)

and with --threads=auto (which means 8 for my machine):

julia> using CSV, DataFrames

julia> @time CSV.read("test.csv", DataFrame);
  2.103722 seconds (3.27 M allocations: 227.861 MiB, 3.96% gc time, 682.92% compilation time: 6% of which was recompilation)

julia> @time CSV.read("test.csv", DataFrame);
  0.008714 seconds (17.77 k allocations: 1.722 MiB)

so 0.7 → 2.1 seconds for the first run (more compilation work with threads enabled), while the second run drops from ~17 ms to ~9 ms, roughly twice as fast (I guess the file is too small to benefit more from 8 threads).
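If you want to keep a multithreaded session but avoid the multithreaded parsing path for small files, CSV.jl's ntasks keyword (documented for CSV.File/CSV.read) controls how many tasks the parser spawns. A minimal sketch, reusing the test file from above:

```julia
using CSV, DataFrames

# Recreate the test file from the earlier post: 5000 rows, 30 columns.
CSV.write("test.csv", DataFrame(rand(5_000, 30), :auto))

# ntasks=1 forces single-threaded parsing for this call only,
# even if Julia was started with --threads=10; the rest of the
# session still has all threads available.
df = CSV.read("test.csv", DataFrame; ntasks=1)
@show size(df)  # (5000, 30)
```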


Hm, I’ve repeated your inputs and got quite similar results when calling Julia in a separate terminal.
I now suspect the problem lies with Pumas for Desktop, because these long loading times only occur when I’m using Julia and CSV.read in VSCode linked to Pumas-2.4.1.app. Maybe I need to ask over at discourse.pumas.ai.
