Very slow readdlm()

programista · September 15, 2018, 2:46pm

In Julia 0.6 on Windows, readcsv() reads a 500 MB file (digits only, Linux format) in about 150 seconds. Julia 0.7 has only readdlm(), and the same file took over 2 hours to read! What can I do? Is there a package to convert the file from Linux to Windows format? This is a big step backwards for Windows users.
Paul

kristoffer.carlsson · September 15, 2018, 2:47pm

https://github.com/JuliaLang/julia/issues/29036

Fix will be in 1.0.1.

programista · September 15, 2018, 2:49pm

Nice, THX! Paul

LeoK987 · September 15, 2018, 11:46pm

Have you tried using the CSV package?

programista · September 16, 2018, 6:04pm

Thanks. It takes one more step, converting the DataFrame to an Array, but now it is OK.
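
For anyone following along, here is a minimal sketch of that extra step. It assumes a recent CSV.jl and DataFrames.jl are installed (the `CSV.read(path, DataFrame)` form is the current API and may differ from what was available in 2018), and the sample file is a hypothetical stand-in, not the original plik.txt.

```julia
using CSV, DataFrames

# Hypothetical stand-in for the real data: a small comma-separated numeric file.
path = joinpath(mktempdir(), "sample.txt")
write(path, "1,2,3\n4,5,6\n")

# header=false because the file holds only numbers, no column names.
df = CSV.read(path, DataFrame; header=false)

# The extra step: turn the DataFrame into a plain matrix.
m = Matrix(df)
println(size(m))
```

`Matrix(df)` copies the columns into one array, so it only makes sense when all columns share a compatible element type.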

programista · October 2, 2018, 9:38am

It is no better on 1.0.1.
Julia 0.6.2
julia> @time d=readdlm("plik.txt",',')
8.264069 seconds (9.32 M allocations: 347.220 MiB …

Julia 1.0.1
julia> @time d=readdlm("plik.txt",',')
54.991627 seconds (11.14 M allocations: 436.138 MiB

size of plik.txt = 56 456 KB

kristoffer.carlsson · October 2, 2018, 1:34pm

It would be good if you could provide a file which shows the slowdown.

programista · October 2, 2018, 3:48pm

kristoffer.carlsson · October 2, 2018, 3:54pm

Are you sure you upgraded to 1.0.1?

I get:

1.0.1

julia> @time d=readdlm("plik.txt",',');
  1.997139 seconds (9.00 M allocations: 330.061 MiB, 11.14% gc time)

julia> versioninfo()
Julia Version 1.0.1

0.6

julia> @time d=readdlm("plik.txt",',');
  2.057267 seconds (9.01 M allocations: 330.109 MiB, 5.33% gc time)

julia> versioninfo()
Julia Version 0.6.5-pre.0

programista · October 2, 2018, 3:59pm

THX,
Yes, I am sure, but now it is OK, about 7 sec. Probably another task had been launched on my machine, sorry.
Paul
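
A rough way to re-check a timing like this is to repeat the measurement a few times, so that compilation or a busy machine does not distort a single run. The sketch below uses only the DelimitedFiles standard library; the generated file is hypothetical test data, not the original plik.txt.

```julia
using DelimitedFiles  # standard library since Julia 0.7

# Hypothetical test data: 1000 rows of 50 comma-separated digits.
path = joinpath(mktempdir(), "sample.txt")
writedlm(path, rand(0:9, 1000, 50), ',')

readdlm(path, ',')                      # warm-up call, includes compilation
t1 = @elapsed d = readdlm(path, ',')    # first timed run
t2 = @elapsed readdlm(path, ',')        # second timed run for comparison
println("run 1: ", t1, " s, run 2: ", t2, " s, size: ", size(d))
```

If the two timed runs differ a lot, something other than readdlm is probably competing for the machine.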

Ajaychat3 · October 2, 2018, 6:53pm

@kristoffer.carlsson: There is one thing I don’t understand. The file size is about 56 MB, but the memory allocation is 330 MB. I am curious to know the reason, if possible.

kristoffer.carlsson · October 2, 2018, 7:02pm

The memory allocation is the total amount allocated during the execution of the function, not the memory usage when the function returns.

Ajaychat3 · October 2, 2018, 7:09pm

So does it mean that 330 MB was allocated for reading the file? Or is it the maximum memory allocated to the entire Julia session while the file was being read? Sorry if this is a basic question.

kristoffer.carlsson · October 2, 2018, 7:31pm

It means that if you sum all allocations that happened during the reading of the file, you get 330 MB.
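
A small illustration of the difference, as a sketch only: `churn` is a hypothetical helper that allocates many temporary buffers and returns just the last one, so the summed allocations are far larger than what the result holds.

```julia
# Each iteration allocates a fresh 1 MB buffer; only the last one is kept.
function churn(n)
    buf = UInt8[]
    for _ in 1:n
        buf = zeros(UInt8, 1_000_000)
    end
    return buf
end

churn(1)                              # warm-up so compilation is not counted
total = @allocated churn(50)          # sum of all allocations during the call
result = churn(50)
retained = Base.summarysize(result)   # memory held by the returned array

println("allocated in total: ", total ÷ 10^6, " MB")
println("held by the result: ", retained ÷ 10^6, " MB")
```

The number printed by `@time` corresponds to the first figure, not the second.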

Ajaychat3 · October 2, 2018, 7:33pm

Thanks
