What is the most efficient way to read a 3D matrix?

Here is a 3D array:
TD = rand(150, 27, 55)

I have saved ‘TD’ in the following ways:

  1. df = DataFrame(TD=collect(eachslice(TD, dims=3)))
    CSV.write("TD.csv", df)
  2. df = DataFrame(TD=collect(eachslice(TD, dims=1)))
    CSV.write("TD.csv", df)
  3. writedlm("TD.txt", TD)

The memory and speed of the three methods are as follows (measured with @btime CSV.read(file, DataFrame)):

  1. 24.28 MiB, 85.586 ms
  2. 15.03 MiB, 48.067 ms
  3. 1.87 MiB, 22.547 ms
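Note that whichever reader is used, writedlm dumps the 3D array's elements in column-major order, so the original shape has to be restored by hand after reading. A minimal sketch with readdlm, reusing the dimensions from the example above:

```julia
using DelimitedFiles

TD = rand(150, 27, 55)
writedlm("TD.txt", TD)            # elements are written one per line, column-major order

# readdlm returns a 2-D matrix, so rebuild the original shape explicitly
flat = readdlm("TD.txt")
TD2 = reshape(vec(flat), 150, 27, 55)
TD2 == TD   # true
```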

Results:

  1. Reading the .txt with ‘CSV.read’ was the most efficient approach (readdlm was terrible).
  2. As the matrix stored in each DataFrame cell got smaller, read speed improved; the number of rows mattered less than the size of the matrix.

I have tested various methods, but I couldn’t find the best way yet. Please share your experience.

You can dump the array to disk using JLD.jl or JLD2.jl; you don’t need to convert it to a table and write it as CSV. There are other alternative packages, like BSON.jl, that you can also use to store arrays.
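For example, the JLD2.jl round trip is just (a minimal sketch; the file name is arbitrary, and jldsave stores each keyword argument under its own name):

```julia
using JLD2

TD = rand(150, 27, 55)
jldsave("TD.jld2"; TD)          # saved under the dataset name "TD"
TD2 = load("TD.jld2", "TD")     # shape and element type are preserved
TD2 == TD   # true
```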

The Julia standard library’s Serialization takes 2 ms to write, less than 0.5 ms to read, and fills 1.7 MB on disk, on my small Win11 laptop, but I’m not sure the experts would recommend it:

using Serialization
TD = rand(150,27,55)
serialize("TD_serialize.bin", TD);     # 2.0 ms (24 allocs: 1.8 KiB)
TD2 = deserialize("TD_serialize.bin")  # 463 μs (31 allocs: 1.7 MiB)
TD == TD2   # true

Very nice, @rafael.guerra, thanks for sharing this stdlib. I will consider it next time. :slight_smile:


It’s worth remembering this caveat from the Julia documentation on Serialization:

  The data format can change in minor (1.x) Julia releases, but files written by prior 1.x versions will remain readable.

In many contexts this won’t be an issue, but it should be kept in mind.


Oh, I forgot to write a thank-you comment.
It was really helpful and improved data-read performance 200 times!
Thank you, Rafael.
