Here is a 3D array:
TD=rand(150,27,55)
I have saved `TD` in several ways, as follows:
- df3 = DataFrame(TD = collect(eachslice(TD, dims=3)))
  CSV.write("TD.csv", df3)
- df1 = DataFrame(TD = collect(eachslice(TD, dims=1)))
  CSV.write("TD.csv", df1)
- writedlm("TD.txt", TD)
The memory and speed of reading back each of the three files are as follows (timed with `@btime CSV.read(...)`; a sketch of the benchmark calls follows the list).
- `dims=3` slices via CSV: 24.28 MiB, 85.586 ms
- `dims=1` slices via CSV: 15.03 MiB, 48.067 ms
- `writedlm` text file: 1.87 MiB, 22.547 ms
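For reference, here is a minimal sketch of how such read timings can be taken (assuming BenchmarkTools, CSV, DataFrames, and DelimitedFiles are installed; the file names follow the examples above):

```julia
using BenchmarkTools, CSV, DataFrames, DelimitedFiles

# Read back the CSV written from the DataFrame of slices
@btime CSV.read("TD.csv", DataFrame);

# Read back the tab-delimited file written by writedlm; it has no header row
@btime CSV.read("TD.txt", DataFrame; delim='\t', header=false);

# readdlm baseline for comparison
@btime readdlm("TD.txt");
```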
Results:
- Reading the .txt file with `CSV.read` was the most efficient (`readdlm` performed terribly).
- The smaller the matrices stored in the DataFrame, the faster the read; the number of rows mattered far less than the size of each matrix.
I have tested various methods but have not found the best approach yet. Please share your experience.
You can dump the array to disk using JLD.jl or JLD2.jl; you don't need to convert it to a table and write it as CSV. There are other packages, such as BSON.jl, that you can also use to store arrays.
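For example, a minimal sketch with JLD2 (assuming JLD2.jl is installed; the file name `TD.jld2` is just an example):

```julia
using JLD2

TD = rand(150, 27, 55)

# Write the array under the name "TD"; .jld2 files use an HDF5-compatible layout
jldsave("TD.jld2"; TD)

# Read it back; the 3D shape and element type are preserved
TD2 = jldopen("TD.jld2") do file
    file["TD"]
end

TD == TD2  # true
```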
Julia's standard-library Serialization takes 2 ms to write, less than 0.5 ms to read, and occupies 1.7 MB on disk on my small Win11 laptop, but I'm not sure whether the experts would recommend it:
using Serialization
TD = rand(150,27,55)
serialize("TD_serialize.bin", TD); # 2.0 ms (24 allocs: 1.8 KiB)
TD2 = deserialize("TD_serialize.bin") # 463 μs (31 allocs: 1.7 MiB)
TD == TD2 # true
Very nice, @rafael.guerra, thanks for sharing this stdlib; I will consider it next time.
It's worth remembering the Julia documentation:

> The data format can change in minor (1.x) Julia releases, but files written by prior 1.x versions will remain readable.

In many contexts this won't be an issue, but it should be kept in mind.
Oh, I forgot to write a thank-you comment.
It was really helpful and improved my data-reading performance 200-fold!
Thank you, Rafael.