Improving serialization time

jinliangwei · June 1, 2018, 2:47am

I have an object which is essentially a = [[(Int64(10), Int32(2)) for i = 1:500] for i = 1:102661]. Serializing it takes 1.9 seconds. This has become a major bottleneck in my program because it has to be done repeatedly. Is there anything I can do to reduce the time spent on serialization?

oxinabox · June 1, 2018, 2:51am

Can you post an MWE?

Also, more details would help. Are you serializing with JLD, JLD2, Base.serialize, or something else?

jinliangwei · June 1, 2018, 5:48am

julia> a = [[(Int64(10), Int32(2)) for i = 1:500] for i = 1:102661]
julia> buffer = IOBuffer()
julia> @time serialize(buffer, a)
  1.482290 seconds (395.62 k allocations: 1.012 GiB, 5.20% gc time)

My apologies for the incomplete information. Here is an MWE.

I am using Base.serialize.

Just a little more about the application. I have a parallel computing application written in C++ with Julia embedded. There are many processes that form a “ring” topology. Essentially, each process holds a partition of the matrix a, applies some updates to its partition by processing its local data, and then sends its partition to the next process in the ring. This has to be repeated many times.

jinliangwei · June 1, 2018, 5:59am

Would I likely reduce the serialization time if I use a different serializer or different data structure to store the same data?

oxinabox · June 1, 2018, 1:31pm

No, Base.serialize is generally faster than any other serializer.

Using a different data structure might help. (If you were using a different serializer it would be easy to know whether it would or not, but they would be slower than Base.serialize anyway.)
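As a rough illustration of the "different data structure" idea, here is a minimal sketch that flattens the nested vectors into one contiguous vector of tuples and writes its raw bytes with `write`, then rebuilds the nested layout with `read!`. It assumes Julia 1.0 or later and that sender and receiver run the same Julia version on the same architecture, because the raw byte layout is not a portable format. The names `lens`, `flat`, `stops` and `b` are made up for this example, and the row count is reduced so it runs quickly. Whether this beats `serialize` for the original data would need to be measured.

```julia
# Smaller stand-in for the data in the thread (1_000 rows instead of 102_661).
a = [[(Int64(10), Int32(2)) for i = 1:500] for j = 1:1_000]

# Flatten into one contiguous vector, keeping the length of each inner vector.
lens = Int64[length(v) for v in a]
flat = reduce(vcat, a)   # Vector{Tuple{Int64,Int32}}, an isbits element type

# Write the row count, the row lengths, then the raw element bytes.
buffer = IOBuffer()
write(buffer, Int64(length(lens)))
write(buffer, lens)
write(buffer, flat)

# Read everything back and rebuild the nested layout.
seekstart(buffer)
n = read(buffer, Int64)
lens2 = read!(buffer, Vector{Int64}(undef, n))
flat2 = read!(buffer, Vector{Tuple{Int64,Int32}}(undef, sum(lens2)))
stops = cumsum(lens2)
b = [flat2[(stops[k] - lens2[k] + 1):stops[k]] for k in 1:n]
```

If every inner vector really has the same length, as in the MWE, the `lens` bookkeeping could be dropped and the data kept in a single `Matrix` or flat `Vector` throughout the program, which avoids the flattening copy on every send.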

avik · June 1, 2018, 1:47pm

Are your processes on the same machine? Are you sending the data from Julia memory to C?

For many Julia processes on the same machine, using SharedArrays might be better than using serialisation.
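For reference, a minimal sketch of the SharedArrays approach, assuming Julia 1.0 or later and ordinary Julia worker processes on one machine (which is not the embedded C++ setup described above). The array size and the single extra worker are arbitrary choices for the example. The element type must be an isbits type.

```julia
using Distributed
addprocs(1)                      # one extra local worker, just for the demo
@everywhere using SharedArrays

# A 500x100 array backed by shared memory, mapped by all local processes.
S = SharedArray{Int64}((500, 100))
fill!(S, 0)

# A worker updates the first column in place; only a handle to the shared
# segment is sent, not the array contents.
w = first(workers())
remotecall_wait(w, S) do s
    s[:, 1] .= 10
end

# The master process sees the worker's update without any copy back.
all(S[:, 1] .== 10)
```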

As far as possible, you want to transfer the memory as-is. How to do that efficiently is unfortunately highly application-dependent.

jinliangwei · June 1, 2018, 1:57pm

Thanks! Not all are on the same machine but many are. My understanding is that in one process I may have only one thread that calls Julia functions. Otherwise, I could just use shared memory among my workers.

Juser · June 2, 2018, 4:27am

"My understanding is that in one process I may have only one thread that calls Julia functions. Otherwise, I could just use shared memory among my workers."

I could be misunderstanding what you mean, but I thought Julia had multi-threading.
