Improving serialization time


#1

I have an object which is essentially a = [[(Int64(10), Int32(2)) for i = 1:500] for i = 1:102661]. Serializing it takes 1.9 seconds. This has become a major bottleneck in my program because it has to be done repeatedly. Is there anything I can do to reduce the time spent on serialization?


#2

Can you post an MWE?

Also, more details: are you serializing with JLD? JLD2? Base.serialize? Something else?


#3
My apologies for the incomplete information. Here is an MWE:

julia> using Serialization   # only needed on Julia 0.7 and later
julia> a = [[(Int64(10), Int32(2)) for i = 1:500] for i = 1:102661]
julia> buffer = IOBuffer()
julia> @time serialize(buffer, a)
  1.482290 seconds (395.62 k allocations: 1.012 GiB, 5.20% gc time)

I am using Base.serialize (Serialization.serialize on Julia 0.7 and later).

Just a little more about the application: I have a parallel computing application written in C++ with Julia embedded. There are many processes that form a “ring” topology. Each process holds a partition of the matrix a, applies some updates to its partition by processing its local data, and then sends its partition to the next process in the ring. This has to be repeated many times.
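A single-process sketch of that ring, only to show where serialization enters each round. The partition contents and the `update!` function are hypothetical stand-ins, and real processes would exchange the bytes over whatever transport the C++ side uses rather than through an array.

```julia
using Serialization

const Partition = Vector{Vector{Tuple{Int64,Int32}}}

# Hypothetical stand-in for the local update each process applies.
function update!(part::Partition)
    for v in part, i in eachindex(v)
        v[i] = (v[i][1] + 1, v[i][2])
    end
    return part
end

# Simulated ring: process p updates its partition, serializes it,
# and the bytes are delivered to process mod1(p + 1, P).
function ring_rounds!(parts::Vector{Partition}, rounds::Integer)
    P = length(parts)
    for _ in 1:rounds
        msgs = Vector{Vector{UInt8}}(undef, P)
        for p in 1:P
            buf = IOBuffer()
            serialize(buf, update!(parts[p]))   # the step timed in the MWE
            msgs[mod1(p + 1, P)] = take!(buf)
        end
        for p in 1:P
            parts[p] = deserialize(IOBuffer(msgs[p]))
        end
    end
    return parts
end

# Three tiny partitions, tagged with their original owner p.
parts = Partition[[[(Int64(p), Int32(2)) for i = 1:3] for j = 1:2] for p = 1:3]
ring_rounds!(parts, 1)
```

Every partition is serialized and deserialized once per round, so the cost measured in the MWE is paid once per process on every pass around the ring.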


#4

Would I be likely to reduce the serialization time by using a different serializer or a different data structure to store the same data?


#5

No, Base.serialize is generally faster than any other serializer.

Using a different data structure might help. (If you were using a different serializer, it would be easy to know whether it would help or not, but those serializers would be slower than Base.serialize anyway.)
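A minimal sketch of one alternative data structure, assuming every inner vector has the same length (500 in the MWE): store the two tuple fields in two dense matrices, so the serializer handles two contiguous arrays instead of one small array per inner vector. The names `firsts` and `seconds` and the reduced sizes are illustrative only; whether this is actually faster for your data should be checked with `@time` as in the MWE.

```julia
using Serialization

ncols, len = 1_000, 500   # small stand-in; the MWE uses 102_661 vectors of length 500

# Original layout: a vector of vectors of (Int64, Int32) tuples.
a = [[(Int64(10), Int32(2)) for i = 1:len] for j = 1:ncols]

# Alternative layout: one dense matrix per tuple field; column j holds a[j].
firsts  = Matrix{Int64}(undef, len, ncols)
seconds = Matrix{Int32}(undef, len, ncols)
for j in 1:ncols, i in 1:len
    firsts[i, j], seconds[i, j] = a[j][i]
end

buf = IOBuffer()
serialize(buf, (firsts, seconds))   # two contiguous arrays instead of `ncols` small ones
seekstart(buf)
f2, s2 = deserialize(buf)
```

Splitting the fields also avoids shipping alignment padding: each Tuple{Int64,Int32} element occupies 16 bytes in memory, while the two matrices store 12 bytes per entry.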


#6

Are your processes on the same machine? Are you sending the data from Julia memory to C?

For many Julia processes on the same machine, using SharedArrays might be better than using serialisation.
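A minimal sketch of that idea, assuming the cooperating processes are Julia workers on one host managed by the Distributed stdlib (which is not quite the embedded C++ setup described above). The matrix shape and the values written are placeholders; no workers are added here, so it runs on process 1 only.

```julia
using Distributed, SharedArrays

# Backed by shared memory on this host, so workers map the same buffer
# instead of receiving a serialized copy of the data.
S = SharedArray{Int64}(500, 1_000)

# Each participating process fills its share of the columns in place;
# only the SharedArray handle is sent to workers, not the 500×1000 payload.
@sync @distributed for j in 1:size(S, 2)
    S[:, j] .= j
end
```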

As far as possible, you want to transfer the memory as-is. How to do that efficiently is unfortunately highly application dependent.
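One way to transfer memory as-is, sketched under the assumption that sender and receiver run the same architecture and Julia element layout: for an array whose element type is a plain bits type, `write` copies the array's memory directly and `read!` fills a preallocated array of the same type and size. No type information is stored, so the receiver must already know the element type and dimensions. The shape used here is a placeholder.

```julia
# A matrix of (Int64, Int32) tuples is a plain bits array, so its memory
# can be copied byte for byte.
A = fill((Int64(10), Int32(2)), 500, 1_000)
@assert isbitstype(eltype(A))

io = IOBuffer()
write(io, A)        # raw bytes only: sizeof(A) bytes, no headers or type tags
seekstart(io)

B = similar(A)      # receiver preallocates with the agreed eltype and dims
read!(io, B)
```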


#7

Thanks! Not all are on the same machine, but many are. My understanding is that in one process I may have only one thread that calls Julia functions. Otherwise, I could just use shared memory among my workers.


#8

> My understanding is that in one process I may have only one thread that calls Julia functions. Otherwise, I could just use shared memory among my workers.

I could be misunderstanding what you mean, but I thought Julia had multi-threading.
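For what it's worth, threads started by Julia itself can share one array without any serialization. A minimal sketch, assuming Julia is launched with several threads (for example `julia -t 4` on 1.5 and later, or via the JULIA_NUM_THREADS environment variable); the per-element update is hypothetical. This does not settle whether foreign C++ threads may call into embedded Julia, which is the constraint raised above.

```julia
using Base.Threads

# Hypothetical update: each thread handles whole inner vectors, so no two
# threads ever write to the same vector.
function update_all!(a::Vector{Vector{Tuple{Int64,Int32}}})
    @threads for j in eachindex(a)
        v = a[j]
        for i in eachindex(v)
            v[i] = (v[i][1] + 1, v[i][2])
        end
    end
    return a
end

a = [[(Int64(10), Int32(2)) for i = 1:500] for j = 1:1_000]
update_all!(a)
println("threads available: ", nthreads())
```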