I’m looking to speed up serialization of arrays that allow missing values. In the following test, performance drops by an order of magnitude:
```julia
using Serialization

N = Int(1e8)
buffer = Vector{UInt8}(undef, 8 * N)
io = IOBuffer(buffer, write=true)

test_cases = [
    ("BitArray of trues", BitArray(true for i in 1:N)),
    ("Array of zeros (Int)", zeros(Int, N)),
    ("Uninitialized Vector{Int}", Vector{Int}(undef, N)),
    ("Array of zeros (Union{Int, Float64})", zeros(Union{Int, Float64}, N)),
    ("Array of zeros (Union{Int, Missing})", zeros(Union{Int, Missing}, N)),
    ("Array of missings (Missing)", Vector{Missing}(missing, N)),
    ("Array of missings (Union{Int, Missing})", Vector{Union{Int, Missing}}(missing, N)),
    ("Uninitialized Vector{Union{Int, Float64}}", Vector{Union{Int, Float64}}(undef, N))
]

for (desc, arr) in test_cases
    println("\n$desc:")
    truncate(io, 0)  # reset the IOBuffer itself; calling empty! on its backing array bypasses the IOBuffer's state
    @time Serialization.serialize(io, arr)
end
```
My results:
```
BitArray of trues:
  0.003427 seconds (24 allocations: 1.516 KiB)

Array of zeros (Int):
  0.354058 seconds (20 allocations: 351.369 MiB, 18.31% gc time)

Uninitialized Vector{Int}:
  0.459178 seconds (20 allocations: 651.290 MiB, 13.53% gc time)

Array of zeros (Union{Int, Float64}):
  0.535546 seconds (20 allocations: 512.001 MiB, 5.32% gc time)

Array of zeros (Union{Int, Missing}):
  0.516552 seconds (20 allocations: 512.001 MiB, 2.61% gc time)

Array of missings (Missing):
  0.000015 seconds (18 allocations: 1.422 KiB)

Array of missings (Union{Int, Missing}):
  6.071769 seconds (100.00 M allocations: 1.990 GiB, 6.16% gc time)

Uninitialized Vector{Union{Int, Float64}}:
  2.304365 seconds (100.00 M allocations: 1.990 GiB, 15.80% gc time)
```
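The split in these timings seems to line up with `isbitstype` of the element type. As far as I can tell from reading the Serialization stdlib source (my reading, not documented behavior), its `serialize` method for `Array` writes the raw memory in one go when the element type is `isbitstype`, and otherwise serializes each element individually. A quick check of the benchmarked element types:

```julia
# Element types used in the benchmark. isbitstype(T) is true when values of T
# are plain bits with no pointers (the condition the Array fast path appears to check).
for T in (Int, Missing, Union{Int, Float64}, Union{Int, Missing})
    println(rpad(string(T), 24), "isbitstype: ", isbitstype(T))
end
```

`Int` and `Missing` report `true`, while both `Union` types report `false`. That is consistent with the 100.00 M allocations (one per element) for the missing-filled and uninitialized `Union` arrays; the all-zero `Union` arrays presumably stay fast because each element is a small `Int` that is cheap to encode.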
I found code here that shows how to override serialization for a custom type:
```julia
using Serialization

# The target struct
struct Foo
    x::Int
    y::Union{Int, Nothing}  # we do not want to serialize this field
end

# Custom serialization of a Foo instance
function Serialization.serialize(s::AbstractSerializer, instance::Foo)
    Serialization.writetag(s.io, Serialization.OBJECT_TAG)
    Serialization.serialize(s, Foo)
    Serialization.serialize(s, instance.x)
end

# Custom deserialization of a Foo instance
function Serialization.deserialize(s::AbstractSerializer, ::Type{Foo})
    x = Serialization.deserialize(s)
    Foo(x, nothing)
end

foo1 = Foo(1, 2)

# Serialization
write_iob = IOBuffer()
serialize(write_iob, foo1)
seekstart(write_iob)
content = read(write_iob)

# Deserialization
read_iob = IOBuffer(content)
foo2 = deserialize(read_iob)

@show foo1
@show foo2
```
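One option that avoids overriding anything for Base array types is to keep the masked layout in a type you own, in the spirit of the `Foo` example. The sketch below assumes a hypothetical `MaskedVector{T}` struct (a `BitVector` flagging missing entries plus a dense `Vector{T}` of the present values) and a hypothetical `unmask` helper; neither comes from a package. No custom `serialize` method is needed, because both fields are of kinds that already took the fast path in the timings above (`BitArray` and `Vector{Int}`).

```julia
using Serialization

# Hypothetical wrapper type (not from any package).
struct MaskedVector{T}
    mask::BitVector    # true where the original entry was missing
    values::Vector{T}  # the non-missing entries, in their original order
end

# Split a Vector{Union{T,Missing}} into a mask and a dense vector of values.
function MaskedVector(v::AbstractVector{Union{T,Missing}}) where {T}
    MaskedVector{T}(ismissing.(v), collect(T, skipmissing(v)))
end

# Hypothetical helper: rebuild the Vector{Union{T,Missing}} after deserialization.
function unmask(m::MaskedVector{T}) where {T}
    out = Vector{Union{T,Missing}}(missing, length(m.mask))
    out[.!m.mask] = m.values  # fill the non-missing positions in order
    return out
end
```

Usage would be `serialize(io, MaskedVector(v))` on the writing side and `unmask(deserialize(io))` on the reading side. The trade-off is an explicit conversion at each end, in exchange for a format that relies only on default struct serialization.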
But what I would really like to do is check whether an array's element type is a `Union` that includes `Missing` (in which case I’d use a `BitArray` to flag the missing entries), and pass it to the default implementation if I can’t handle the structure. I’m not sure how to intercept serialization for `AbstractVector` while keeping the default implementation available as a fallback. I’m also not sure whether I can easily define my own subtype of `AbstractSerializer`, or how I should organize a custom serializer.
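To intercept `Vector{Union{T,Missing}}` directly while reusing the default code, one approach is to add a more specific `serialize` method and fall back with `invoke`, which calls the stdlib's `serialize(::AbstractSerializer, ::Array)` method even though a more specific method exists. Because this dispatches on the array type, no custom `AbstractSerializer` subtype is needed. The custom path reuses the `OBJECT_TAG` pattern from the `Foo` example, so the default reader should hand control to a matching `deserialize(s, ::Type{...})` method. This is a sketch under several assumptions: it relies on undocumented internals (`writetag`, `OBJECT_TAG`) that may change between Julia versions; it is type piracy, so it affects every serialization in the session and the reading process must define the same methods; and it skips the stdlib's shared-reference tracking, so an array referenced twice is written twice.

```julia
using Serialization
using Serialization: AbstractSerializer

# Sketch only: uses Serialization internals (writetag, OBJECT_TAG) and is type
# piracy (stdlib function, Base types), so it changes behavior session-wide.
function Serialization.serialize(s::AbstractSerializer, a::Vector{Union{T,Missing}}) where {T}
    # This signature can also match e.g. Vector{Any} (T = Any) or
    # Vector{Union{Int,Float64,Missing}}; only an isbits T is handled here.
    if !isbitstype(T)
        # Fall back to the stdlib's generic Array method.
        return invoke(Serialization.serialize, Tuple{AbstractSerializer, Array}, s, a)
    end
    # Same pattern as the Foo example: tag + type, then a custom payload.
    # Shared-reference tracking is skipped, so aliased arrays are written twice.
    Serialization.writetag(s.io, Serialization.OBJECT_TAG)
    Serialization.serialize(s, typeof(a))
    Serialization.serialize(s, ismissing.(a))               # BitVector: true where missing
    Serialization.serialize(s, collect(T, skipmissing(a)))  # dense Vector{T} of present values
end

# Called by the default reader after it sees OBJECT_TAG and reads the type back.
function Serialization.deserialize(s::AbstractSerializer, ::Type{Vector{Union{T,Missing}}}) where {T}
    mask = Serialization.deserialize(s)::BitVector
    values = Serialization.deserialize(s)::Vector{T}
    out = Vector{Union{T,Missing}}(missing, length(mask))
    out[.!mask] = values  # fill the non-missing positions in order
    return out
end
```

With these methods loaded on both ends, a plain `serialize(io, v)` / `deserialize(io)` round trip of a `Vector{Union{Int,Missing}}` goes through the mask path, while something like `Vector{Union{String,Missing}}` still takes the default route.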
Thanks for any help you can offer!