How to dispatch this serializer function?

Oscar_Smith · August 7, 2021, 1:34pm

Yeah that’s not an issue. An array is just a struct.

Skoffer · August 7, 2021, 7:44pm

I haven’t followed discussions later in this thread, so sorry if I repeat something.

Returning to this snippet:

    # Just serialize each member.
    for ii = 1:numel(v)
        write(io, serializej2m(v[ii]))
    end

The problem with these lines is that the type of v[ii] can be unknown at compile time, for example if v is defined as v = Any[1, 2, 3, 4]. In that case all dispatch happens at run time, which is bad because it is relatively slow. The whole point of type stability is that if the compiler can work out the types at compile time, it can generate highly efficient code. If not, things can get ugly.
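As a small illustration of the difference (the variable names are only for this sketch), compare the element types of the two kinds of container:

```julia
# A container with an abstract element type hides the element types from the compiler.
v_any = Any[1, 2, 3, 4]
v_int = [1, 2, 3, 4]

eltype(v_any)                   # Any: each v_any[i] has to be inspected at run time
eltype(v_int)                   # Int: known when the loop is compiled

isconcretetype(eltype(v_any))   # false
isconcretetype(eltype(v_int))   # true
```

In the REPL, `@code_warntype` on a function that loops over such a vector is a common way to spot where the inferred type degrades to Any.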

Union splitting is one way to overcome this problem. If you know the possible set of types beforehand, you can write

...
if v[ii] isa UInt8
  write(io, serializej2m(v[ii]))
elseif v[ii] isa UInt16
  write(io, serializej2m(v[ii]))
...

What happens here is that at run time, instead of a full lookup of the type and the matching method, the code only does a pointer comparison (which is fast) and executes the corresponding branch.
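A runnable sketch of the same pattern follows. The functions `tag` and `tags` are hypothetical stand-ins for the real serializer methods and do not appear in the original code:

```julia
# Hypothetical stand-ins for the real serializer methods.
tag(x::UInt8)  = "UInt8"
tag(x::UInt16) = "UInt16"
tag(x)         = "other"

function tags(v::Vector{Any})
    out = String[]
    for x in v
        if x isa UInt8
            push!(out, tag(x))   # x is known to be a UInt8 in this branch
        elseif x isa UInt16
            push!(out, tag(x))   # x is known to be a UInt16 in this branch
        else
            push!(out, tag(x))   # anything else still dispatches at run time
        end
    end
    return out
end

tags(Any[0x01, 0x0002, 3.0])
```

The branches look redundant because they contain the same call, but inside each branch the compiler knows the concrete type and can resolve the call ahead of time.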

Now, coincidentally, this code looks similar to the classToByte function in your original definition, but it is used in different circumstances. If all you are trying to do is dispatch (i.e. choose a function) depending on the type of the incoming object, then you should use multiple dispatch, because the compiler can do it better than you. If you are in a situation where there is no way to avoid runtime dispatch, union splitting may be useful.

With all that said, I do not quite understand in which situation serializej2m(io, v::AbstractArray) should be used, so union splitting may not be applicable here.

Bardo · August 9, 2021, 8:33am

Your info comes in handy for the single/array dispatch:

# Encode number types
utype(t) = Union{t, AbstractArray{t}}
type2byte(::T) where {T<:utype(Float64)}            = UInt8(0)
type2byte(::T) where {T<:Union{Any, AbstractArray}} = UInt8(255)  # fallback for struct

function _serialize_old(io, v::T) where {T<:Real} 
    println("processing 'single Real'")
    write(io, type2byte(v))
    write(io, UInt8(0))
    write(io, v)
end
function _serialize_old(io, v::T) where {T<:AbstractArray{<:Real}}
    println("processing 'Array of Real'")
    write(io, type2byte(v))
    write(io, UInt8(ndims(v)))
    write.(Ref(io), UInt32.(collect(size(v))))
    write(io, v)
end

Tried to merge into

function _serialize(io, v::T) where {T<:utype(Real)} 
    write(io, type2byte(v))
    if v isa Real
        println("isa 'single Real'")
        write(io, UInt8(0))   # the type byte is already written above the branch
    else
        println("isa 'Array of Real'")
        write(io, UInt8(ndims(v)))
        write.(Ref(io), UInt32.(collect(size(v))))
    end
    write(io, v)
end

works for

julia> serializej2m(1.0);
isa 'single Real'

but not for

julia> serializej2m([1.0])
processing 'Array of Struct'
ERROR: no components found in type Vector{Float64}

The union in the dispatch apparently confuses the type, branching to

function _serialize(io, v::T) where {T<:AbstractArray}
    println("processing 'Array of Struct'")

If there is no obvious flaw, I will open another thread to learn about type hierarchy beyond the often depicted number types.
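One possible explanation, offered as a sketch and not stated in the thread: parametric types in Julia are invariant, so Vector{Float64} is a subtype of AbstractArray{Float64} but not of AbstractArray{Real}. With utype(Real) the array case would therefore not match, and the call would fall through to the generic AbstractArray method. The variant `utype_sub` below is hypothetical:

```julia
utype(t)     = Union{t, AbstractArray{t}}      # as defined in the post above
utype_sub(t) = Union{t, AbstractArray{<:t}}    # hypothetical variant, not from the thread

Vector{Float64} <: AbstractArray{Real}      # false: type parameters are invariant
Vector{Float64} <: AbstractArray{<:Real}    # true

[1.0] isa utype(Float64)     # true, the element type matches exactly
[1.0] isa utype(Real)        # false, so a method restricted to utype(Real) is skipped
[1.0] isa utype_sub(Real)    # true
```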

DNF · August 9, 2021, 9:15am

Just write:

UInt32.(size(v))

Broadcasting works with tuples too. You should basically never use collect, unless your code cannot work without it.

Here you actually don’t need to broadcast write at all, since it accepts multiple inputs:

write(io, UInt32.(size(v))...)
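A short sketch of both points, using an IOBuffer as a stand-in for the real stream (the variable names are only for illustration):

```julia
v = zeros(2, 3)
dims = UInt32.(size(v))       # broadcasting over a tuple returns a tuple

io = IOBuffer()
nbytes = write(io, dims...)   # splat the tuple into write's multi-argument method
```

Here `nbytes` is the total number of bytes written, 4 for each UInt32.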

Still, looking at your latest code, why do you have to branch on scalar vs array? Didn’t my suggestion to use

write(io, prefix(v)...)
write(io, v)

work for both?

Bardo · August 9, 2021, 9:31am

I still need to write different dimension info:

write(io, UInt8(0))

vs.

write(io, UInt8(ndims(v)))
write(io, UInt32.(size(v))...)

and not every object supports ndims(v), so I cannot simply test it inside one function.

DNF · August 9, 2021, 9:43am

Sure, but the dimensions function (called by prefix in my example) should handle that, no?

It seems to me that the logic is in the wrong place. Everything related to creating metainformation, like the type tag and dimensions, should be done inside a function like prefix (or maybe call it metainfo or something), and then _serialize just prints the metainfo and then the data? It seems cumbersome to put the branching logic inside the printing function like that.

Bardo · August 9, 2021, 10:13am

I agree that such information belongs in a function, but the work still has to be done somewhere.

prefix(v) = (type2byte(v), dimensions(v)...)

I just did not know how to write a function like ndims(v) which works for every object.
Only recently I learned applicable(), so why not something like

nd(t) = applicable(ndims, t) ? ndims(t) : 0

No idea if there is a speed penalty though.
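For comparison, here is a sketch of the same idea written with dispatch. The struct `Point` and the name `nd_dispatch` are hypothetical. I have not benchmarked the two, but the dispatch version picks its method from the argument type, while applicable may have to perform a method lookup at run time:

```julia
struct Point        # hypothetical struct, only for illustration
    x::Float64
end

# Run-time check, as proposed above.
nd(t) = applicable(ndims, t) ? ndims(t) : 0

# Dispatch-based alternative: the method is selected from the argument type.
nd_dispatch(x) = 0
nd_dispatch(x::AbstractArray) = ndims(x)

nd(zeros(2, 3)), nd(Point(1.0))                     # (2, 0)
nd_dispatch(zeros(2, 3)), nd_dispatch(Point(1.0))   # (2, 0)
```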

DNF · August 9, 2021, 10:55am

Yes. The point of splitting the work into smaller tasks that are handled separately is that it becomes much simpler, and you avoid a lot of comparisons/tests and nested branching that tend to happen if you collect the logic in single place.

The implementation of dimensions that I suggested previously returned the number of dimensions and the length of each:

dimensions(x) = (UInt8(2), UInt32(1), UInt32(length(x)))  # works for numbers, chars and strings
dimensions(x::AbstractArray) = (UInt8(max(2, ndims(x))), UInt32.(size(x))...)

The implementation may not be correct anymore, since it seems like you now allow 0-dimensionality.

Bardo · August 9, 2021, 11:08am

Right, now I encode the Julia dimensions and, if >0, the size.
Nice to see UInt32 avoids the need to join tuples.

But length() does not work with every object either, for example a single struct.
So the dimensions function probably needs some branching with applicable(), doesn't it?

DNF · August 9, 2021, 11:25am

I may have missed something up-thread, but I don’t fully know how general you want this serialization function to be. The dimensions function I suggested is divided into two methods: one for AbstractArrays and one for the rest, which I assumed would be basic Numbers, Chars and Strings. I guess you want something more general?

Is everything either AbstractArray or scalar, or could there be other container types? Do you have a rule/list for what sort of data structures you want to cover? Depending on the scope, it might be possible to solve everything with dispatch. But perhaps not.

Bardo · August 9, 2021, 12:40pm

Thanks indeed for your patience!

Here is the list with the working dispatch, but without factoring out the prefix part:

single number:   <:Real                     => UInt8(0)
array of number: <:AbstractArray{<:Real}    => UInt8(ndims(v)), UInt32.(size(v))
single char:     ::Char                     => UInt8(0)
array of char:   <:AbstractArray{Char}      => UInt8(ndims(v)), UInt32.(size(v))
single string:   ::String                   => UInt8(1), UInt32(length(v))
array of string: <:AbstractArray{String}    => UInt8(ndims(v)), UInt32.(size(v))
single tuple:    ::Tuple                    => UInt8(1), UInt32(length(v))
array of tuple:  <:AbstractArray{Tuple}     => UInt8(ndims(v)), UInt32.(size(v))
single struct:   <:Any                      => UInt8(0)
array of struct: <:AbstractArray            => UInt8(ndims(v)), UInt32.(size(v))

DNF · August 9, 2021, 1:30pm

I think this reduces to

dimensions(x) = UInt8(0)
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)
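Putting the pieces from this thread together, a minimal sketch could look like the following. The three dimensions methods are taken from above; the type2byte methods and the name `serialize_sketch` are assumptions for illustration, restricted to real numbers and arrays of them:

```julia
# Hypothetical type tags; only the Float64 tag and the fallback appear in the thread.
type2byte(::Union{Float64, AbstractArray{Float64}}) = UInt8(0)
type2byte(::Any) = UInt8(255)

# The three methods suggested above.
dimensions(x) = UInt8(0)   # splatting a single number yields that number
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)

# All metainformation is collected in one place.
prefix(v) = (type2byte(v), dimensions(v)...)

# Hypothetical serializer: write the metainformation, then the raw data.
function serialize_sketch(io::IO, v::Union{Real, AbstractArray{<:Real}})
    write(io, prefix(v)...)
    write(io, v)
    return io
end

prefix(1.0)                   # type tag followed by zero dimensions
prefix([1.0 2.0; 3.0 4.0])    # type tag, ndims, then one UInt32 per dimension
```

With this layout the serializer itself contains no branching; the scalar/array distinction lives entirely in the dimensions methods.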

Bardo · August 9, 2021, 1:41pm

You've been faster ;-) - thanks a LOT!

Bardo · August 10, 2021, 2:55pm

Related subject continued in a new thread.

Bardo · August 11, 2021, 2:59pm

I added my final version above. Thx again for your help!

Bardo · August 22, 2021, 9:56pm

Still valid, but I ran into problems when trying to write it.
write lacks a method for tuples, and adding one like

import Base.write
function write(io, T::Tuple)
    for i = 1:length(T)
        write(io, T[i])
    end
end

creates an error

ERROR: LoadError: MethodError: write(::IOStream, ::Tuple{UInt8, UInt32, UInt32}) is ambiguous. Candidates:
  write(io, T::Tuple) in Main at c:\Users\bardo\MATLAB Drive\serialize_10.jl:16
  write(io::IO, x) in Base at io.jl:635
  write(io::IO, x1, xs...) in Base at io.jl:636
Possible fix, define
  write(::IO, ::Tuple)

DNF · August 22, 2021, 10:25pm

The error message says that it cannot decide which method to choose (it’s ambiguous). write(io, T::Tuple) matches the second argument, and write(io::IO, x) matches the first argument. So which method should it choose for write(io::IO, x::Tuple)?

So it suggests that you define a method for write(io::IO, x::Tuple). In other words, add ::IO to the method you defined.

But I suggest this definition instead of the loop:

Base.write(io::IO, x::Tuple) = write(io, x...)
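A variation worth considering, sketched here under the assumption that a separate helper is acceptable: adding a method to Base.write for a Base type such as Tuple is usually called type piracy, and a helper with its own name (the hypothetical `write_all` below) gives the same splatting behavior without touching Base:

```julia
# Hypothetical helper with its own name, so no method is added to Base.write.
write_all(io::IO, x::Tuple) = write(io, x...)

io = IOBuffer()
nbytes = write_all(io, (UInt8(2), UInt32(3), UInt32(4)))   # 1 + 4 + 4 bytes
```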

Bardo · August 23, 2021, 5:24am

Great help, great forum! Thx.
