How to dispatch this serializer function?

Yeah that’s not an issue. An array is just a struct.

I haven’t followed discussions later in this thread, so sorry if I repeat something.

Returning to this snippet:

    # Just serialize each member.
    for ii = 1:numel(v)
        write(io, serializej2m(v[ii]))
    end

The problem with these lines is that the type of v[ii] can be unknown at compile time, for example if v is defined as v = Any[1, 2, 3, 4]. In that case all dispatch happens at runtime, which is bad because it is relatively slow. The whole point of type stability is that if the compiler can infer types at compile time, it can generate highly efficient code. If not, things can get ugly.
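To make the difference concrete, here is a minimal sketch (the variable names are made up for illustration) comparing the element type the compiler sees for an Any[...] vector with that of an ordinary literal:

```julia
v_any = Any[1, 2, 3, 4]   # element type is the abstract type Any
v_int = [1, 2, 3, 4]      # element type is inferred as Int

# With eltype Any the compiler cannot know typeof(v_any[ii]) ahead of time,
# so every call on an element has to be dispatched at runtime.
isconcretetype(eltype(v_any))   # false
isconcretetype(eltype(v_int))   # true
```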

Union splitting is one way to overcome this problem: if you know the possible set of types beforehand, you can write

...
if v[ii] isa UInt8
  write(io, serializej2m(v[ii]))
elseif v[ii] isa UInt16
  write(io, serializej2m(v[ii]))
...

What is going on here is that at runtime, instead of performing a full lookup of the type and the corresponding method, it just makes a pointer comparison (which is fast) and executes the corresponding branch.
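A self-contained sketch of that pattern might look like the following. The name write_split is hypothetical, and it calls Base's write directly on each element instead of the thread's serializej2m, which is not shown in full here:

```julia
# Manual branching on a small, known set of element types.
# Inside each branch the compiler knows the concrete type of x.
function write_split(io::IO, v::AbstractVector)
    n = 0
    for x in v
        if x isa UInt8
            n += write(io, x)    # x is known to be UInt8 here
        elseif x isa UInt16
            n += write(io, x)    # x is known to be UInt16 here
        else
            throw(ArgumentError("unsupported element type $(typeof(x))"))
        end
    end
    return n                     # total number of bytes written
end

buf = IOBuffer()
nbytes = write_split(buf, Any[0x01, 0x0002])
```

The two branches look identical in the source, but each one compiles to a direct call for its own type, which is the whole point of the split.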

Now, coincidentally, this code looks similar to the classToByte function in your original definition, but it is used in different circumstances. If all you are trying to do is dispatch (i.e. choose a function) depending on the type of the incoming object, then you should use multiple dispatch, because the compiler can do it better than you. If you are in a situation where there is no way to avoid runtime dispatch, union splitting may be useful.

With all that said, I do not quite understand in which situation serializej2m(io, v::AbstractArray) should be used, so maybe union splitting is not applicable here.


Your info comes in handy for handling the single/array dispatch:

# Encode number types
utype(t) = Union{t, AbstractArray{t}}
type2byte(::T) where {T<:utype(Float64)}            = UInt8(0)
type2byte(::T) where {T<:Union{Any, AbstractArray}} = UInt8(255)  # fallback for struct

function _serialize_old(io, v::T) where {T<:Real} 
    println("processing 'single Real'")
    write(io, type2byte(v))
    write(io, UInt8(0))
    write(io, v)
end
function _serialize_old(io, v::T) where {T<:AbstractArray{<:Real}}
    println("processing 'Array of Real'")
    write(io, type2byte(v))
    write(io, UInt8(ndims(v)))
    write.(Ref(io), UInt32.(collect(size(v))))
    write(io, v)
end

Tried to merge into

function _serialize(io, v::T) where {T<:utype(Real)}
    write(io, type2byte(v))  # type tag, written once for both branches
    if v isa Real
        println("isa 'single Real'")
        write(io, UInt8(0))
    else
        println("isa 'Array of Real'")
        write(io, UInt8(ndims(v)))
        write.(Ref(io), UInt32.(collect(size(v))))
    end
    write(io, v)
end

works for

julia> serializej2m(1.0);
isa 'single Real'

but not for

julia> serializej2m([1.0])
processing 'Array of Struct'
ERROR: no components found in type Vector{Float64}

The union in the dispatch apparently confuses the type, branching to

function _serialize(io, v::T) where {T<:AbstractArray}
    println("processing 'Array of Struct'")

If there is no obvious flaw, I will open another thread to learn about type hierarchy beyond the often depicted number types.
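One likely cause, assuming utype is defined as in the post above: Julia's parametric types are invariant, so AbstractArray{Real} only matches arrays whose element type is exactly Real, not Vector{Float64}. The sketch below checks this with two hypothetical helper names, utype_invariant (the definition from the post) and utype_covariant (the same with <: added):

```julia
# Definition from the post: AbstractArray{t} is invariant in t.
utype_invariant(t) = Union{t, AbstractArray{t}}

# Hypothetical variant: AbstractArray{<:t} accepts any element subtype of t.
utype_covariant(t) = Union{t, AbstractArray{<:t}}

Float64 <: utype_invariant(Real)            # true, scalars match via the first member
Vector{Float64} <: utype_invariant(Real)    # false, so dispatch falls through
Vector{Float64} <: utype_covariant(Real)    # true
Vector{Float64} <: utype_invariant(Float64) # true, which is why type2byte worked
```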

Just write:

UInt32.(size(v))

Broadcasting works with tuples too. You should basically never use collect, unless your code cannot work without it.

Here you actually don’t need to broadcast write at all, since it accepts multiple inputs:

write(io, UInt32.(size(v))...)
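A small sketch of both points, using an in-memory IOBuffer and a made-up 2×3 array as stand-ins for the real stream and data:

```julia
v = zeros(2, 3)
dims = UInt32.(size(v))        # broadcasting over a tuple returns a tuple

io = IOBuffer()
nbytes = write(io, dims...)    # varargs write returns the total bytes written
```

Here dims is a Tuple{UInt32, UInt32}, and nbytes is 8 because each UInt32 takes four bytes.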

Still, looking at your latest code, why do you have to branch on scalar vs array? Didn’t my suggestion to use

write(io, prefix(v)...)
write(io, v)

work for both?

I still need to write different dimension info:

write(io, UInt8(0))

vs.

write(io, UInt8(ndims(v)))
write(io, UInt32.(size(v))...)

and not every object supports ndims(v), so I cannot simply test it inside a function.

Sure, but the dimensions function (called by prefix in my example) should handle that, no?

It seems to me that the logic is in the wrong place. Everything related to creating metainformation, like the type tag and dimensions, should be done inside a function like prefix (or maybe call it metainfo or something), and then _serialize just prints the metainfo and then the data? It seems cumbersome to put the branching logic inside the printing function like that.

I agree with putting such information into a function, but the work still has to be done somewhere.

prefix(v) = (type2byte(v), dimensions(v)...)

I just did not know how to write a function like ndims(v) that works for every object.
Only recently I learned about applicable(), so why not something like

nd(t) = applicable(ndims, t) ? ndims(t) : 0

No idea if there is a speed penalty though.
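For comparison, here is a sketch of the applicable-based version next to a dispatch-based one. The names nd_applicable, nd_dispatch and Point are hypothetical. As far as I know, applicable generally has to query the method table at runtime, whereas the dispatch version can be resolved at compile time whenever the argument type is inferred, so the second form is the safer bet if speed matters:

```julia
struct Point            # hypothetical struct with no ndims method
    x::Int
end

# Variant from the post: ask at runtime whether ndims has a matching method.
nd_applicable(t) = applicable(ndims, t) ? ndims(t) : 0

# Dispatch-based variant: arrays report their ndims, everything else is 0.
nd_dispatch(::Any) = 0
nd_dispatch(x::AbstractArray) = ndims(x)

nd_applicable(zeros(2, 3)), nd_dispatch(zeros(2, 3))   # (2, 2)
nd_applicable(Point(1)),    nd_dispatch(Point(1))      # (0, 0)
```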

Yes. The point of splitting the work into smaller tasks that are handled separately is that it becomes much simpler, and you avoid a lot of comparisons/tests and nested branching that tend to happen if you collect the logic in a single place.

The implementation of dimensions I suggested previously returned the number of dimensions and the length of each:

dimensions(x) = (UInt8(2), UInt32(1), UInt32(length(x)))  # works for numbers, chars and strings
dimensions(x::AbstractArray) = (UInt8(max(2, ndims(x))), UInt32.(size(x))...)

The implementation may not be correct anymore, since it seems like you now allow 0-dimensionality.

Right, now I encode the Julia dimensions and, if >0, the size.
Nice to see UInt32 avoids the need to join tuples.

But length() does not work with every object either, for example a single struct.
The dimensions function then likely needs some branching using applicable(), right?

I may have missed something up-thread, but I don’t fully know how general you want this serialization function to be. The dimensions function I suggested is divided into two methods: one for AbstractArrays and one for the rest, which I assumed would be basic Numbers, Chars and Strings. I guess you want something more general?

Is everything either AbstractArray or scalar, or could there be other container types? Do you have a rule/list for what sort of data structures you want to cover? Depending on the scope, it might be possible to solve everything with dispatch. But perhaps not.

Thanks indeed for your patience!

Here is the list with the working dispatch, but without factoring out the prefix part:

single number:   <:Real                     => UInt8(0)
array of number: <:AbstractArray{<:Real}    => UInt8(ndims(v)), UInt32.(size(v))
single char:     ::Char                     => UInt8(0)
array of char:   <:AbstractArray{Char}      => UInt8(ndims(v)), UInt32.(size(v))
single string:   ::String                   => UInt8(1), UInt32(length(v))
array of string: <:AbstractArray{String}    => UInt8(ndims(v)), UInt32.(size(v))
single tuple:    ::Tuple                    => UInt8(1), UInt32(length(v))
array of tuple:  <:AbstractArray{Tuple}     => UInt8(ndims(v)), UInt32.(size(v))
single struct:   <:Any                      => UInt8(0)
array of struct: <:AbstractArray            => UInt8(ndims(v)), UInt32.(size(v))

I think this reduces to

dimensions(x) = UInt8(0)
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)
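Put together with prefix, a runnable sketch could look like this. The dimensions and prefix definitions are the ones from the thread. serialize_sketch is a hypothetical name, the two type tags are the only ones the thread gives (0 for Float64, 255 as the fallback), and it only covers values that Base's write accepts directly, so structs would still need their own method:

```julia
# Type tags: only these two come from the thread.
type2byte(::Union{Float64, AbstractArray{Float64}}) = UInt8(0)
type2byte(::Any) = UInt8(255)

# Dimension info, as proposed above.
dimensions(x) = UInt8(0)
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)

# All metainformation in one tuple: type tag first, then dimension info.
prefix(v) = (type2byte(v), dimensions(v)...)

# Write the metainformation, then the data; return the total byte count.
function serialize_sketch(io::IO, v)
    n = write(io, prefix(v)...)
    n += write(io, v)
    return n
end

n_scalar = serialize_sketch(IOBuffer(), 1.0)
n_matrix = serialize_sketch(IOBuffer(), [1.0 2.0; 3.0 4.0])
n_string = serialize_sketch(IOBuffer(), "abc")
```

No branching on scalar versus array is left in the writing function; the choice is made entirely by which dimensions method is dispatched to.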


You've been faster ;-) - thanks a LOT!

Related subject continued in a new thread.

Added above my final version. Thx again for your help!

Still valid, but I ran into problems when trying to write it.
write lacks a method for tuples; adding one like

import Base.write
function write(io, T::Tuple)
    for i = 1:length(T)
        write(io, T[i])
    end
end

creates an error

ERROR: LoadError: MethodError: write(::IOStream, ::Tuple{UInt8, UInt32, UInt32}) is ambiguous. Candidates:
  write(io, T::Tuple) in Main at c:\Users\bardo\MATLAB Drive\serialize_10.jl:16
  write(io::IO, x) in Base at io.jl:635
  write(io::IO, x1, xs...) in Base at io.jl:636
Possible fix, define
  write(::IO, ::Tuple)

The error message says that it cannot decide which method to choose (the call is ambiguous). Your write(io, T::Tuple) is more specific in the second argument, while Base's write(io::IO, x) is more specific in the first. So which one should it choose for write(io::IO, x::Tuple)?

That is why it suggests defining a method for write(io::IO, x::Tuple). In other words, add ::IO to the method you defined.

But I suggest this definition instead of the loop:

Base.write(io::IO, x::Tuple) = write(io, x...)
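As a quick check of the one-liner, the sketch below writes the tuple from the error message to an in-memory IOBuffer (an assumption made here so the example needs no file). Note that this adds a method to a Base function for a Base type, so it affects every caller in the session; splatting at the call site, write(io, t...), gives the same bytes without the extra method.

```julia
# Forwarding method: splat the tuple into Base's varargs write(io, x1, xs...).
# Annotating io::IO makes it strictly more specific than both Base fallbacks.
Base.write(io::IO, x::Tuple) = write(io, x...)

buf = IOBuffer()
t = (UInt8(1), UInt32(2), UInt32(3))
n = write(buf, t)        # 1 + 4 + 4 bytes
```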

Great help, great forum! Thx.