Yeah that’s not an issue. An array is just a struct.
I haven’t followed discussions later in this thread, so sorry if I repeat something.
Returning to this snippet:
# Just serialize each member.
for ii = 1:numel(v)
write(io, serializej2m(v[ii]))
end
The problem with these lines is that the type of v[ii] can be unknown at compile time, for example if v is defined as v = Any[1, 2, 3, 4]. In that case all dispatch happens at runtime, which is bad because it is relatively slow. The whole point of type stability is that if the compiler can infer the types at compile time, it can generate highly efficient code. If it cannot, things can get ugly.
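To make the point about element types concrete, here is a minimal sketch. The names v_any, v_int, double and total are made up for illustration and are not part of the original code. Both containers hold the same values, but only the second one tells the compiler what each element is.

```julia
# Same values, different element types.
v_any = Any[1, 2, 3, 4]   # eltype is Any: the type of v_any[i] is only known at runtime
v_int = [1, 2, 3, 4]      # eltype is Int: every element is known to be an Int

@show eltype(v_any)                    # Any
@show isconcretetype(eltype(v_any))    # false
@show isconcretetype(eltype(v_int))    # true

# Hypothetical stand-in for the per-element work (e.g. serializing one member).
double(x) = 2x

# The same generic function works for both containers; only the amount of
# runtime dispatch differs.
total(v) = sum(double(x) for x in v)

@show total(v_any)
@show total(v_int)
```

In the REPL, @code_warntype total(v_any) is one way to see where the compiler had to fall back to Any.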
Union splitting is one way to overcome this problem. If you know the possible set of types beforehand, you can write
...
if v[ii] isa UInt8
write(io, serializej2m(v[ii]))
elseif v[ii] isa UInt16
write(io, serializej2m(v[ii]))
...
What happens here is that at runtime, instead of doing a full lookup of the type and the corresponding method, the code just makes a pointer comparison (which is fast) and executes the corresponding branch.
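A self-contained version of the fragment above could look like the following sketch. The function name write_all is hypothetical, and plain write stands in for serializej2m. It only assumes that the set of possible element types (here UInt8 and UInt16) is known in advance.

```julia
# Manual branching on a known, small set of element types.
function write_all(io::IO, v::Vector{Any})
    n = 0
    for x in v
        if x isa UInt8
            n += write(io, x)    # in this branch the compiler knows x is a UInt8
        elseif x isa UInt16
            n += write(io, x)    # and here it knows x is a UInt16
        else
            error("unsupported element type $(typeof(x))")
        end
    end
    return n                     # total number of bytes written
end

io = IOBuffer()
nbytes = write_all(io, Any[0x01, 0x0002])   # one UInt8 plus one UInt16
@show nbytes
```

The two branches look identical, but each call to write inside a branch can be resolved at compile time, which is the whole trick.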
Now, coincidentally, this code looks similar to the classToByte function in your original definition, but it is used in different circumstances. If all you are trying to do is dispatch (i.e. choose a function) depending on the type of the incoming object, then you should use multiple dispatch, because the compiler can do it better than you. If you are in a situation where there is no way to avoid runtime dispatch, union splitting may be useful.
With all that said, I do not quite understand in which situation serializej2m(io, v::AbstractArray) should be used, so maybe union splitting is not applicable here.
Your info comes in handy for handling the single/array dispatch:
# Encode number types
utype(t) = Union{t, AbstractArray{t}}
type2byte(::T) where {T<:utype(Float64)} = UInt8(0)
type2byte(::T) where {T<:Union{Any, AbstractArray}} = UInt8(255) # fallback for struct
function _serialize_old(io, v::T) where {T<:Real}
println("processing 'single Real'")
write(io, type2byte(v))
write(io, UInt8(0))
write(io, v)
end
function _serialize_old(io, v::T) where {T<:AbstractArray{<:Real}}
println("processing 'Array of Real'")
write(io, type2byte(v))
write(io, UInt8(ndims(v)))
write.(Ref(io), UInt32.(collect(size(v))))
write(io, v)
end
I tried to merge these into
function _serialize(io, v::T) where {T<:utype(Real)}
write(io, type2byte(v))
if v isa Real
println("isa 'single Real'")
write(io, type2byte(v))
write(io, UInt8(0))
else
println("isa 'Array of Real'")
write(io, UInt8(ndims(v)))
write.(Ref(io), UInt32.(collect(size(v))))
end
write(io, v)
end
works for
julia> serializej2m(1.0);
isa 'single Real'
but not for
julia> serializej2m([1.0])
processing 'Array of Struct'
ERROR: no components found in type Vector{Float64}
The union in the dispatch apparently confuses the type, branching to
function _serialize(io, v::T) where {T<:AbstractArray}
println("processing 'Array of Struct'")
If there is no obvious flaw, I will open another thread to learn about type hierarchy beyond the often depicted number types.
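One likely cause, assuming utype is defined as shown earlier: parametric types in Julia are invariant, so Vector{Float64} is not a subtype of AbstractArray{Real}, and utype(Real) therefore does not cover an array of Float64. The sketch below checks this directly. The names utype2 and describe are hypothetical and only serve the illustration.

```julia
utype(t) = Union{t, AbstractArray{t}}

# Invariance: the element type must match exactly unless written with <:
@show Vector{Float64} <: AbstractArray{Real}      # false
@show Vector{Float64} <: AbstractArray{<:Real}    # true
@show Vector{Float64} <: utype(Real)              # false, so the merged method is skipped
@show Vector{Float64} <: utype(Float64)           # true, which is why type2byte works

# Hypothetical variant that also accepts arrays of any subtype of t.
utype2(t) = Union{t, AbstractArray{<:t}}
@show Vector{Float64} <: utype2(Real)             # true

# A method restricted by utype2 now catches both scalars and arrays.
describe(::utype2(Real)) = "Real or array of Real"
describe(::Any) = "something else"

@show describe(1.0)
@show describe([1.0])
@show describe("text")
```

If this is indeed what happens, a call with [1.0] never reaches the merged _serialize method and falls through to the more general AbstractArray method instead.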
Just write:
UInt32.(size(v))
Broadcasting works with tuples too. You should basically never use collect, unless your code cannot work without it.
Here you actually don’t need to broadcast write at all, since it accepts multiple inputs:
write(io, UInt32.(size(v))...)
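As a quick check of both suggestions, here is a small sketch using an in-memory IOBuffer. The array v is just an arbitrary example.

```julia
v = zeros(Float64, 2, 3)

# Broadcasting over the size tuple gives a tuple of UInt32, no collect needed.
dims = UInt32.(size(v))
@show dims
@show typeof(dims)

# write accepts several values at once and returns the total byte count.
io = IOBuffer()
nbytes = write(io, UInt8(ndims(v)), dims...)
@show nbytes   # 1 byte for ndims plus 4 bytes per dimension
```

Because the result of the broadcast stays a tuple, it can be splatted straight into write without allocating an intermediate array.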
Still, looking at your latest code, why do you have to branch on scalar vs array? Didn’t my suggestion to use
write(io, prefix(v)...)
write(io, v)
work for both?
I still need to give different dimension info:
write(io, UInt8(0))
vs.
write(io, UInt8(ndims(v)))
write(io, UInt32.(size(v))...)
and not all objects support ndims(v), so I cannot use it as a test inside the function.
Sure, but the dimensions function (called by prefix in my example) should handle that, no?
It seems to me that the logic is in the wrong place. Everything related to creating metainformation, like the type tag and dimensions, should be done inside a function like prefix (or maybe call it metainfo or something), and then _serialize just prints the metainfo and then the data. It seems cumbersome to put the branching logic inside the printing function like that.
I agree with putting such information into a function; still, the work has to be done somewhere.
prefix(v) = (type2byte(v), dimensions(v)...)
I just did not know how to write a function like ndims(v) that works for every object.
Only recently I learned about applicable(), so why not something like
nd(t) = applicable(ndims, t) ? ndims(t) : 0
No idea if there is a speed penalty though.
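For comparison, here is a sketch that puts the applicable() version next to a plain dispatch version. Point and nd_dispatch are hypothetical names, and the sketch assumes that anything that is not an AbstractArray should count as 0-dimensional.

```julia
# Hypothetical example struct with no ndims method.
struct Point
    x::Float64
    y::Float64
end

# Version from the post: ask at runtime whether ndims has a matching method.
nd(t) = applicable(ndims, t) ? ndims(t) : 0

# Dispatch alternative: a generic fallback plus one method for arrays.
nd_dispatch(x) = 0
nd_dispatch(x::AbstractArray) = ndims(x)

for x in (1.0, [1.0, 2.0], zeros(2, 3), Point(1, 2))
    println(typeof(x), ": nd = ", nd(x), ", nd_dispatch = ", nd_dispatch(x))
end
```

With the dispatch version the method is chosen at compile time whenever the argument type is known. Whether applicable() costs anything measurable in practice is best checked with a benchmark on your Julia version.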
Yes. The point of splitting the work into smaller tasks that are handled separately is that it becomes much simpler, and you avoid a lot of the comparisons/tests and nested branching that tend to appear if you collect the logic in a single place.
I previously suggested an implementation for dimensions that returned the number of dimensions and the length of each:
dimensions(x) = (UInt8(2), UInt32(1), UInt32(length(x))) # works for numbers, chars and strings
dimensions(x::AbstractArray) = (UInt8(max(2, ndims(x))), UInt32.(size(x))...)
The implementation may not be correct anymore, since it seems like you now allow 0-dimensionality.
Right, now I encode the Julia dimensions and, if >0, the size.
Nice to see UInt32 avoids the need to join tuples.
But length() does not work with every object either, for example a single struct.
So the dimensions function likely needs some branching using applicable(), doesn't it?
I may have missed something up-thread, but I don’t fully know how general you want this serialization function to be. The dimensions function I suggested is divided into two methods: one for AbstractArrays and one for the rest, which I assumed would be basic Numbers, Chars and Strings. I guess you want something more general?
Is everything either an AbstractArray or a scalar, or could there be other container types? Do you have a rule/list for what sort of data structures you want to cover? Depending on the scope, it might be possible to solve everything with dispatch. But perhaps not.
Thanks indeed for your patience!
Here is the list with the working dispatch, but without factoring out the prefix part:
single number: <:Real => UInt8(0)
array of number: <:AbstractArray{<:Real} => UInt8(ndims(v)), UInt32.(size(v))
single char: ::Char => UInt8(0)
array of char: <:AbstractArray{Char} => UInt8(ndims(v)), UInt32.(size(v))
single string: ::String => UInt8(1), UInt32(length(v))
array of string: <:AbstractArray{String} => UInt8(ndims(v)), UInt32.(size(v))
single tuple: ::Tuple => UInt8(1), UInt32(length(v))
array of tuple: <:AbstractArray{Tuple} => UInt8(ndims(v)), UInt32.(size(v))
single struct: <:Any => UInt8(0)
array of struct: <:AbstractArray => UInt8(ndims(v)), UInt32.(size(v))
I think this reduces to
dimensions(x) = UInt8(0)
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)
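Put together with prefix, the three methods could be used as in the sketch below. Some assumptions: the type tags are placeholders (the thread only fixes 0 for Float64 and 255 as the fallback), serialize_sketch is a hypothetical name, and the generic dimensions method returns a one-element tuple here so that all three methods return tuples.

```julia
# Placeholder type tags, only to make the sketch runnable.
type2byte(::Union{Float64, AbstractArray{Float64}}) = UInt8(0)
type2byte(::Any) = UInt8(255)

# Dimension info: one generic fallback and two more specific methods.
dimensions(x) = (UInt8(0),)
dimensions(x::Union{String, Tuple}) = (UInt8(1), UInt32(length(x)))
dimensions(x::AbstractArray) = (UInt8(ndims(x)), UInt32.(size(x))...)

# All metainformation is collected in one place ...
prefix(v) = (type2byte(v), dimensions(v)...)

# ... so the writer needs no branching: metainfo first, then the data.
# Returns the number of bytes written.
serialize_sketch(io::IO, v) = write(io, prefix(v)...) + write(io, v)

@show prefix(1.0)
@show prefix(zeros(2, 3))
@show prefix("abc")

io = IOBuffer()
@show serialize_sketch(io, 1.0)           # 2 prefix bytes + 8 data bytes
@show serialize_sketch(io, zeros(2, 3))   # 10 prefix bytes + 48 data bytes
```

Only numbers and numeric arrays are written here, since write has no built-in method for tuples or structs.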
You've been faster ;-) - thanks a LOT!
Related subject continued in a new thread.
Added above my final version. Thx again for your help!
Still valid, but I ran into problems when trying to write it.
write lacks a method for tuples; adding one like
import Base.write
function write(io, T::Tuple)
for i = 1:length(T)
write(io, T[i])
end
end
creates an error
ERROR: LoadError: MethodError: write(::IOStream, ::Tuple{UInt8, UInt32, UInt32}) is ambiguous. Candidates:
write(io, T::Tuple) in Main at c:\Users\bardo\MATLAB Drive\serialize_10.jl:16
write(io::IO, x) in Base at io.jl:635
write(io::IO, x1, xs...) in Base at io.jl:636
Possible fix, define
write(::IO, ::Tuple)
The error message says that it cannot decide which method to choose (it’s ambiguous). write(io, T::Tuple) is more specific in the second argument, and write(io::IO, x) is more specific in the first argument. So which method should it choose for write(io::IO, x::Tuple)?
So it suggests that you define a method for write(io::IO, x::Tuple). In other words, add ::IO to the method you defined.
But I suggest this definition instead of the loop:
Base.write(io::IO, x::Tuple) = write(io, x...)
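If you would rather not add a method to Base.write for a type you do not own (this is usually called type piracy), the same idea works with a helper of your own. A minimal sketch, where write_tuple is a hypothetical name:

```julia
# Own helper instead of a new Base.write method: splat the tuple into write.
write_tuple(io::IO, x::Tuple) = write(io, x...)

io = IOBuffer()
nbytes = write_tuple(io, (UInt8(1), UInt32(2), UInt32(3)))
@show nbytes   # 1 + 4 + 4 bytes
```

Because both arguments are annotated and the function is your own, there is nothing for it to be ambiguous with. Note that an empty tuple would still fail, since write(io) alone has no method.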
Great help, great forum! Thx.