Know whether a value is a "core" Julia type

In the context of serialisation, I’d like to know whether the type of a given value belongs to Julia’s “core types” (i.e. it does not depend on a user-defined type or an external library). I’d like my code to behave differently depending on whether a user-provided value can be serialised and deserialised “as is” or not.

using Dates
a = 5
b = "foo"
c = today()
is_std_type(a) # true
is_std_type(b) # true
is_std_type(c) # true

struct Foo
  a::Int
end
f = Foo(1)
is_std_type(f) # false

using OrderedCollections
ld = LittleDict(:a =>5, :b=>7)
is_std_type(ld) # false

(PS: I do not want to use JLDx, JSONx for this for separate reasons)

Edit

  1. parentmodule almost gets me where I want: for any value I can check whether parentmodule(typeof(x)) === Core, but that leaves out stdlib libraries like Dates.
  2. Using pathof I could check whether a module is part of the stdlib, I guess. This and (1) would solve my question, but it’s a bit ugly:
function is_std_type(x)
   mdl = parentmodule(typeof(x))
   mdl === Core && return true
   p = pathof(mdl)
   isnothing(p) && return false
   return "stdlib" in splitpath(p)
end
# + probably a bunch of exceptions I don't want like
is_std_type(::Function) = false
is_std_type(::Module) = false
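
One gap in the check above is that pathof only knows about top-level packages, so a type defined in a submodule (for example Base.Iterators) would not be recognised. A minimal sketch of a variant that resolves the root module first is shown below. The names defined_in_std_module and UserPoint are hypothetical, and Base.moduleroot is an unexported internal function, so treat this as an assumption rather than a stable API.

```julia
# Hypothetical user-defined type, used only for illustration.
struct UserPoint
    x::Int
end

# Sketch: decide based on the *root* module of the type's defining module.
# Base.moduleroot is internal (unexported) and may change between versions.
function defined_in_std_module(T::DataType)
    root = Base.moduleroot(parentmodule(T))
    root in (Core, Base) && return true
    p = pathof(root)                      # nothing for Main and other non-package modules
    return p !== nothing && "stdlib" in splitpath(p)
end

defined_in_std_module(Int)                              # defined in Core
defined_in_std_module(typeof(Iterators.repeated(1)))    # Base.Iterators resolves to Base
defined_in_std_module(UserPoint)                        # user-defined
```

This only looks at where the outermost type is defined; it says nothing about type parameters or fields.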

You need to check the types of type-parameters and fields also.

consider:


julia> struct Foo end

julia> is_std_type(Foo())
false

julia> is_std_type([Foo(), Foo()])
true
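
A rough sketch of what checking type parameters and fields could look like is given below. The names Widget, in_core_or_base and only_std_types are made up for this illustration. It only follows parameters and field types that are plain DataTypes, so Union and UnionAll components are skipped, and a field declared as Any passes because its runtime content is unknown at the type level.

```julia
# Hypothetical user-defined type, used only for illustration.
struct Widget end

in_core_or_base(T::DataType) = parentmodule(T) in (Core, Base)

# Recursively walk type parameters and field types.
function only_std_types(T::DataType, seen = Set{DataType}())
    T in seen && return true              # already visited; guards against cycles
    push!(seen, T)
    in_core_or_base(T) || return false
    for p in T.parameters
        p isa DataType && !only_std_types(p, seen) && return false
    end
    if isconcretetype(T)                  # fieldtypes is only defined for concrete types
        for ft in fieldtypes(T)
            ft isa DataType && !only_std_types(ft, seen) && return false
        end
    end
    return true
end

only_std_types(typeof([Widget(), Widget()]))   # Vector{Widget}: the parameter is user-defined
only_std_types(Vector{Int})
```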

I ended up with

(Edit: added a few cases to account for comments below)

is_easily_serializable(x) = is_easily_serializable(typeof(x))
function is_easily_serializable(T::DataType)
    T === Any && return false
    m = parentmodule(T)
    m in (Base, Core) && return true
    p = pathof(m)
    return p !== nothing && "stdlib" in splitpath(p)
end

is_easily_serializable(x::Union{Tuple, NamedTuple}) =
    all(is_easily_serializable, v for v in x)

is_easily_serializable(x::AA) where AA <: AbstractArray{T} where T =
    all(is_easily_serializable, (T, AA))
is_easily_serializable(x::AR) where AR <: AbstractRange{T} where T =
    all(is_easily_serializable, (T, AR))
is_easily_serializable(x::AD) where AD <: AbstractDict{K, V} where {K, V} =
    all(is_easily_serializable, (K, V, AD))

# For composites with Any type, we need to go over each entry
is_easily_serializable(x::AA) where AA <: AbstractArray{Any} =
    all(is_easily_serializable, (AA, x...))
is_easily_serializable(x::AD) where AD <: AbstractDict{K, Any} where {K} =
    all(is_easily_serializable, (K, AD, values(x)...))

is_easily_serializable(::Function) = false
is_easily_serializable(::Module)   = false
is_easily_serializable(::Ref)      = false
is_easily_serializable(::Ptr)      = false

Not quite as nice as I would have hoped, but it does the job for now. Suggestions to improve this are welcome.

PS: /s the name of the function is maybe not ideal, it should be is_easily_serializable_and_deserializable_afterwards_in_a_fresh_julia_session, yeah…

Is the Serialization stdlib not an option for your use case?

The problem with serialization is that there’s no fundamental difference between user-defined types and “Core” types. Under the hood, they use the same machinery. You’ll run into trouble with pointers (Ptr is a type in Julia as well), both in some “core” types and in user code, so be aware of that.


I am indeed using the Serialization stdlib. The context is the serialisation of a “user notebook” where all sorts of types might be used, but if the user only has “easy” types (which is quite likely in my context) then serialisation and deserialisation are super easy (meaning that when you deserialise you need zero additional code to recover the exact object).

Otherwise, of course, I can serialise everything with Serialization.serialize; the question is how much effort it takes to recover the actual value in an independent Julia session. In my context it’s quite painful to keep track of user types or to use the “requires” trick of JLD, but it’s reasonable to just fail the serialization for “hard” notebooks, which are less likely. (This serialization is used as temporary caching in a next version of Franklin; if it fails, the page will just take a bit more time to load, which is fine.)

PS: and with your last comment, I should probably add that Ref types and Pointer types should be considered non-easily-serialisable.
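
The "cache if easy, otherwise give up" workflow described here could be sketched roughly as follows. The helper names cache_bytes and restore, and the is_easy predicate argument, are hypothetical; only Serialization.serialize and Serialization.deserialize come from the stdlib.

```julia
using Serialization

# Return the serialized bytes, or nothing when the value is not considered "easy".
# is_easy is any predicate, e.g. the is_easily_serializable function from this thread.
function cache_bytes(x, is_easy)
    is_easy(x) || return nothing
    io = IOBuffer()
    Serialization.serialize(io, x)
    return take!(io)
end

# Recover the value from the cached bytes.
restore(bytes::Vector{UInt8}) = Serialization.deserialize(IOBuffer(bytes))

value = (a = 1, b = "foo", c = [1.0, 2.0])
bytes = cache_bytes(value, x -> !(x isa Function))   # toy predicate for the example
restored = restore(bytes)
```

Returning nothing instead of throwing lets the caller simply skip the cache and recompute, which matches the "page just takes longer to load" fallback.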

Depending on what the intent really was, this may not do exactly what you’d expect:

struct S end

is_easily_serializable([1, S()])  # true

Ah yes, nice. That’s because it’s a Vector{Any}; I guess in that case, like in the tuple case, I have to go over each element. Thanks!

isbits might be helpful here. While also looking into serialization, I first came across it in an error message when trying to write an array of strings. While not stated explicitly in the docs, I take it as a test of whether an object can be passed to write().

I have not fully explored this, but it seems to apply to arrays of same-size elements, irrespective of their layout, such as an array of tuples of Float64. Can anyone fill in the details?
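
As a small illustration of the observation above, an array whose element type is isbits (here tuples of Float64) can be written in one call and read back with read!. The variable names are only for this example.

```julia
data = [(1.0, 2.0), (3.0, 4.0)]      # Vector{Tuple{Float64, Float64}}
isbitstype(eltype(data))             # the element type is isbits

io = IOBuffer()
nbytes = write(io, data)             # raw memory of the array, 2 elements * 16 bytes

seekstart(io)
back = read!(io, similar(data))      # fill a same-shaped array from the stream
```

Note that this writes no type or size information, so the reader has to know the element type and the dimensions in advance.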


While all types in Base that satisfy isbitstype have a write method defined, I think that’s just a fallback for an abstract type being hit. It’s trivial to construct a type that is isbits but doesn’t have write:

julia> struct Isbits                                              
          a::Int                                                  
       end                                                        
                                                                  
julia> isbitstype(Isbits)                                         
true                                                              
                                                                  
julia> write(stdout, Isbits(5))                                   
ERROR: MethodError: no method matching write(::Base.TTY, ::Isbits)

Conversely, there are of course also non-isbits types that have a write defined:

julia> filter(isbitstype, getSubtypes()) .|> (x -> hasmethod(write, (typeof(stdout), x))) |> all 
true                                                                                             
                                                                                                 
julia> filter(!isbitstype, getSubtypes()) .|> (x -> hasmethod(write, (typeof(stdout), x))) |> any
true                                                                                             
Definition of `getSubtypes()`
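using InteractiveUtils  # provides subtypes outside the REPL

# Collect all concrete subtypes of T by walking the abstract types breadth-first
function getSubtypes(T=Any)::Vector{DataType}
    subs = subtypes(T)
    ret = filter(isconcretetype, subs)
    filter!(isabstracttype, subs)

    while !isempty(subs)
        ntype = popfirst!(subs)
        ntype == Any && continue  # skip Any so the walk does not revisit the root
        nsubs = subtypes(ntype)
        append!(ret, Iterators.filter(isconcretetype, nsubs))
        append!(subs, Iterators.filter(isabstracttype, nsubs))
    end

    ret
end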

Right, I also saw that it “hits wider”. In my serializer code below, I do not use it yet.

Writing as much as possible in one go is limited here to the next (element) level.
A struct is deserialized into a named tuple; are there ways to get it back to its original type?

My vehicle to experiment with the Julia type and dispatch universe:

(de-)serializer
"""
io = serialize(v) reinterprets a Julia object into a series of bytes.
v = deserialize(io) recreates the data from a byte stream

An exercise in dispatch style.

Based on
https://de.mathworks.com/matlabcentral/fileexchange/29457-serialize-deserialize
and "julianized" with the experts on
https://discourse.julialang.org/

"""

# Type encoding
tcode = [
0   Float64; 
1   Float32; 
2   Float16;
3   Bool;
4   Int8;
5   UInt8;
6   Int16;
7   UInt16;
8   Int32;
9   UInt32;
10  Int64;
11  UInt64
12  Char;
13  String;
100 Tuple;
200 Any
]
STRUCT = 255
WRITABLE = 0:12

tcode2type = Dict(tcode[:,1] .=> tcode[:,2])
type2tcode = Dict(tcode[:,2] .=> tcode[:,1])

function serialize(io, v)
    te = eltype(v)
    if typeof(v) <: Tuple
        write(io, UInt8(type2tcode[Tuple]))
        write(io, UInt8(1))
        write(io, UInt32(length(v)))
        serialize.(Ref(io), v)
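    elseif typeof(v) == String
        write(io, UInt8(type2tcode[String]))
        write(io, UInt8(1))
        # store the byte count, not the character count, so that
        # non-ASCII strings are read back in full
        write(io, UInt32(ncodeunits(v)))
        write(io, v)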
    elseif eltype(v) <: Real || eltype(v) ==  Char
        write(io, UInt8(type2tcode[te]))
        nd = ndims(v)
        write(io, UInt8(nd))
        if nd > 0
            write(io, UInt32.(size(v))...)
        end
        write(io, v) 
    elseif v isa  AbstractArray
        if te == Any || te <: Tuple || te == String
            write(io, UInt8(type2tcode[Any])); 
            write(io, UInt8(ndims(v)))
            write(io, UInt32.(size(v))...)
            serialize.(Ref(io), v)
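        elseif applicable(fieldcount, te) && fieldcount(te) > 0
            # header expected by deserialize: type code, ndims, sizes
            write(io, UInt8(STRUCT))
            write(io, UInt8(ndims(v)))
            write(io, UInt32.(size(v))...)
            fc = fieldcount(te)
            write(io, UInt32(fc))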
            for name in fieldnames(typeof(first(v)))
                sname = String(name)
                len = ncodeunits(sname)
                write(io, UInt8(len))
                writestr(io, sname)
                serialize(io, getfield.(v, name))
            end
        else
            error("no match for te=$te")
        end
    elseif applicable(fieldcount, typeof(v)) && fieldcount(typeof(v)) > 0
        # single struct: eltype(v) defaults to Any for a plain struct,
        # so inspect typeof(v) instead
        write(io, UInt8(STRUCT))
        write(io, UInt8(0))
        fc = fieldcount(typeof(v))
        write(io, UInt32(fc))
        for name in fieldnames(typeof(v))
            sname = String(name)
            len = ncodeunits(sname)
            write(io, UInt8(len))
            writestr(io, sname)
            serialize(io, getfield(v, name))
        end
    else
        error("expected struct, but did not find any field")
    end
end

function deserialize(io) 
    ity = Int(readnum(io, UInt8))
    ndms = Int(readnum(io, UInt8))
    dms = ndms == 0 ? 1 : Int.(readnum(io, UInt32, ndms))
    if ity in WRITABLE
        cls = tcode2type[ity]
        return ndms == 0 ? readnum(io, cls) : reshape(readnum(io, cls, prod(dms)), dms...)
    elseif ity == STRUCT
        fname = Symbol[]
        nfld = readnum(io, UInt32)
        fdata = []
        for i = 1:nfld
            fn = readstr(io, readnum(io, UInt8))
            push!(fname, Symbol(fn))
            push!(fdata, deserialize(io))
        end
        if ndms == 0
            # single struct: pair each field name with its value
            return (; zip(fname, fdata)...)
        else
            return reshape(NamedTuple.(zip.(Ref(fname), zip(fdata...))), dms...)
        end
    elseif ity == type2tcode[String]
        if ndms == 1
            return String(read(io, dms[1]))
        else
            se = String[]
            for i = 1:prod(dms)
                push!(se, deserialize(io))
            end
            return reshape(se, dms...)
        end
    elseif ity == type2tcode[Any] || ity == type2tcode[Tuple]
        istuple = ity == type2tcode[Tuple]
        ele = []
        for i = 1:prod(dms)
            push!(ele, deserialize(io))
        end
        if istuple
            return Tuple(ele)
        else
            return reshape(ele, dms...)
        end
    else
        error("unknown type index $ity")
    end
end

############## IO read/write routines ##############
# write number as type T
function writenum(io, num, T)
    write(io, T(num))
end
# read n numbers of type T
function readnum(io, T, n)
    s = sizeof(T)
    f = zeros(UInt8, s*n)
    readbytes!(io, f, s*n)
    return reinterpret(T, f)
end
# read single number of type T
function readnum(io, T)
    s = sizeof(T)
    f = zeros(UInt8, s)
    readbytes!(io, f, s)
    return reinterpret(T, f)[1]
end
# write string
function writestr(io, str)
    write(io, codeunits(str))
end
# read string of length n
function readstr(io, n)
    return String(read(io, n))
end

############## Tests ##############

function round_trip(data)
    println(data)
    open("io.bin", "w") do io
        serialize(io, data)
    end

    println("..deserialize..")

    io = open("io.bin", "r")
    data2 = deserialize(io)
    close(io)
    data2
end

mutable struct Coords
    x::Float64
    y::Float64
    z::Float64
end
Coords() = Coords(rand(), rand(), rand())

Array_of_Int = [1, 2]
Array_of_Tuple = [(1, 2), (2, 3)]
Array_of_Any = ["Ab", (1, 2)]
Single_Num = pi
Array_of_Num = randn(3,3)
Single_Struct = Coords()
Array_of_Struct = [Coords() for i in 1:5]
Single_Tuple = ("Ab", [pi, 2.0])
Single_String = "toto"
Array_of_String = ["Ab" "toto"; "titi" "ok"]
Array_of_Char = ['a' 'b'; 'c' 'd']

round_trip(Array_of_Any)
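
On the question of getting a named tuple back to its original type: if the target type is known at read time, one possible approach is to call its constructor with the fields in declaration order. The sketch below uses a hypothetical struct Point3 and a hypothetical helper from_namedtuple; it assumes the default positional constructor exists and that the named tuple carries every field of the struct.

```julia
# Hypothetical stand-in for a user struct such as Coords.
struct Point3
    x::Float64
    y::Float64
    z::Float64
end

# Rebuild a T from a named tuple by looking up each field of T by name.
function from_namedtuple(::Type{T}, nt::NamedTuple) where {T}
    args = (getfield(nt, name) for name in fieldnames(T))
    return T(args...)
end

nt = (x = 1.0, y = 2.0, z = 3.0)
p = from_namedtuple(Point3, nt)
```

The byte stream itself does not record which type the fields belonged to, so the type has to come from the reading side, which is the same difficulty discussed earlier in this thread for user-defined types.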

You have way too many explicit type checks and custom functions for writing. Use write(io, num, T::Number) instead of writenum(io, num, T), and write(io, str::String) instead of writestr(io, str). Use dispatch to your advantage, not to build explicit if/else trees.

In reading, having a table converting tags to types is fine, but again, use dispatch to your advantage: read(io, T::Number) instead of readnum(io, T). Call it via tag = read(io, UInt8); read(io, tag2type[tag]) instead of manually doing if/else chains.
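
A minimal sketch of the tag-table idea for plain numbers might look like this. The names TAG2TYPE, TYPE2TAG, write_tagged and read_tagged are invented for the example and the tag values are arbitrary; the only library calls are Base's write(io, x) and read(io, T), which exist for primitive numeric types.

```julia
# Tag <-> type tables (values chosen arbitrarily for this sketch).
const TAG2TYPE = Dict{UInt8,DataType}(0x00 => Float64, 0x01 => Float32, 0x0a => Int64)
const TYPE2TAG = Dict{DataType,UInt8}(T => tag for (tag, T) in TAG2TYPE)

# One method covers every supported number type; returns the bytes written.
write_tagged(io::IO, x::Number) = write(io, TYPE2TAG[typeof(x)]) + write(io, x)

# Read the tag, then let read(io, T) do the decoding.
function read_tagged(io::IO)
    tag = read(io, UInt8)
    return read(io, TAG2TYPE[tag])
end

io = IOBuffer()
write_tagged(io, 1.5)
write_tagged(io, Int64(42))
seekstart(io)
a = read_tagged(io)
b = read_tagged(io)
```

Supporting another kind of value then means adding a table entry and, where needed, another write_tagged method, rather than another branch.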

I think I’ve read your thread on that topic, and I seem to remember that the code you’ve posted here was not one of the recommended ways of doing what you have in mind. That is probably best left for another topic though; it would get off-topic here.