Correct usage of type-dependent constructors

Hello,

I need a struct with a type parameter so that I can dispatch on it. One possible way is the following:

abstract type MyType end

# Singletons occupying 0 bytes
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
    type::T
end

and I can call it with x = MyStruct(rand(10), Type1()) for example.
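A minimal sketch of this first approach, to make the dispatch concrete. `describe` is a hypothetical function introduced only for illustration; the method is selected by the first type parameter of MyStruct:

```julia
abstract type MyType end

# Singletons occupying 0 bytes
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
    type::T
end

# Hypothetical function whose behaviour depends on the trait parameter T
describe(x::MyStruct{Type1}) = "Type1 with $(length(x.data)) elements"
describe(x::MyStruct{Type2}) = "Type2 with $(length(x.data)) elements"

x = MyStruct(rand(10), Type1())
println(describe(x))   # Type1 with 10 elements
println(x.type)        # Type1()
```

Because `Type1()` is a singleton, the `type` field adds no storage; the dispatch happens purely on the type parameter.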

I could even remove the type field from the struct, and just leave the parametric type, but I prefer to be able to call x.type.

Now, do I really need to define these structs and their constructors just to have some types for multiple dispatch? Would “standard” types be better here? I mean something like

abstract type MyType end

# or primitive types?
abstract type Type1 <: MyType end
abstract type Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
    type::Type{T} # Note the Type here
end

This would allow me to construct the object without parentheses, e.g. x = MyStruct(rand(10), Type1), which I prefer.
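For comparison, a sketch of the abstract-type variant. It relies on the outer constructor Julia generates automatically, which can infer T from the `Type{T}` field (the asker reports that this call works); `describe` is again a hypothetical function:

```julia
abstract type MyType end

# Abstract types used purely as tags; they are never instantiated
abstract type Type1 <: MyType end
abstract type Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
    type::Type{T}   # stores the type object itself, e.g. Type1
end

# Dispatch still works on the first type parameter
describe(x::MyStruct{Type1}) = "Type1"
describe(x::MyStruct{Type2}) = "Type2"

x = MyStruct(rand(10), Type1)   # no parentheses needed
println(describe(x))            # Type1
println(x.type === Type1)       # true
```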

Do you recommend this second method?

Alternative method

Or can I just do a mix of the two?

abstract type MyType end

# Singletons occupying 0 bytes
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
    type::Type{T} # Note the Type here
end

and then call MyStruct(rand(10), Type1), so that I don’t need the parentheses.

Which method do you recommend? Is it good practice to work with types rather than struct instances?

Assuming this method works, I tried to implement the following function:

function MyStruct(data; type::Union{Nothing, Type{T}} = nothing) where T
    if isnothing(type)
        _type = Type1
    else
        _type = T
    end
           
    return MyStruct(data, _type)
end

But I get type instabilities when doing MyStruct(rand(3); type=Type2) for example.

You could do

abstract type MyType end

struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T <: MyType, MT <: AbstractArray}
    data::MT
end
    
MyStruct{T}(data::MT) where {T <: MyType, MT <: AbstractArray} = MyStruct{T,MT}(data)

This way you avoid type instabilities. Moreover, you don’t have a type::Type{T} field that occupies memory (8 bytes). Whether instances of Type1 and Type2 would consume memory doesn’t matter, since none are stored.
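One way to check the memory remark is to compare `sizeof` for three layouts; `ParamOnly`, `InstanceField` and `TypeField` are hypothetical names used only for this comparison. The first two should agree, because a singleton field takes no space. The third is expected to be one pointer larger, matching the 8 bytes mentioned above, although the exact numbers depend on the platform:

```julia
abstract type MyType end
struct Type1 <: MyType end

# Trait only in the type parameter (as suggested above)
struct ParamOnly{T<:MyType,MT<:AbstractArray}
    data::MT
end

# Trait stored as a singleton instance field
struct InstanceField{T<:MyType,MT<:AbstractArray}
    data::MT
    type::T
end

# Trait stored as a type-object field
struct TypeField{T<:MyType,MT<:AbstractArray}
    data::MT
    type::Type{T}
end

V = Vector{Float64}
println(sizeof(ParamOnly{Type1,V}))
println(sizeof(InstanceField{Type1,V}))
println(sizeof(TypeField{Type1,V}))
```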

You’ve said that you want to write x.type. What’s wrong with

mytype(x::MyStruct{T}) where T = T

If you insist on the x.type notation, you could say

function Base.getproperty(x::MyStruct{T}, name::Symbol) where T <: MyType
    name == :type ? T : getfield(x, name)
end

but I wouldn’t do it.
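A self-contained sketch combining both suggestions: the `mytype` accessor and the optional `getproperty` override. The `propertynames` method is an addition that is not in the reply; it is included so that tab completion also lists `type`:

```julia
abstract type MyType end
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
end
MyStruct{T}(data::MT) where {T<:MyType,MT<:AbstractArray} = MyStruct{T,MT}(data)

# Plain accessor function (the first suggestion)
mytype(::MyStruct{T}) where {T} = T

# Optional: make x.type work without storing a field
function Base.getproperty(x::MyStruct{T}, name::Symbol) where {T<:MyType}
    name === :type ? T : getfield(x, name)
end

# Keep propertynames (and tab completion) consistent with getproperty
Base.propertynames(::MyStruct, private::Bool=false) = (:data, :type)

x = MyStruct{Type2}([1.0, 2.0, 3.0])
println(mytype(x))   # Type2
println(x.type)      # Type2
println(x.data)      # [1.0, 2.0, 3.0]
```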

ADDED: Instead of defining new types like Type1 or Type2, you can also use integers or symbols as type parameters. That often suffices in simple cases.
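A small sketch of that idea; `Tagged`, `scale`, `Order` and `degree` are hypothetical names chosen for illustration. Symbols and integers are valid type parameters, so no extra types need to be declared:

```julia
# Parametrize by a Symbol instead of defining Type1/Type2
struct Tagged{S,MT<:AbstractArray}
    data::MT
end
Tagged{S}(data::MT) where {S,MT<:AbstractArray} = Tagged{S,MT}(data)

# Dispatch on the symbol value stored in the type
scale(x::Tagged{:double}) = 2 .* x.data
scale(x::Tagged{:half}) = x.data ./ 2

# Integers work the same way
struct Order{N} end
degree(::Order{N}) where {N} = N

println(scale(Tagged{:double}([1.0, 2.0])))  # [2.0, 4.0]
println(scale(Tagged{:half}([1.0, 2.0])))    # [0.5, 1.0]
println(degree(Order{3}()))                  # 3
```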


Hi @matthias314,

Thank you for your reply.

I actually still see some type instabilities when using a function with keyword arguments like

function MyStruct(data; type::Type{T}=Type1) where T<:MyType
    return MyStruct{T}(data)
end

In general the function will be more complicated, doing different things depending on the type argument.

I don’t have any type instabilities when passing an instance of the struct instead, like

function MyStruct(data; type::T=Type1()) where T<:MyType
    return MyStruct{T}(data)
end

which is OK, but I would like to keep using only types, if possible.
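A sketch of how the instance-keyword version might handle the "more complicated" case: the type-dependent work is delegated to a hypothetical helper `prepare`, chosen by dispatch on the instance rather than by if/else branches. It assumes the parameter-only `MyStruct{T}` constructor from the previous reply:

```julia
abstract type MyType end
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
end
MyStruct{T}(data::MT) where {T<:MyType,MT<:AbstractArray} = MyStruct{T,MT}(data)

# Hypothetical per-trait preprocessing, selected by dispatch instead of if/else
prepare(::Type1, data) = data
prepare(::Type2, data) = 2 .* data

# The keyword takes an instance, so its concrete type survives the keyword call
function MyStruct(data; type::T=Type1()) where {T<:MyType}
    return MyStruct{T}(prepare(type, data))
end

a = MyStruct([1.0, 2.0])                  # defaults to Type1
b = MyStruct([1.0, 2.0]; type=Type2())    # data doubled by prepare
```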

Moreover, I really need x.type to be defined, since I have to stay consistent with a Python package that has it. Would you recommend making it a field, or defining a custom Base.getproperty?


As far as I know, there is no clean way around this. It’s a deficiency of keyword arguments. However, if you parametrize your type by, say, symbols, then you can say something like

function MyStruct(data; type::Val{S} = Val(:default)) where S
    return MyStruct{S}(data)
end

EDIT: This also works with types instead of symbols.
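A self-contained sketch of the `Val` approach with types instead of symbols, as in the EDIT, assuming the parameter-only `MyStruct{T}` constructor from the earlier reply. `Val(Type2)` is a concrete, zero-size instance, so the keyword value keeps `Type2` in its type:

```julia
abstract type MyType end
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
end
MyStruct{T}(data::MT) where {T<:MyType,MT<:AbstractArray} = MyStruct{T,MT}(data)

# Val{T} wraps the type in an instance whose concrete type carries T
function MyStruct(data; type::Val{T}=Val(Type1)) where {T<:MyType}
    return MyStruct{T}(data)
end

a = MyStruct(rand(3))                     # a MyStruct{Type1, Vector{Float64}}
b = MyStruct(rand(3); type=Val(Type2))    # a MyStruct{Type2, Vector{Float64}}
```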

Is the keyword argument important? Otherwise you could just define

MyStruct(data) = MyStruct{Type1}(data)
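If the keyword is not essential, a positional type argument avoids the NamedTuple entirely. A sketch, assuming the parameter-only `MyStruct{T}` constructor from above; `@inferred` comes from the `Test` standard library and throws if the return type is not inferred to match the actual result:

```julia
using Test  # for @inferred

abstract type MyType end
struct Type1 <: MyType end
struct Type2 <: MyType end

struct MyStruct{T<:MyType,MT<:AbstractArray}
    data::MT
end
MyStruct{T}(data::MT) where {T<:MyType,MT<:AbstractArray} = MyStruct{T,MT}(data)

# Positional type argument: T is taken directly from Type{T}
MyStruct(data, ::Type{T}) where {T<:MyType} = MyStruct{T}(data)
# Default when no type is given, as suggested above
MyStruct(data) = MyStruct(data, Type1)

x = @inferred MyStruct(rand(3), Type2)   # passes the inference check
y = MyStruct(rand(3))                    # uses Type1
```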

Do you recommend to put it as a field

I don’t know. If your parameter types don’t need memory, then a type field would be OK, I guess. Your goal seems to be to imitate some Python code, so it’s hard to tell from the outside. In any case, changing from one approach to the other should only require changing a few lines of code, so you could try out both.


Abstract or primitive types don’t have any benefit over normal concrete structs here; as type objects, they are all instances of DataType.

The compiler tries its best to maintain Type{T} information, but because typeof(T) is DataType (or possibly Union or UnionAll), that information gets lost when T is used to construct a container like an Array. With keyword arguments, the value goes through a NamedTuple, whose field type is then just DataType. This is why an instance is usually stored instead.
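Both points can be checked directly. A minimal sketch (`AbstractTag` is a hypothetical name): type objects are instances of `DataType`, and putting one into a NamedTuple, which is what a keyword call does, records only `DataType`, while an instance keeps its concrete type:

```julia
abstract type MyType end
abstract type AbstractTag <: MyType end   # abstract "tag" type
struct Type2 <: MyType end                # concrete singleton type

# Abstract and concrete type objects are both instances of DataType
println(AbstractTag isa DataType)   # true
println(Type2 isa DataType)         # true

# A type object stored in a NamedTuple is only known as DataType
println(typeof((type = Type2,)) == NamedTuple{(:type,), Tuple{DataType}})   # true
# An instance keeps its concrete type
println(typeof((type = Type2(),)) == NamedTuple{(:type,), Tuple{Type2}})    # true
```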

It seems like you’re effectively embedding a Holy trait-like instance or type into MyStruct for the purpose of dispatch. I know that this is probably a reduced example, so the following opinions may be unsuitable. Nevertheless, here are some alternatives I find more suitable:

  1. Since the trait isn’t necessary for the data of the structure (0 bytes by intention), it may not even need to be part of the type. After all, multiple dispatch lets you dispatch on MyStruct and the trait as separate arguments, or even as a 2-tuple. The benefit of not having to dispatch on the trait is type stability when the trait isn’t needed and varies at runtime.
  2. Currently you have concrete MyStruct{T, MT} subtyping MyStruct{<:Any, MT} and concrete T subtyping MyType. You could simplify that into MyType subtypes containing the data directly, combining the trait and the data permanently. If you had a fixed set of MyType subtypes, this would probably be the expected move. If you want to vary the exact data structure, even across MyTypes, you can abstract that away with interface methods.
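A sketch of the second alternative, with hypothetical names: each `MyType` subtype carries the data itself, and generic code goes through an interface method `getdata` instead of accessing fields directly:

```julia
abstract type MyType end

# Each trait type now carries its own data
struct Type1{MT<:AbstractArray} <: MyType
    data::MT
end

struct Type2{MT<:AbstractArray} <: MyType
    data::MT
end

# Interface method: generic code asks for the data instead of touching fields
getdata(x::MyType) = x.data

# Trait-specific behaviour is plain dispatch on the concrete type
summarize(x::Type1) = sum(getdata(x))
summarize(x::Type2) = maximum(getdata(x))

a = Type1([1.0, 2.0, 3.0])
b = Type2([1.0, 5.0, 3.0])
println(summarize(a))   # 6.0
println(summarize(b))   # 5.0
```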

That was a good idea, but I would rather avoid defining everything with Val(Symbol). So I think that passing an instance of the struct as the argument is the safest option.

Good to know, thanks.

Could you be more explicit here?

I still prefer to keep them separate, such that I can do x.data and x.type separately.

Calls would look like foo(mystruct, mytype) or foo((mytype, mystruct)); the data in MyStruct is entirely independent of the trait MyType.
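A sketch of those two call forms, with the trait removed from the container's type parameters. `foo` follows the reply; its bodies (`sum`, `maximum`) are arbitrary placeholders:

```julia
abstract type MyType end
struct Type1 <: MyType end
struct Type2 <: MyType end

# The data container no longer carries the trait
struct MyStruct{MT<:AbstractArray}
    data::MT
end

# Trait passed as a separate argument
foo(x::MyStruct, ::Type1) = sum(x.data)
foo(x::MyStruct, ::Type2) = maximum(x.data)

# Same dispatch via a 2-tuple (trait, data), forwarded to the two-argument form
foo(p::Tuple{MyType,MyStruct}) = foo(p[2], p[1])

x = MyStruct([1.0, 5.0, 3.0])
println(foo(x, Type1()))      # 9.0
println(foo((Type2(), x)))    # 5.0
```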


Ok, thanks.

I think I will continue using the struct inside as before.

Thanks anyway to both of you. Everything is clearer now.