Default constructor for any type?

MilesCranmer · February 21, 2024, 9:28pm

I am trying to create generic defaults in an immutable struct for a parametric type.

Say that I am trying to create a linked list, like this:

struct List{T}
    value::T
    next::RefValue{List{T}}

    List(value::_T) where {_T} = new{_T}(value, Ref{List{_T}}())
    List(next::List{_T}) where {_T} = new{_T}(zero(_T), Ref(next))
end

As you can see, I can always create next because I can just use the default constructor Ref{List{T}}() which works for any T!

But what about T itself? As you can see in the above example, I’m just using zero, because that is defined for numerical types and vectors which is what I mostly care about. But what about general T? I don’t actually want to access .value if it is uninitialized, but I don’t want to have to make this a mutable struct either (giving me undef), because then I can’t statically compile it!

Also, you can’t statically compile Union, so I couldn’t get away with Union{T,Nothing} either.

Basically, how can I create a default or otherwise uninitialized singleton for arbitrary T when zero is not defined?

jling · February 21, 2024, 9:47pm

in this specific case I feel like this is just wrong, seems to me it should be:

List(value::_T, next::List{_T}) where {_T} = new{_T}(value,, Ref(next))

Sukera · February 21, 2024, 9:53pm

What you’re trying to do doesn’t really make sense to me - your List(::List) constructor prepends an element(?) but doesn’t initialize that?

In general, there is no safe “default” value for arbitrary T that you can use - you’ll need an already existing object to prepend.

CameronBieganek · February 21, 2024, 9:56pm

I think the way to do this is to use incomplete initialization, something like this:

struct List{T}
    value::T
    next::List{T}

    List(value::T) where T = new{T}(value)
    List(value::T, list::List{T}) where T = new{T}(value, list)
end

julia> l1 = List(1)
List{Int64}(1, #undef)

julia> l2 = List(2, l1)
List{Int64}(2, List{Int64}(1, #undef))

Sukera · February 21, 2024, 9:59pm

No, that requires checking whether next is defined. There’s a better way for doing that, without initialization issues:

struct List{T}
    value::T
    next::Base.RefValue{List{T}}

    List(value::T, list::List{T}) where T = new{T}(value, list)
    function List(value::T) where T
          list = new{T}(value, Ref{List{T}}())
          list.next[] = list
          list
    end
end

When looking for the end of the list, you can check whether list === list.next[]. If they’re the same object, you’re at the end of the list. There’s never an uninitialized field, so you’re preserving type safety and always get a valid List object.

MilesCranmer · February 21, 2024, 10:03pm

Sorry I wasn’t expecting people to discuss the type itself, I just wanted the List to act as a simpler discussion point.

The actual type is Node{T} in DynamicExpressions.jl:

github.com

SymbolicML/DynamicExpressions.jl/blob/4bd83e97d23ebb3fbde57f720b0265c5ecc33881/src/Equation.jl#L98-L112


      
          mutable struct Node{T} <: AbstractExpressionNode{T}
              degree::UInt8  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
              constant::Bool  # false if variable
              val::T  # If is a constant, this stores the actual value
              # ------------------- (possibly undefined below)
              feature::UInt16  # If is a variable (e.g., x in cos(x)), this stores the feature index.
              op::UInt8  # If operator, this is the index of the operator in operators.binops, or operators.unaops
              l::Node{T}  # Left child node. Only defined for degree=1 or degree=2.
              r::Node{T}  # Right child node. Only defined for degree=2. 
          
              #################
              ## Constructors:
              #################
              Node{_T}() where {_T} = new{_T}()
          end

Which as you can see is a sum-type of four node types: a constant (with value ::T), a feature (with index ::UInt16), a 1-ary operator, and a 2-ary operator.

Right now it’s a mutable struct but I want to make it immutable (with Ref wrapping the children). However for this I will need to have a way of constructing the arbitrary val::T.

@jling I could take the suggestion of assuming the user will pass an example value. But maybe there’s a simpler way?

Sukera · February 21, 2024, 10:06pm

You cannot construct arbitrary T without already knowing what exactly that T is and how it should be constructed. For example, I can create a type that has no constructors:

julia> struct Foo
           a::Int
           bar() = 1
       end

julia> Foo
Foo

julia> Foo |> methods
# 0 methods for type constructor

So the best option is to have the user pass in the value and let them worry about creating instances of it.

MilesCranmer · February 21, 2024, 10:08pm

But I’m not accessing it; it’s just occupying space in the struct. Presumably if it is isbitstype I can somehow initialize enough memory and do a reinterpret somehow? But is there really no cleaner way?

Sukera · February 21, 2024, 10:12pm

Having the user pass it in is the cleanest way. The Foo struct is just an example; I could have just as well made it like this:

julia> mutable struct Foo
           a::Vector{Int}
           bar() = 1
       end

julia> Foo |> methods
# 0 methods for type constructor

which still has no way to be constructed (and certainly can’t be reinterpreted due to the inner pointer).

Even for isbitstypes, you can’t rely on reinterpret to do the work for you (without already knowing what exactly T is and how it is constructed). For example, I can create a new primitive type Baz 8 end where I have defined a secret/black box constructor only available in a shared C library (serving as a blackbox implementation). You cannot get a valid instance of that type because you cannot know the invariants its bitpattern must observe when constructed (that’s the secret part) - you need to go through that ccall, and reinterpret doesn’t do that.

MilesCranmer · February 21, 2024, 10:21pm

Another issue here (besides practicality of copying around an example val to all my constructors) is that I would need to copy the val::T for every single node I create (lest having all branch nodes point to the same val which would make distributed c). This could end up being a huge allocation.

Is there no workaround here? I could just have this field be Ref{T}(), but that seems like an ugly workaround.

Is there no way I can just tell Julia to ignore that field in a struct?

For example, I can do:

julia> f(::Type{T}) where {T} = Array{T}(undef, 1);

julia> f(Vector{Int})
1-element Vector{Vector{Int64}}:
 #undef

Why can’t I use this #undef in a default constructor for the val field too?

MilesCranmer · February 21, 2024, 10:27pm

I suppose one solution here is to just do

  struct Node{T} <: AbstractExpressionNode{T}
      degree::UInt8  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
      constant::Bool  # false if variable
-     val::T  # If is a constant, this stores the actual value
      feature::UInt16  # If is a variable (e.g., x in cos(x)), this stores the feature index.
      op::UInt8  # If operator, this is the index of the operator in operators.binops, or operators.unaops
      l::RefValue{Node{T}}  # Left child node. Only defined for degree=1 or degree=2.
      r::RefValue{Node{T}}  # Right child node. Only defined for degree=2. 
+     val::T  # If is a constant, this stores the actual value
  end

and then I can create nodes without the val by simply only passing the first 6 arguments. But this seems pretty arbitrary…

Sukera · February 21, 2024, 10:27pm

You can, if that val field is at the end of the struct, using the incomplete initialization that @CameronBieganek mentioned above. It just won’t be useful to you, since there is no actual value stored there; #undef means exactly that there isn’t an instance; there is no valid object there (even though the field claims to have type T, #undef is not a T!).

If you want incomplete initialization, then you’ll have to deal with the consequences of #undef Note that a Ref doesn’t save you from that; if a user ever gets the idea to access the Ref before actually storing a value, they’ll get the same error as if they had accessed an uninitialized field:

julia> Ref{Vector{Int}}()[]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getproperty
   @ ./Base.jl:49 [inlined]
 [2] getindex(b::Base.RefValue{Vector{Int64}})
   @ Base ./refvalue.jl:59
 [3] top-level scope
   @ REPL[1]:1

julia> struct Foo
           a::Vector{Int}
           Foo() = new()
       end

julia> Foo().a
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getproperty(x::Foo, f::Symbol)
   @ Base ./Base.jl:49
 [2] top-level scope
   @ REPL[3]:1

The reason for this is that just having the right amount of memory allocated is not the same as having an actual object stored in that memory. If you want that field to always contain a valid object, that field needs to be set before the constructor returns, either through mutation or by passing a valid object to new.

MilesCranmer · February 21, 2024, 10:29pm

Our messages crossed and turns out we said the same thing!

I don’t mind that #undef is not a T, so long as it is type stable and allows me to statically compile. Which I presume is the case here?

Sukera · February 21, 2024, 10:33pm

I mean, nothing mentioned so far prevents static compilation in principle, so I’m not exactly sure what you’re referring to here. You can statically compile mutable structs just fine, I’m guessing you’re more worried about a potential heap allocation? That’ll likely happen either way if T is not a bitstype.

You’ve mentioned at the start that you don’t want to access .value if it’s not initialized - but that’s exactly what #undef is. It’s uninitialized memory, so if you ever forget to check before accessing that field and it’s not initialized, you’ll get that UndefRefError. If you want to be type safe there, you’ll want to make sure that this is always a valid instance of a T, and that means having the user worry about constructing it and let them pass it in (or use a Ref and punt the #undef there, talking sternly to your users that it’s their responsibility to fill it).

raminammour · February 21, 2024, 10:45pm

Can you explain that statement please? I am under the impression that this pattern is safe, it is used for the iteration protocol after all.

MilesCranmer · February 21, 2024, 10:50pm

By definition Union{T,Nothing} is type unstable which apparently is not good for static compilation as it requires type inference: GitHub - brenhinkeller/StaticTools.jl: Enabling StaticCompiler.jl-based compilation of (some) Julia code to standalone native binaries by avoiding GC allocations and llvmcall-ing all the things!

However I guess sometimes Julia gets around it? Might be related to union splitting: Union-splitting: what it is, and why you should care

MilesCranmer · February 21, 2024, 10:56pm

Thanks. I should specify I know I will never access .val if it’s not initialized. This is because Node is a sum type and you can infer .val is not initialized by reading the value of .degree and .constant (which are initialized).

CameronBieganek · February 21, 2024, 10:58pm

I don’t have a very good grasp of your requirements. Could you phrase it in terms of a MWE involving a type like the following?

struct Node
    val
    l
    r
end

(…And add whatever type annotations you think are appropriate.)

Sukera · February 21, 2024, 11:06pm

My point is that Node (to Julia!) is not a sum type. It’s a regular product type, as all structs are, and the language does not help you in any way to preserve that invariant coupling between fields. If there’s ever a path through your code that accesses .val without properly checking the other fields first (or, by some third party just accessing the field with the wrong assumptions), you will sooner or later hit that UndefRefError. The safest option is then to make sure it’s always valid, by either requiring it to be passed in or accepting the type instability and representing the “uninitialized” state with a different type from the “initialized” state.

The general idea for this sort of transition (from ad-hoc, loosely enforced couplings to type-safe abstractions) is called “Making illegal states unrepresentable”. In Julia, at the moment that unfortunately means type instability, because we don’t really have the tools to properly enforce these invariants.

Palli · February 21, 2024, 11:32pm

Exactly what I was thinking, but even for numeric types, if restricting to those:

I wouldn’t rely on that (rather one?) for all types, e.g. Rust has NonZeroU8 in std::num - Rust (to make division by zero impossible…), and I’ve been thinking if Julia should have such a type, at least available, if not in Base.

julia> one(String)  # wasn't sure what to expect, or that zero not defined
""

julia> zero(String)
ERROR: MethodError: no method matching zero(::Type{String})

There’s a good recent video (or at least to me) about “constructors broken” in C++, vs in Rust (i.e. factories better), and I meant to post a link, think more if it applies to Julia too.

It’s usually the wrong structure (while your problem more general to all containers, also likely for below), and famously hard in Rust…:

Topic		Replies	Views
Create constructor for type without a default constructor General Usage	3	524	May 30, 2020
Default value of some fields in a mutable struct General Usage struct	24	17640	October 2, 2022
Parametric Type Constructor General Usage	1	420	September 17, 2017
Optional parameters with default types in parametric structure constructors Internals & Design	12	557	June 4, 2023
Default constructor for parametric types in 0.7 General Usage	2	479	April 15, 2018

Default constructor for any type?

Related topics