Default constructor for any type?

It’s been suggested to allow new(..., undef, ...) for leaving an arbitrary struct field uninitialized (ideally also deprecating the current way of doing partial initialization, which is a tried and true footgun when iterating on struct layouts). This is mentioned in JuliaLang/julia issues #24943 (“Warn on uninitialized isbits-fields in structs?”) and #26764 (“Semantics of `Expr(:new)` underspecified”).

Incidentally, this exists due to the reasoning which uses * for String concatenation: we want t::T * one(T) == t for all t isa T, for all T.
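
A quick check of that identity, as a minimal sketch that uses only Base (the variable names are just for illustration):

```julia
# one(T) is the multiplicative identity of T. For strings, * is concatenation,
# so the identity element is the empty string.
s = "abc"
str_identity = one(String)
num_identity = one(Float64)

println(repr(str_identity))          # shows the identity element for String
println(s * str_identity == s)       # t * one(T) == t for a String
println(2.5 * num_identity == 2.5)   # and for a Float64
```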

I’m not sure if this is the right approach, but I would do something like:

function defaultof end

defaultof(::Type{T}) where {T <: Number} = zero(T)
defaultof(::Type{String}) = ""
# implement defaultof for more types

Expose defaultof so that users can extend it for their custom types. Then, define the constructor as:

List(next::List{_T}) where {_T} = new{_T}(defaultof(_T), Ref(next))

As an example, if a user defines a custom type such as:

struct Rect
    x::Float64
    y::Float64
    w::Float64
    h::Float64
end

defaultof should be implemented for that type:

defaultof(::Type{Rect}) = Rect(0, 0, -1, -1)
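
Putting the pieces above together, here is a minimal self-contained sketch. The List layout is assumed from the constructor shown earlier (a value plus a Ref to the next node), and the field uses Base.RefValue so that it is concretely typed; treat it as an illustration rather than the original poster's actual type.

```julia
# Extension point: packages and users add methods for their own types.
function defaultof end

defaultof(::Type{T}) where {T<:Number} = zero(T)
defaultof(::Type{String}) = ""

# Assumed layout of the linked list discussed in this thread.
struct List{T}
    value::T
    next::Base.RefValue{List{T}}

    # A node with no successor: the reference is left unassigned.
    List(value::T) where {T} = new{T}(value, Ref{List{T}}())
    # A node placed in front of an existing list: the value comes from defaultof.
    List(next::List{T}) where {T} = new{T}(defaultof(T), Ref(next))
end

struct Rect
    x::Float64
    y::Float64
    w::Float64
    h::Float64
end

# The user opts in by defining what "default" means for their type.
defaultof(::Type{Rect}) = Rect(0, 0, -1, -1)

tail = List(Rect(1, 2, 3, 4))
head = List(tail)
```

With this design, calling List on a list whose element type has no defaultof method fails with a MethodError, instead of silently producing an arbitrary value.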

Why? (Wondering if this is an X-Y problem.)

I have not read through the whole thread yet, but I do want to point out that Ref is an abstract type. You should probably use the concrete type Base.RefValue as the type of the field.

struct List{T}
    value::T
    next::Base.RefValue{List{T}}

    List(value::_T) where {_T} = new{_T}(value, Ref{List{_T}}())
    List(next::List{_T}) where {_T} = new{_T}(zero(_T), Ref(next))
end

To address your original question, let’s say we have a type that is impossible to construct directly as follows:

julia> struct Foo
           x::Int
           y::UInt8
           z::Float32
           function Foo end
       end

julia> methods(Foo)
# 0 methods for type constructor

julia> Foo(1, 0x1, 1.f0)
ERROR: MethodError: no method matching Foo(::Int64, ::UInt8, ::Float32)
Stacktrace:
 [1] top-level scope
   @ REPL[3]:1

Since we know the size of Foo, we can construct a Foo another way.

julia> sizeof(Foo)
16

julia> foo = first(reinterpret(Foo, zeros(UInt8, sizeof(Foo))))
Foo(0, 0x00, 0.0f0)

Note that the above only works for bits types.

julia> mutable struct Bar
           x::Int
           y::UInt8
           z::Float32
           function Bar end
       end

julia> bar = first(reinterpret(Bar, zeros(UInt8, sizeof(Bar))))
ERROR: ArgumentError: cannot reinterpret `UInt8` as `Bar`, type `Bar` is not a bits type
...

We can use another approach, but this now requires unsafe_load.

julia> let z = zeros(UInt8, sizeof(Bar))
           GC.@preserve z unsafe_load(Ptr{Bar}(pointer(z)))
       end
Bar(0, 0x00, 0.0f0)

What you’ve written has already been discussed further up.

As the name says, this is unsafe, here not because of memory safety (everything is properly GC tracked) but because you’re breaking any potential invariants of Foo. From the POV of Julia, there is no constructor, so there is no valid way to construct an instance of type Foo. It’s the same as the primitive type example from above.

Are you sure that you can’t have a small union field?

From this write up, it looks like static compilation can handle some localized instability. You might have to type annotate the call sites potentially?

As an example, here are all the annotations I had to use in my code to get full optimization when I had Union{T,Nothing}: see the commit “Remove old uses of `.val::T`” (SymbolicML/DynamicExpressions.jl@c5417ee).

Now I just leave .val undefined (because it’s in a mutable struct). But I would like to use immutable structs for a variety of reasons – (1) fewer allocations; (2) better optimizations, as code can safely assume a field won’t change; (3) maybe better interfacing with static compilation (though from this thread I think this assumption might be wrong?). I guess I can put the val at the end and get this to work, but it seems sort of a code smell relative to if I could just use a formal undef (which presumably would not cause the type instabilities I am showing above).

If you have fields that won’t change, you can tag them as const in your mutable type to aid the compiler (see the manual).

julia> mutable struct Baz
           a::Int
           const b::Float64
       end

julia> baz = Baz(1, 1.5);

julia> baz.a = 2
2

julia> baz.b = 2.0
ERROR: setfield!: const field .b of type Baz cannot be changed
[...]

Sorry for being offtopic, but you should strongly consider reordering the fields to

mutable struct Node{T} <: AbstractExpressionNode{T}
    degree::UInt8  # 0 for constant/variable, 1 for cos/sin, 2 for +/* etc.
    constant::Bool  # false if variable
    feature::UInt16  # If is a variable (e.g., x in cos(x)), this stores the feature index.
    op::UInt8  # If operator, this is the index of the operator in operators.binops, or operators.unaops

    val::T  # If is a constant, this stores the actual value

The reason is structure padding. If e.g. T is a mutable struct, then your Node will waste 6 bytes before val, and 5 bytes after op. If you reorder, you only have 3 bytes of structure padding after op before val.
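
To see the effect of field order on padding in isolation, here is a small sketch with two hypothetical structs (not the Node type above). It assumes only that Julia lays out fields in declaration order with C-compatible alignment; the exact sizes depend on the platform.

```julia
# Same three fields, different order.
struct Interleaved
    a::UInt8
    b::Float64   # needs alignment, so padding is inserted after `a`
    c::UInt8     # and again after `c` to round up the struct size
end

struct Grouped
    a::UInt8
    c::UInt8     # small fields share the space before the aligned field
    b::Float64
end

offsets(T) = [Int(fieldoffset(T, i)) for i in 1:fieldcount(T)]

println("Interleaved: ", sizeof(Interleaved), " bytes, offsets ", offsets(Interleaved))
println("Grouped:     ", sizeof(Grouped), " bytes, offsets ", offsets(Grouped))
```

The payload is 10 bytes in both cases. On a typical 64-bit build the interleaved layout should come out larger, because each UInt8 drags padding along with it.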

You definitely can construct arbitrary T without knowing them; that’s how lots of serialization/deserialization packages, like the Serialization stdlib or JLD2, work.

A simple example is

julia> @generated foo(::Type{T}) where T = Expr(:new, T)
julia> struct A
       x::Int
       end
julia> foo(A)
A(133928310153792)

julia> mutable struct B
       x::String
       end

julia> foo(B)
B(#undef)

Safety of that technique is debatable, but this is possible if needed.

Serialization is a special case, because (under the assumption that the serialization step was correct) the deserialization has known-valid data. This is not the case here though - this thread debates constructing new & valid objects for arbitrary T, which you can’t do without knowing how that is done for any particular T. Serialization sidesteps that problem by requiring a valid constructed object to exist in the first place.

Serialization also has an additional complication in that it’s invalid to preserve interior pointers (since they’re process-local), so those are lost, even if the object was initially valid. In order to reconstruct them, the serialization library needs to know how to properly serialize & deserialize objects of that type, meaning it needs to specialize for that type again.

This is awesome, I literally had no idea! Will certainly start using that in my code :smiley:

Although unfortunately it looks like this is only v1.8+?

Quick followup - are there any reasons I would want to do something like the following instead of the example you sent?

struct Baz
    a::Base.RefValue{Int}
    b::Float64
end

For example, is one better for allocations than the other?

Cool! Thanks. This seems like a full solution. I suppose I can also just have default behaviour set up for most common types, like foo(::Type{T}) where {T<:Number} = one(T), etc., (to avoid generating expressions for all types) and then users can declare custom initializers if needed.

I implore you not to use this. To quote the devdocs:

  • new
    Allocates a new struct-like object. First argument is the type. The new pseudo-function is lowered to this, and the type is always inserted by the compiler. This is very much an internal-only feature, and does no checking. Evaluating arbitrary new expressions can easily segfault.

This is not a stable thing to build a package off of.

Sure you can do this for most types.

But take care to remember that safety is debatable, e.g. foo(BigInt) is liable to spawn bats, segfault or corrupt memory and install ransomware and shut down your workplace.

(i.e. you need to document that T should not be parametrized with types holding foreign pointers, like e.g. BigInt)

It’s more insidious than that - your users MUST NOT (and I mean that in the RFC “should never ever EVER happen” kind of way) assume that whatever they get out from this is actually a valid instance of that type, even for isbitstype. Using Expr(:new) like that is bypassing any and all potential checks that normally happen when constructing an object of that type.

In the best worst-case, this leads to extremely difficult to debug heisenbugs due to now-broken assumptions the owner of that type made in its usage. Or the compiler made! If the compiler assumes that all instances must come from one of its lowered constructors and you bypass that, pretty much anything can happen when the compiler decides to abstractly interpret this.

It should be possible to create default constructors in a not-insane way through a tactic like the following:

  • Primitive values are zero(T) or one(T).
  • Containers (UnionAll) are an empty example of the container, constrained to the given type parameters, defaulting to Any.
  • Refs are a reference to a default value of their type parameter.
  • Unions are a default value of the leftmost type.
  • Singleton types are themselves.
  • Tuples are a tuple of the provided type’s defaults.
  • Types are Union{}.
  • Structs call the constructor on default values for the fields (this only works consistently if there’s a constructor in the default form).
  • ::Function is identity, or for a concrete subtype, the only instance of that subtype.

I think that covers all the bases, right? To make it work consistently you’d need to examine the method table for structs and pick one, rather than assume they have a default constructor. But the principle remains the same.

It sounds like a lot of work to really get it right and cover all the edge cases, but the result would be a worthy package with a number of practical uses.
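
A very rough sketch of a few of the rules above, to make the idea concrete. The name default_value is hypothetical (it is not part of Base or any package I know of), it covers only numbers, strings, vectors, tuples and structs with a positional constructor, and it simply throws for anything else.

```julia
# Hypothetical best-effort default constructor; an illustration, not a library API.
function default_value(::Type{T}) where {T}
    if T <: Number && isconcretetype(T)
        return zero(T)                 # may still throw for Number types without zero
    elseif T === String
        return ""
    elseif T <: Vector && isconcretetype(T)
        return T()                     # empty vector with the requested element type
    elseif T <: Tuple && isconcretetype(T)
        return map(default_value, fieldtypes(T))
    elseif isstructtype(T) && isconcretetype(T)
        # Goes through the type's own positional constructor, so any validation
        # written there still runs (and may throw).
        return T(map(default_value, fieldtypes(T))...)
    else
        throw(ArgumentError("no default value known for type $T"))
    end
end

struct Segment
    ends::Tuple{Int,Int}
    label::String
    samples::Vector{Float64}
end

seg = default_value(Segment)
println(seg)
```

Because the struct branch calls a regular constructor rather than Expr(:new), a type whose constructor rejects the default field values fails loudly instead of yielding an invalid instance.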

This is no worse or better than reinterpret or Vector{T}(undef, 1)[1].

My favorite example of weirdness from that are structs containing Bool. The compiler/runtime are very sure that the leading 7 bits of a Bool are zero. They are, in fact, so sure that having anything else leads to UB with fun schizophrenic optimizations, and of course divergence of behavior between interpreted and compiled code, and of course divergence of behavior depending on success of type inference.

You can observe that funhouse by reinterpreting:

julia> struct B
       b::Bool
       end

julia> a=UInt8[127]
1-element Vector{UInt8}:
 0x7f

julia> b=reinterpret(B, a)
1-element reinterpret(B, ::Vector{UInt8}):
 B(true)

julia> b[1]
B(true)

julia> b[1].b == true
true

julia> foo(b)=b[1].b==true
foo (generic function with 1 method)

julia> foo(b)
false

But seriously, I think this is the only example where you get honest-to-god UB from badly initialized bitstypes?

No. There is no way.

Either make your field Union{T, Nothing} (this is not bad for performance), ask your users to not parametrize this with bad types, or simply don’t access invalid data.
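
The Union{T, Nothing} option could look roughly like this. MaybeNode and getval are hypothetical names for illustration, not the DynamicExpressions.jl types discussed above.

```julia
# Immutable node whose value is explicitly "absent" instead of uninitialized.
struct MaybeNode{T}
    degree::UInt8
    val::Union{T,Nothing}
end

# Convenience constructor for nodes that carry no value.
MaybeNode{T}(degree::Integer) where {T} = MaybeNode{T}(UInt8(degree), nothing)

# Accessor that narrows the union once, so callers get a plain T back.
function getval(n::MaybeNode{T}) where {T}
    v = n.val
    v === nothing && throw(ArgumentError("node holds no value"))
    return v
end

leaf = MaybeNode{Float64}(0, 1.5)
branch = MaybeNode{Float64}(2)

println(getval(leaf))
println(branch.val === nothing)
```

Checking `v === nothing` on a local variable lets the compiler narrow the type after the branch, which is usually enough to avoid scattering `::T` annotations at every call site.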

For the BigInt example, you need to make sure that the REPL display code checks whether the BigInt is valid before trying to print it (e.g. by having other logic that dictates that). The printing code is the thing that can corrupt memory on access, because the underlying MPZ library assumes a valid pointer to whatever data structures it has. If you instead fill it with uninitialized garbage, you can end up with a pointer to somewhere on the heap containing other data.

No, you cannot assume that all primitive values have a zero OR a one. For example, the structs you can define through FieldFlags.jl have no defined “default value”. They’re not even numeric at all, and don’t even have a zero initializer! There used to be one, but it caused some problems downstream precisely because assuming that all-zero is valid is not good.

That’s why I say that this is not possible for arbitrary T; you need to know what that T is to even be able to tell whether there could be a default value.

That veers more into the area of fuzzing (something one could do with PropCheck.jl, or soon better with an upcoming package of mine), but that brings its own problems - what if the constructor throws an error? Or worse yet, crashes julia? That’s not as absurd as it sounds, I’ve hit exactly that problem trying to do this before:

So if one doesn’t know how to construct a T and isn’t explicitly trying to fuzz a codebase, just generating a random “possibly valid” instance is going to lead to a lot of pain.

That’s true! To be clear, I think that is already pretty bad, and shouldn’t be made more likely by liberal use of Expr(:new), making the undef problem an issue for all T and not “just” Vector{T} :slight_smile:

It’s the best known one, but the example is representative for all invariant-breaking usages of reinterpret. At least the docstring of reinterpret warns about this problem now.

I don’t happen to understand what the issue would be with initializing an all-zero bitfield, can you elaborate?

I wouldn’t expect all types to have a zero or a one, primitive or otherwise, the function constructor would need to check that.

And sure, there must be structs out there which throw errors from validation if called on default types when defined this way. A best-effort basis would create the default function, call it in a try block, and return an informative error if you can’t make a default for that type.

Crashing Julia by calling a constructor sounds like a bug, though. A bug in the current release of the runtime isn’t a good argument against writing a library like this, bugs get patched eventually. We hope.

It’s too much to expect for a default-constructed value to have up-to-any correct semantics, but I maintain that it’s tractable to write a default-constructor-constructor which either returns one or throws an error if it can’t.