Incomplete initialization (control which fields are initialized in `new`)

FedericoStra · July 21, 2020, 10:06pm

Following the guidelines regarding incomplete initialization and parametric constructors, I can create a type

struct P{T}
    x::T
    y::T

    P{T}(x::T, y::T) where T = new{T}(x, y)
    P{T}(x::T) where T = new{T}(x)
end

P(x::T, y::T) where T = P{T}(x, y)
P(x::T) where T = P{T}(x)

that can be partially uninitialized. For instance, I can create P(1), which contains an arbitrary y, and P(BigInt), which contains an undefined reference.

I am interested in being able to initialize only y, and not x. The problem is that in the call to new{T}(...) I don’t know how to say that I want to put the single value inside y instead of x. I’ve tried new{T}(y=y) with no success, because new does not support keyword arguments.

Is it possible to control which fields are initialized when doing an incomplete initialization, or are they necessarily a prefix of all the fields?
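For what it's worth, the prefix-only behavior can be observed directly. The sketch below uses a hypothetical type `Prefix` (not from the post) and relies on `BigInt` being heap-allocated, so that an uninitialized field shows up as an undefined reference rather than as garbage bits.

```julia
# Hypothetical type for illustration; only one argument is passed to `new`.
struct Prefix{T}
    x::T
    y::T
    Prefix{T}(x::T) where {T} = new{T}(x)  # arguments fill fields in declaration order
end

p = Prefix{BigInt}(BigInt(1))

isdefined(p, :x)  # true: the single argument landed in the first field
isdefined(p, :y)  # false: the trailing field is an undefined reference
```

Reading `p.y` here would throw an `UndefRefError`, which is why the rest of the code has to track which fields are valid.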

Of course, I can work around this limitation in the case of mutable structures like this

mutable struct Q{T}
    x::T
    y::T

    Q{T}(x::T, y::T) where T = new{T}(x, y)
    Q{T}(y::T) where T = (q = new{T}(); q.y = y; q)
end

Q(x::T, y::T) where T = Q{T}(x, y)
Q(y::T) where T = Q{T}(y)

but I’m really interested in the immutable case, because I want my struct to be “plain data” whenever possible.

marius311 · July 21, 2020, 10:47pm

I’m curious about the answer to your question too, but maybe worth mentioning the following might be a fairly clean workaround:

Base.@kwdef struct P{T}
    x::Union{T,Nothing} = nothing
    y::Union{T,Nothing} = nothing
end

julia> P(x=1)
P{Int64}(1, nothing)

julia> P(y=2)
P{Int64}(nothing, 2)

Tamas_Papp · July 22, 2020, 8:37am

I think that you can only do it in the order of the fields.

I am not sure what the use case is for immutable structs though. Do you have an example with more context?

FedericoStra · July 22, 2020, 1:08pm

Thanks for your suggestion. Base.@kwdef is indeed very handy, but I forgot to mention that I would like to avoid using Union{T,Nothing} as it incurs some (very slight) overhead to check the actual type at runtime. In my case I already have other fields (which I omitted in my simplified example) which tell me which fields are initialized and which are not (hence shall not be used).

FedericoStra · July 22, 2020, 1:18pm

One of the applications I’m interested in is to represent possibly unbounded open/closed intervals. What I have is something like this:

struct Interval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T

    function Interval{T}(
        left_kind::Symbol,
        right_kind::Symbol,
        left::T,
        right::T,
        checks::Val{Checks} = Val(true),
    )::Interval{T} where {T,Checks}
        if Checks
            left_kind == :closed ||
                left_kind == :open ||
                error("invalid left endpoint kind")
            right_kind == :closed ||
                right_kind == :open ||
                error("invalid right endpoint kind")
            left <= right || error("incorrect order")
        end
        new{T}(left_kind, right_kind, left, right)
    end

    function Interval{T}(
        left_kind::Symbol,
        left::T,
        checks::Val{Checks} = Val(true),
    )::Interval{T} where {T,Checks}
        if Checks
            left_kind == :closed ||
                left_kind == :open ||
                error("invalid left endpoint kind")
        end
        new{T}(left_kind, :unbounded, left) # right is undefined
    end
end

This way I can represent the bounded interval (1,5] as

Interval{Int}(:open, :closed, 1, 5)
Interval{BigInt}(:open, :closed, BigInt(1), BigInt(5))

And the unbounded from above interval [1,∞) as

Interval{Int}(:closed, 1)            # Interval{Int64}(:closed, :unbounded, 1, 140351003598080)
Interval{BigInt}(:closed, BigInt(1)) # Interval{BigInt}(:closed, :unbounded, 1, #undef)

In these last two cases, right is undefined.

What I cannot find a way to do is create intervals unbounded from below, skipping the initialization of left.

simeonschaub · July 22, 2020, 1:27pm

This is not possible with immutable structs; you will have to find another way to express this. If you don’t want type instabilities anywhere, you could for example just allow :open_inf and :open_neginf as bound specifications, or use typemax(T) and typemin(T) as sentinel values that represent +/-infinity. If you use floating point numbers, those can even represent Inf themselves.
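As a rough illustration of the floating-point variant of this suggestion, here is a minimal sketch. The type `FloatInterval` and the function `contains_point` are hypothetical names, and the design only applies when the endpoints are floats.

```julia
# Hypothetical float-only interval; ±Inf endpoints stand for "unbounded".
struct FloatInterval
    left_closed::Bool
    right_closed::Bool
    left::Float64
    right::Float64
end

# Membership test; an infinite endpoint never excludes a finite x.
function contains_point(iv::FloatInterval, x::Real)
    above = iv.left_closed ? iv.left <= x : iv.left < x
    below = iv.right_closed ? x <= iv.right : x < iv.right
    return above && below
end

ray = FloatInterval(true, false, 0.0, Inf)  # [0, ∞)
contains_point(ray, 1e300)   # true
contains_point(ray, -1.0)    # false
```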

FedericoStra · July 22, 2020, 1:41pm

Of course I can use :open_inf and :open_neginf as bound specifications, that is exactly what I do: I use :unbounded. The problem is that I don’t have a generic way to initialize a variable of type T for which the user hasn’t passed a value in.

Using typemax(T) is incorrect.

Interval{Int16}(:closed, :closed, 0, 32767)
# == Interval{Int16}(:closed, :closed, 0, 32767)

is the mathematical interval [0, 32767], whereas

Interval{Int16}(:closed, Int16(0))
# == Interval{Int16}(:closed, :unbounded, 0, 2456) # 2456 is just random garbage

is the interval [0,∞).

Moreover, you cannot use typemax(BigInt).

I could decide to initialize right = zero(T) when right_kind == :unbounded, but that is also not optimal. I don’t want to assume anything on the type T. It is not necessarily T <: Real. It can be an arbitrary type with a (hopefully total) order. This would not work

Interval{Char}(:closed, 'a')
# == Interval{Char}(:closed, :unbounded, 'a', '\x0f') # '\x0f' is just random garbage

because zero(Char) is not defined.

I think the most straightforward and most generic solution is to just skip the initialization of the variable (being careful to not read it afterwards, of course).

If this cannot be achieved, I’ll be forced to go with the Union{T,Nothing} solution, which at least is semantically correct. I’ll have to benchmark this against the mutable struct approach, which allows finer control over which fields are initialized. I suspect the mutable version incurs a higher performance hit.

Tamas_Papp · July 22, 2020, 1:47pm

Given that you are using numeric types, I would go with zero(T) or similar — if I understand correctly, :unbounded in your code makes sure that the actual value won’t be used anyway.

That said, I would just use different types for left/right unbounded intervals. I don’t know your application, but generally they require different approaches in most algorithms anyway, so you could just dispatch accordingly.

FedericoStra · July 22, 2020, 1:56pm

zero doesn’t work for this reason.

FedericoStra:

I could decide to initialize right = zero(T) when right_kind == :unbounded, but that is also not optimal. I don’t want to assume anything on the type T. It is not necessarily T <: Real. It can be an arbitrary type with a (hopefully total) order. This would not work

Interval{Char}(:closed, 'a')
# == Interval{Char}(:closed, :unbounded, 'a', '\x0f') # '\x0f' is just random garbage

because zero(Char) is not defined.

That is correct. I have checks in place to guarantee that I won’t access undefined variables. The same you would do in C, if you like embracing the thrill of undefined behavior as much as I do.

I specifically want all my bounded/unbounded/open/closed intervals to be of the same type Interval{T}, capturing only the nature of the underlying total order T we are working in, and not the “kind” of the interval. This is different from what is done in IntervalSets with

struct Interval{L,R,T}  <: TypedEndpointsInterval{L,R,T}
    left::T
    right::T

    Interval{L,R,T}(l, r) where {L,R,T} = ((a, b) = checked_conversion(T, l, r); new{L,R,T}(a, b))
end

The reason is that I have to work with collections of intervals of different kinds, and storing them in a heterogeneous array Vector{Interval{L,R,Int} where {L,R}} incurs a performance overhead (~10x), which I want to get rid of.
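The difference between the two designs can be seen from the element types alone. The sketch below uses simplified stand-ins (`TypedInterval` and `FlagInterval` are hypothetical names, not the IntervalSets definitions) and does not attempt to reproduce the ~10x figure.

```julia
# Simplified stand-ins for the two designs (hypothetical names).
struct TypedInterval{L,R,T}   # endpoint kinds encoded in type parameters
    left::T
    right::T
end

struct FlagInterval{T}        # endpoint kinds stored as runtime fields
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
end

# A mix of kinds forces a non-concrete element type ...
mixed = [TypedInterval{:open,:closed,Int}(1, 5),
         TypedInterval{:closed,:closed,Int}(0, 2)]
isconcretetype(eltype(mixed))   # false

# ... while the flag-based design keeps a single concrete element type.
flat = [FlagInterval{Int}(:open, :closed, 1, 5),
        FlagInterval{Int}(:closed, :closed, 0, 2)]
isconcretetype(eltype(flat))    # true
```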

The only other possibility that comes to my mind is to always store the endpoint in left for both [x,∞) and (-∞,x] (which among left_kind and right_kind is :unbounded disambiguates between the two). But that is just nasty and very error prone in the rest of the code. Everything would be so simple if I could just initialize only right and not left.

What do I mean with this last approach?

This is what I mean by “always store in left”:

    function Interval{T}(
        left_kind::Symbol,
        right_kind::Symbol,
        value::T,
        checks::Val{Checks} = Val(true),
    )::Interval{T} where {T,Checks}
        if Checks
            left_kind == :closed ||
                left_kind == :open ||
                left_kind == :unbounded ||
                error("invalid left endpoint kind")
            right_kind == :closed ||
                right_kind == :open ||
                right_kind == :unbounded ||
                error("invalid right endpoint kind")
            (left_kind == :unbounded && right_kind != :unbounded) ||
                (left_kind != :unbounded && right_kind == :unbounded) ||
                error("exactly one endpoint must be unbounded")
        end
        new{T}(left_kind, right_kind, value)
    end

Exactly one endpoint has to be :unbounded, but the value of the other endpoint is always stored in left, while right is left uninitialized. This however complicates further code, for instance the one retrieving a left/right endpoint if it exists:

function left_endpoint(int::Interval{T})::Union{T,Nothing} where {T}
    if int.left_kind == :unbounded
        nothing
    else
        int.left
    end
end

function right_endpoint(int::Interval{T})::Union{T,Nothing} where {T}
    if int.right_kind == :unbounded
        nothing
    else
        if int.left_kind == :unbounded
            int.left
        else
            int.right
        end
    end
end

Tamas_Papp · July 22, 2020, 2:16pm

For arbitrary types, there is no general API in Julia to just provide “any” value (the concept may not even make sense: one can define a concrete type that cannot have instances). You could define

just_some_value(::Type{T}) where {T<:Number} = zero(T)

and require that for other types, the user defines this method.
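A minimal sketch of how this could be wired into a constructor follows. The type `FilledInterval`, the helper `unbounded_below`, and the `Char` method are assumptions made for illustration; only `just_some_value` for `Number` comes from the post.

```julia
# Placeholder-value function from the post, plus one user-supplied method.
just_some_value(::Type{T}) where {T<:Number} = zero(T)
just_some_value(::Type{Char}) = '\0'   # example of a method a user would add

# Hypothetical fully initialized variant of the interval type.
struct FilledInterval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
end

# (-∞, right] or (-∞, right): `left` gets a placeholder that is never read.
unbounded_below(right_kind::Symbol, right::T) where {T} =
    FilledInterval{T}(:unbounded, right_kind, just_some_value(T), right)

a = unbounded_below(:closed, 5)     # a.left == 0 is only a placeholder
b = unbounded_below(:open, 'z')     # b.left == '\0' is only a placeholder
```

Calling `unbounded_below` with a type that has no `just_some_value` method raises a `MethodError`, which is the burden on users that the next reply objects to.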

FedericoStra · July 22, 2020, 2:23pm

Precisely why I wanted to avoid the initialization altogether!

Yes, I can spam the namespace with another useless function just_some_value or default and force the users to remember to import and implement my useless function for their types, but that also has other downsides (and may not always be easy or possible at all, as you mention). For instance, when T is BigInt it causes some unnecessary allocation just to instantiate an object that will never be read. In this regard, the approach that I delineated at the end of my last comment is superior because it does not initialize right. It just seems messy to put the right value in the left field, but that is definitely more efficient. It is very, very messy, though, and it wouldn’t work in more general situations.

FedericoStra · July 22, 2020, 3:03pm

I’ve come to the realization that there would be two possible solutions for the future.

1. Allow new to accept keyword arguments, to specify which fields to initialize.
2. Allow mutation of immutable structs inside inner constructors, at least in the restricted case where one assigns to a field which was previously uninitialized. This choice has much broader implications.

I don’t see a reason to rule out 1.

I believe the pertaining code is buried inside either src/jltypes.c or src/datatype.c, but I’m not familiar at all with the codebase of the interpreter/compiler.

Could someone direct me a bit? I can open an issue to discuss this change and possibly come up with a PR.

Tamas_Papp · July 22, 2020, 3:20pm

Regardless of the proposal to extend new this way (cf this comment), I still think that you are fighting the language on this. I would do something like

struct ClosedEndPoint{T}
    x::T
end

struct OpenEndPoint{T}
    x::T
end

struct UnboundedEndPoint end

const EndPoint{T} = Union{ClosedEndPoint{T},OpenEndPoint{T},
                          UnboundedEndPoint}

struct Interval{T}
    left::EndPoint{T}
    right::EndPoint{T}
    # constructor omitted
end

and let the compiler take care of keeping track of the flags via the Union optimization. I don’t think it will be worse than manually keeping track of the flags with Symbol, and may be better. YMMV.
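To make the suggestion concrete, here is a hedged sketch with the omitted constructor replaced by the default one. The name `EndPointInterval` and the helper `endpoint_value` are hypothetical additions; the endpoint types are the ones from the post.

```julia
struct ClosedEndPoint{T}
    x::T
end

struct OpenEndPoint{T}
    x::T
end

struct UnboundedEndPoint end

const EndPoint{T} = Union{ClosedEndPoint{T},OpenEndPoint{T},UnboundedEndPoint}

# Renamed here only to keep the sketch separate from the other Interval types.
struct EndPointInterval{T}
    left::EndPoint{T}
    right::EndPoint{T}
end

# Dispatch replaces the manual `left_kind == :unbounded` checks.
endpoint_value(::UnboundedEndPoint) = nothing
endpoint_value(e::Union{ClosedEndPoint,OpenEndPoint}) = e.x

iv = EndPointInterval{Int}(UnboundedEndPoint(), ClosedEndPoint(5))  # (-∞, 5]
endpoint_value(iv.left)    # nothing
endpoint_value(iv.right)   # 5
```

All such intervals share the concrete type `EndPointInterval{Int}`, so a `Vector` of them stays homogeneous regardless of the endpoint kinds.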

FedericoStra · July 22, 2020, 4:00pm

While I agree that your proposed solution might work well in this instance, I still feel that having the possibility to selectively initialize fields in new() would be beneficial in other more complicated circumstances. Otherwise you are forced to adopt basically Union{T,Nothing} or Union{Some{T},Nothing} for every potentially uninitialized field. There are occasions where this is suboptimal, for instance if you have several fields x1, x2, x3, x4... which are always either all initialized or all uninitialized. So maybe instead of

struct Example{T}
    y::T
    x1::T
    x2::T
    x3::T
end

you must write

struct Example{T}
    y::T
    xs::Union{Tuple{T,T,T}, Nothing}
end

And in general this is not even possible, because there might be several possible different states of “undefinedness” for the structure, so you are forced to use Union{T,Nothing} on each field separately.

Vasily_Pisarev · July 22, 2020, 5:35pm

As long as one of the fields is not going to be accessed, why not initialize both with the same value?
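A minimal sketch of this idea, assuming a hypothetical type `DupInterval` and helper `ray_below`: the unused field simply receives a copy of the value that is known to exist, so no assumption on `T` is needed.

```julia
# Hypothetical interval type where every field is always initialized.
struct DupInterval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
end

# (-∞, right] or (-∞, right): reuse `right` as the never-read `left` value.
ray_below(right_kind::Symbol, right::T) where {T} =
    DupInterval{T}(:unbounded, right_kind, right, right)

c = ray_below(:closed, 'z')        # works although zero(Char) is undefined
d = ray_below(:open, BigInt(10))   # both fields refer to the same BigInt
```

For a heap-allocated type such as `BigInt`, both fields reference the same object, so no extra allocation is made for the placeholder.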

That said, incomplete initialization of immutable struct fields only in the order they are defined indeed seems arbitrary. I agree that the ability to mutate structs inside constructors would be beneficial, if that is possible within the language model.

Tamas_Papp · July 22, 2020, 5:35pm

I see dealing with this explicitly with a Union as an advantage.

Generally, querying if a field is undefined is not a good strategy as it does not generalize to bits types. Then you need a flag of some sort (like in your example), so why not let the language handle it? Also note that if you branch on this, you should get efficient code.

FedericoStra · July 22, 2020, 6:17pm

Oh this is a clever idea! It works in this situation because we have multiple Ts, at least one of which is defined. We still waste a few cycles for the unnecessary initialization, but I could be OK with that. It does not generalize however to a situation like this

struct Either{S,T}
    s::S
    t::T
    flag::Bool
end

where there is no duplicate field of the same type to steal the value from.

Yes, mutability in inner constructors would allow many more things, but is definitely a bigger change to the language. Adding keyword arguments to new seems instead more straightforward, trivially backward compatible, very convenient (it doesn’t force you to define the fields in a specific order to work around the current limitation) and probably something that gets more easily accepted as a change to the language.

FedericoStra · July 22, 2020, 6:25pm

I’m not querying directly if the field is defined with Core.isdefined. That would give the wrong answer in this case for instance:

struct P{T}
    x::T
    P{T}() where T = new{T}()
end

p = P{Int}() # P{Int64}(140238278107968)
isdefined(p, :x) # true

This is because “plain data” fields have no real “undefined” state.

In the general situation (possibly more involved than my simple example with intervals), I may have several fields that collectively determine whether some other field has been initialized or not. Maybe I already need those other fields to indicate some other aspect of the state of the object, so they are not wasted just to indicate which fields are initialized. I would have them nonetheless. I just want to skip unnecessary (and impossible in general) initialization.

marius311 · July 22, 2020, 7:56pm

Makes sense, but in case you weren’t aware, Julia has special optimizations for small unions such as Union{T,Nothing}, so the overhead might be smaller than you expect; it might be worth doing a quick benchmark. Alternatively, you could always do

Base.@kwdef struct P{T,X<:Union{T,Nothing},Y<:Union{T,Nothing}}
    x::X = nothing
    y::Y = nothing
end

which removes any instability (although I do think some of the other suggestions here might be better).
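The small-union optimization mentioned in this reply can be probed without a benchmark. The sketch below uses a hypothetical wrapper `MaybeRight` and the internal, non-exported function `Base.isbitsunion`; treat the latter as a diagnostic whose availability may vary between Julia versions, not as a stable API.

```julia
# Hypothetical wrapper with one optional field.
struct MaybeRight{T}
    right::Union{T,Nothing}
end

# Explicit branch on `nothing`; this is the check that the Union forces on callers.
right_or(iv::MaybeRight, default) = iv.right === nothing ? default : iv.right

# Internal helper (assumption: present in the Julia version in use).
Base.isbitsunion(Union{Int,Nothing})      # true: stored inline with a type tag
Base.isbitsunion(Union{BigInt,Nothing})   # false: BigInt is not a bits type

right_or(MaybeRight{Int}(nothing), 0)   # 0
right_or(MaybeRight{Int}(7), 0)         # 7
```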

Vasily_Pisarev · July 22, 2020, 8:03pm

In practice, that may be an even easier case than the original.
If s and t are never needed together, then Union{S,T} is the proper choice. In other cases, returning Either{Nothing,T} when s is not meaningful and Either{S,T} when it is might be a sensible workaround.
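A rough sketch of the `Union{S,T}` variant, using the hypothetical names `EitherOf` and `holds_first`. It assumes `S` and `T` are distinct, non-overlapping types; otherwise an explicit flag is still needed to tell the two cases apart.

```julia
# Hypothetical single-field replacement for Either{S,T}.
struct EitherOf{S,T}
    value::Union{S,T}
end

# The stored value's type plays the role of the Bool flag.
holds_first(e::EitherOf{S,T}) where {S,T} = e.value isa S

e1 = EitherOf{Int,String}(3)
e2 = EitherOf{Int,String}("three")

holds_first(e1)   # true
holds_first(e2)   # false
```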
But having a clean way to skip initialization of some fields if needed is, for sure, better than use of workarounds and cryptic idioms.
