struct P{T}
    x::T
    y::T
    P{T}(x::T, y::T) where T = new{T}(x, y)
    P{T}(x::T) where T = new{T}(x)
end
P(x::T, y::T) where T = P{T}(x, y)
P(x::T) where T = P{T}(x)
that can be partially uninitialized. For instance, I can create P(1), which contains an arbitrary y, and P(BigInt), which contains an undefined reference.
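To make the prefix behavior concrete, here is a minimal sketch. The name Prefix is hypothetical and only mirrors P above. new fills fields in declaration order, so a single argument always lands in x. For a bits type the skipped field simply holds arbitrary bits, while for a reference type such as BigInt it is an undefined reference.

```julia
# Hypothetical struct mirroring P above; only the one-argument constructor is kept.
struct Prefix{T}
    x::T
    y::T
    Prefix{T}(x::T) where {T} = new{T}(x)  # fills x; y is left uninitialized
end

p = Prefix{Int}(1)
p.x                 # 1
isdefined(p, :y)    # true: bits fields have no "undefined" state, y holds garbage

q = Prefix{BigInt}(big(1))
isdefined(q, :y)    # false: y is an undefined reference
# Reading q.y would throw an UndefRefError.
```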
I am interested in being able to initialize only y, and not x. The problem is that in the call to new{T}(...) I don't know how to say that I want to put the single value inside y instead of x. I've tried new{T}(y=y) with no success, because new does not support keyword arguments.
Is it possible to control which fields are initialized when doing an incomplete initialization, or are they necessarily a prefix of all the fields?
Of course, I can work around this limitation in the case of mutable structures like this
mutable struct Q{T}
    x::T
    y::T
    Q{T}(x::T, y::T) where T = new{T}(x, y)
    Q{T}(y::T) where T = (q = new{T}(); q.y = y; q)
end
Q(x::T, y::T) where T = Q{T}(x, y)
Q(y::T) where T = Q{T}(y)
but I'm really interested in the immutable case, because I want my struct to be "plain data" whenever possible.
Thanks for your suggestion. Base.@kwdef is indeed very handy, but I forgot to mention that I would like to avoid using Union{T,Nothing} as it incurs some (very slight) overhead to check the actual type at runtime. In my case I already have other fields (which I omitted in my simplified example) which tell me which fields are initialized and which are not (hence shall not be used).
One of the applications I'm interested in is to represent possibly unbounded open/closed intervals. What I have is something like this:
struct Interval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
    function Interval{T}(
        left_kind::Symbol,
        right_kind::Symbol,
        left::T,
        right::T,
        checks::Val{Checks} = Val(true),
    )::Interval{T} where {T,Checks}
        if Checks
            left_kind == :closed ||
                left_kind == :open ||
                error("invalid left endpoint kind")
            right_kind == :closed ||
                right_kind == :open ||
                error("invalid right endpoint kind")
            left <= right || error("incorrect order")
        end
        new{T}(left_kind, right_kind, left, right)
    end
    function Interval{T}(
        left_kind::Symbol,
        left::T,
        checks::Val{Checks} = Val(true),
    )::Interval{T} where {T,Checks}
        if Checks
            left_kind == :closed ||
                left_kind == :open ||
                error("invalid left endpoint kind")
        end
        new{T}(left_kind, :unbounded, left) # right is undefined
    end
end
This way I can represent bounded interval (1,5] as
This is not possible with immutable structs; you will have to find another way to express this. If you don't want type instabilities anywhere, you could for example just allow :open_inf and :open_neginf as bound specifications, or use typemax(T) and typemin(T) as sentinel values that represent +/-infinity. If you use floating point numbers, those can even represent Inf themselves.
Of course I can use :open_inf and :open_neginf as bound specifications, that is exactly what I do: I use :unbounded. The problem is that I don't have a generic way to initialize a variable of type T for which the user hasn't passed a value in.
Interval{Int16}(:closed, Int16(0))
# == Interval{Int16}(:closed, :unbounded, 0, 2456) # 2456 is just random garbage
is the interval [0,∞).
Moreover, you cannot use typemax(BigInt).
I could decide to initialize right = zero(T) when right_kind == :unbounded, but that is also not optimal. I don't want to assume anything about the type T. It is not necessarily T <: Real. It can be an arbitrary type with a (hopefully total) order. This would not work
Interval{Char}(:closed, 'a')
# == Interval{Char}(:closed, :unbounded, 'a', '\x0f') # '\x0f' is just random garbage
because zero(Char) is not defined.
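As a rough illustration of why no single sentinel works for every T, the following sketch probes a few candidate types with applicable, which reports whether a method exists for the given arguments. The choice of types is only an example.

```julia
# applicable(f, args...) reports whether f has a method for these arguments.
applicable(typemax, Int16)    # true
applicable(typemax, BigInt)   # false: BigInt has no largest value
applicable(zero, BigInt)      # true, but zero(BigInt) allocates a new BigInt
applicable(zero, Char)        # false

# Floating-point types can carry infinity directly.
typemax(Float64) == Inf       # true
```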
I think the most straightforward and most generic solution is to just skip the initialization of the variable (being careful to not read it afterwards, of course).
If this cannot be achieved, I'll be forced to go with the Union{T,Nothing} solution, which at least is semantically correct. I'll have to benchmark this against the mutable struct approach, which allows finer control over which fields are initialized. I suspect the mutable version incurs a higher performance hit.
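For reference, a minimal sketch of what the Union{T,Nothing} fallback could look like. The names OptInterval and right_unbounded are hypothetical, and the validity checks from the constructors above are omitted.

```julia
# Hypothetical name, to avoid clashing with Interval above.
struct OptInterval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::Union{T,Nothing}    # nothing when left_kind == :unbounded
    right::Union{T,Nothing}   # nothing when right_kind == :unbounded
end

# Right-unbounded interval: no value of type T is needed for the right end.
right_unbounded(kind::Symbol, left::T) where {T} =
    OptInterval{T}(kind, :unbounded, left, nothing)

a = right_unbounded(:closed, Int16(0))   # [0, ∞)
b = right_unbounded(:closed, 'a')        # works although zero(Char) is undefined
a.right === nothing                      # true
```

The cost is the runtime check on the field type whenever left or right is read, which is the overhead mentioned above.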
Given that you are using numeric types, I would go with zero(T) or similar; if I understand correctly, :unbounded in your code makes sure that the actual value won't be used anyway.
That said, I would just use different types for left/right unbounded intervals. I don't know your application, but generally they require different approaches in most algorithms anyway, so you could just dispatch accordingly.
That is correct. I have checks in place to guarantee that I won't access undefined variables. The same you would do in C, if you like embracing the thrill of undefined behavior as much as I do.
I specifically want all my bounded/unbounded/open/closed intervals to be of the same type Interval{T}, capturing only the nature of the underlying total order T we are working in, and not the "kind" of the interval. This is different from what is done in IntervalSets with
struct Interval{L,R,T} <: TypedEndpointsInterval{L,R,T}
    left::T
    right::T
    Interval{L,R,T}(l, r) where {L,R,T} = ((a, b) = checked_conversion(T, l, r); new{L,R,T}(a, b))
end
The reason is that I have to work with collections of intervals of different kinds, and storing them in a heterogeneous array Vector{Interval{L,R,Int} where {L,R}} incurs a performance overhead (~10x), which I want to get rid of.
The only other possibility that comes to my mind is to always store the endpoint in left for both [x,∞) and (-∞,x] (whichever of left_kind and right_kind is :unbounded disambiguates between the two). But that is just nasty and very error prone in the rest of the code. Everything would be so simple if I could just initialize only right and not left.
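The difference between the two layouts can be seen from the element types alone. In this sketch, TypedInterval and FlatInterval are hypothetical stand-ins (the first is not the IntervalSets type), and no timing is measured.

```julia
# Hypothetical stand-in for a type that encodes endpoint kinds as type parameters.
struct TypedInterval{L,R,T}
    left::T
    right::T
end

# Hypothetical stand-in that stores the kinds as field values instead.
struct FlatInterval{T}
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
end

mixed = (TypedInterval{L,R,Int} where {L,R})[
    TypedInterval{:closed,:open,Int}(1, 5),
    TypedInterval{:open,:closed,Int}(2, 7),
]
flat = [
    FlatInterval{Int}(:closed, :open, 1, 5),
    FlatInterval{Int}(:open, :closed, 2, 7),
]

isconcretetype(eltype(mixed))  # false: elements are boxed, field access dispatches at run time
isconcretetype(eltype(flat))   # true: elements are stored inline
```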
What do I mean with this last approach?
This is what I mean by "always store in left":
function Interval{T}(
    left_kind::Symbol,
    right_kind::Symbol,
    value::T,
    checks::Val{Checks} = Val(true),
)::Interval{T} where {T,Checks}
    if Checks
        left_kind == :closed ||
            left_kind == :open ||
            left_kind == :unbounded ||
            error("invalid left endpoint kind")
        right_kind == :closed ||
            right_kind == :open ||
            right_kind == :unbounded ||
            error("invalid right endpoint kind")
        (left_kind == :unbounded && right_kind != :unbounded) ||
            (left_kind != :unbounded && right_kind == :unbounded) ||
            error("exactly one endpoint must be unbounded")
    end
    new{T}(left_kind, right_kind, value)
end
Exactly one endpoint has to be :unbounded, but the value of the other endpoint is always stored in left, while right is left uninitialized. This however complicates further code, for instance the one retrieving a left/right endpoint if it exists:
function left_endpoint(int::Interval{T})::Union{T,Nothing} where {T}
    if int.left_kind == :unbounded
        nothing
    else
        int.left
    end
end
function right_endpoint(int::Interval{T})::Union{T,Nothing} where {T}
    if int.right_kind == :unbounded
        nothing
    else
        if int.left_kind == :unbounded
            int.left
        else
            int.right
        end
    end
end
For arbitrary types, there is no general API in Julia to just provide "any" value (the concept may not even make sense: one can define a concrete type that cannot have instances). You could define
just_some_value(::Type{T}) where {T<:Number} = zero(T)
and require that for other types, the user defines this method.
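A sketch of how such a helper might be wired into a constructor. FilledInterval, right_unbounded and the Char method are hypothetical additions for illustration only.

```julia
# Filler value for fields that will never be read (helper suggested above).
just_some_value(::Type{T}) where {T<:Number} = zero(T)
# Users would have to add a method for each non-numeric type, for example:
just_some_value(::Type{Char}) = '\0'

struct FilledInterval{T}   # hypothetical name
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
end

# Right-unbounded interval: right gets a filler that must never be interpreted.
right_unbounded(kind::Symbol, left::T) where {T} =
    FilledInterval{T}(kind, :unbounded, left, just_some_value(T))

right_unbounded(:closed, Int16(0)).right   # Int16(0)
right_unbounded(:closed, 'a').right        # '\0'
```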
Precisely why I wanted to avoid the initialization altogether!
Yes, I can spam the namespace with another useless function just_some_value or default and force the users to remember to import and implement my useless function for their types, but that also has other downsides (and may not always be easy or possible at all, as you mention). For instance, when T is BigInt it causes some unnecessary allocation just to instantiate an object that will never be read. In this regard, the approach that I outlined at the end of my last comment is superior because it does not initialize right. It just seems messy to put the right value in the left field, but that is definitely more efficient. It is very, very messy, though. And it wouldn't work in more general situations.
I've come to the realization that there would be two possible solutions for the future.
1. Allow new to accept keyword arguments, to specify which fields to initialize.
2. Allow mutation of immutable structs inside inner constructors. At least in the restricted case where one assigns to a field which was previously uninitialized. This choice has much broader implications.
I don't see a reason to rule out 1.
I believe the pertaining code is buried inside either src/jltypes.c or src/datatype.c, but I'm not familiar at all with the codebase of the interpreter/compiler.
Could someone direct me a bit? I can open an issue to discuss this change and possibly come up with a PR.
Regardless of the proposal to extend new this way (cf this comment), I still think that you are fighting the language on this. I would do something like
struct ClosedEndPoint{T}
    x::T
end
struct OpenEndPoint{T}
    x::T
end
struct UnboundedEndPoint end
const EndPoint{T} = Union{ClosedEndPoint{T},OpenEndPoint{T},
                          UnboundedEndPoint}
struct Interval{T}
    left::EndPoint{T}
    right::EndPoint{T}
    # constructor omitted
end
and let the compiler take care of keeping track of the flags via the Union optimization. I don't think it will be worse than manually keeping track of the flags with Symbol, and may be better. YMMV.
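A self-contained sketch of how this layout might be used. The struct is renamed EPInterval here, and endpoint_value is a hypothetical accessor that is not part of the suggestion above. The default constructor is used in place of the omitted one.

```julia
struct ClosedEndPoint{T}
    x::T
end
struct OpenEndPoint{T}
    x::T
end
struct UnboundedEndPoint end
const EndPoint{T} = Union{ClosedEndPoint{T},OpenEndPoint{T},UnboundedEndPoint}

struct EPInterval{T}      # hypothetical name for the Interval{T} above
    left::EndPoint{T}
    right::EndPoint{T}
end

# Hypothetical accessor: dispatch on the endpoint type replaces the Symbol checks.
endpoint_value(e::Union{ClosedEndPoint,OpenEndPoint}) = e.x
endpoint_value(::UnboundedEndPoint) = nothing

ints = [
    EPInterval{Int}(OpenEndPoint(1), ClosedEndPoint(5)),       # (1, 5]
    EPInterval{Int}(ClosedEndPoint(0), UnboundedEndPoint()),   # [0, ∞)
]

isconcretetype(eltype(ints))     # true: one element type for all kinds of interval
endpoint_value(ints[2].left)     # 0
endpoint_value(ints[2].right)    # nothing
```

All intervals over the same T share one concrete type here, so a Vector of them stays homogeneous, and no value of T has to be invented for an unbounded end.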
While I agree that your proposed solution might work well in this instance, I still feel that having the possibility to selectively initialize fields in new() would be beneficial in other more complicated circumstances. Otherwise you are forced to adopt basically Union{T,Nothing} or Union{Some{T},Nothing} for every potentially uninitialized field. There are occasions where this is suboptimal, for instance if you have several fields x1, x2, x3, x4... which are always either all initialized or all uninitialized. So maybe instead of
struct Example{T}
    y::T
    x1::T
    x2::T
    x3::T
end
you must write
struct Example{T}
    y::T
    xs::Union{Tuple{T,T,T}, Nothing}
end
And in general this is not even possible, because there might be several possible different states of "undefinedness" for the structure, so you are forced to use Union{T,Nothing} on each field separately.
As long as one of the fields is not going to be accessed, why not initialize both with the same value?
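A minimal sketch of that idea, assuming the unused field is never read. DupInterval is a hypothetical name, and only the right-unbounded constructor is shown.

```julia
struct DupInterval{T}   # hypothetical name
    left_kind::Symbol
    right_kind::Symbol
    left::T
    right::T
    # Right-unbounded: reuse `left` as a filler for the unused `right` field.
    DupInterval{T}(left_kind::Symbol, left::T) where {T} =
        new{T}(left_kind, :unbounded, left, left)
end

c = DupInterval{Char}(:closed, 'a')      # ['a', ∞), no zero(Char) needed
b = DupInterval{BigInt}(:open, big(3))   # (3, ∞), no extra BigInt is created
b.right === b.left                       # true: both fields refer to the same object
```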
That said, incomplete initialization of immutable struct fields only in the order they are defined indeed seems arbitrary. I agree that the ability to mutate structs inside constructors would be beneficial, if that is possible within the language model.
I see dealing with this explicitly with a Union as an advantage.
Generally, querying if a field is undefined is not a good strategy as it does not generalize to bits types. Then you need a flag of some sort (like in your example), so why not let the language handle it? Also note that if you branch on this, you should get efficient code.
Oh this is a clever idea! It works in this situation because we have multiple fields of type T, at least one of which is defined. We still waste a few cycles for the unnecessary initialization, but I could be OK with that. It does not generalize, however, to a situation like this
struct Either{S,T}
    s::S
    t::T
    flag::Bool
end
where there is no duplicate field of the same type to steal the value from.
Yes, mutability in inner constructors would allow many more things, but is definitely a bigger change to the language. Adding keyword arguments to new seems instead more straightforward, trivially backward compatible, very convenient (it doesn't force you to define the fields in a specific order to work around the current limitation) and probably something that gets more easily accepted as a change to the language.
I'm not querying directly if the field is defined with Core.isdefined. That would give the wrong answer in this case, for instance:
struct P{T}
    x::T
    P{T}() where T = new{T}()
end
p = P{Int}() # P{Int64}(140238278107968)
isdefined(p, :x) # true
This is because "plain data" fields have no real "undefined" state.
In the general situation (possibly more involved than my simple example with intervals), I may have several fields that collectively determine whether some other field has been initialized or not. Maybe I already need those other fields to indicate some other aspect of the state of the object, so they are not wasted just to indicate which fields are initialized. I would have them nonetheless. I just want to skip unnecessary (and impossible in general) initialization.
Makes sense, but in case you weren't aware, note that Julia has some special optimizations for these small unions like Union{T,Nothing}, so the smallness of the overhead might surprise you; it might be worth doing a quick benchmark. Alternatively, you could always do
Base.@kwdef struct P{T,X<:Union{T,Nothing},Y<:Union{T,Nothing}}
    x::X = nothing
    y::Y = nothing
end
which removes any instability (although I do think some of the other suggestions here might be better).
In practice, that may be an even easier case than the original.
If s and t are never needed together, then Union{S,T} is the proper choice. In other cases, returning Either{Nothing,T} when s is not meaningful and Either{S,T} when it is might be a sensible workaround.
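A sketch of the Either{Nothing,T} workaround, reusing the struct from the post above. The helper only_t is hypothetical.

```julia
struct Either{S,T}
    s::S
    t::T
    flag::Bool
end

# Hypothetical helper: when s is not meaningful, let S = Nothing.
only_t(t) = Either(nothing, t, false)

e1 = Either(1, "a", true)
e2 = only_t("b")

typeof(e1)   # Either{Int, String}
typeof(e2)   # Either{Nothing, String}
```

Note the trade-off: e1 and e2 have different concrete types, so a collection mixing both is no longer homogeneous.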
But having a clean way to skip initialization of some fields if needed is, for sure, better than use of workarounds and cryptic idioms.