Named argument variant of `new` or some other way to initialize a `mutable struct` efficiently *and* "readable"?

Consider a (mutable) struct with a bunch of members, and a constructor:

mutable struct Foo
  x::Vector{Int}
  y::Vector{Int}
  z::Vector{Int}
  # ... here more members might come
  Foo() = new([1], [2], [3])
end

Now, an easy mistake to make is to swap the order of two arguments. Or perhaps someone reorders the struct members but forgets to adjust all constructors. If the permuted arguments happen to have the same type, this can be hard to debug. Also, in general, it can be difficult for a human to look at the invocation of new in the above code and decide which argument is which.

As a result, some people (including several collaborators of mine) prefer to write constructors that look like this:

function Foo()
  foo = new()
  foo.x = [1]
  foo.y = [2]
  foo.z = [3]
  return foo
end

That makes it crystal clear which value is assigned to which struct member, and it is also resilient against reordering of struct members. However, it comes at a cost: with the former constructor, Julia is apparently able to detect that x, y, z are always assigned a value and hence are never undef. It then generates optimal machine code for e.g. a function like mylen(foo::Foo) = length(foo.x). With the alternate constructor, this breaks down, and Julia starts to insert null pointer checks into the generated code.
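To make the difference concrete, here is a minimal sketch of what the field-by-field style permits. The type name `Partial` is hypothetical and only serves the illustration. Because `new()` leaves every field unassigned, an object can escape the constructor with a field still undefined, which is presumably why accesses have to be guarded.

```julia
# Hypothetical type, used only to illustrate partially initialized objects.
mutable struct Partial
    x::Vector{Int}
    y::Vector{Int}
    function Partial()
        p = new()    # both fields start out undefined
        p.x = [1]    # assign only x
        return p     # y is left undefined on purpose
    end
end

p = Partial()
isdefined(p, :x)   # x was assigned in the constructor
isdefined(p, :y)   # y was never assigned
# Reading p.y at this point throws an UndefRefError.
```

`isdefined` lets calling code test a field before touching it, at the price of an extra branch.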

I wonder if there is a way to combine the efficiency with the readability, and would like to hear if people have suggestions for this?

One idea I was having was to add a new macro, say @new, which allows writing a constructor in keyword arg style: say:

function Foo()
  return @new (x = [1], y = [2], z = [3])
end

However, I don’t know how a macro would be able to determine the type of the surrounding struct, which it would need in order to determine the positions of x, y, and z in the struct, so that it can know in which order to pass them to new. But perhaps the type Foo could be another argument for the macro:

function Foo()
  return @new Foo(x = [1], y = [2], z = [3])
end

Perhaps people have other ideas / solutions for tackling this?
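One low-tech option that needs no macro, sketched here under the assumption that every field is initialized: bind each value to a local variable named after its field, then pass the locals to new in a single place. The type name `NamedFoo` and the assertion after the struct are illustrative additions, not part of the original code. The assertion only detects a reordering of the members, it does not adapt to one.

```julia
# Hypothetical variant of Foo; all fields are passed to `new` in one call.
mutable struct NamedFoo
    x::Vector{Int}
    y::Vector{Int}
    z::Vector{Int}
    function NamedFoo()
        # Each local is named after the field it initializes.
        x = [1]
        y = [2]
        z = [3]
        return new(x, y, z)   # the only place where positional order matters
    end
end

# Fails as soon as the file is loaded if someone reorders the members
# without updating the `new` call above.
@assert fieldnames(NamedFoo) == (:x, :y, :z)

foo = NamedFoo()
```

Since new receives all arguments here, this keeps the fully initialized form of the first constructor while making the value-to-field mapping readable.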

Maybe you’re looking for Base.@kwdef? Example from the documentation:

julia> Base.@kwdef struct Foo
           a::Int = 1         # specified default
           b::String          # required keyword
       end
Foo
  
julia> Foo(b="hi")
Foo(1, "hi")
  
julia> Foo()
ERROR: UndefKeywordError: keyword argument b not assigned
Stacktrace:
[...]

Thank you for the suggestion, I should have also mentioned this and explained why I don’t think it fits my case. First off, it defines an outer constructor, but I really only want this “inner”, as I don’t want to make such a constructor available to outside code. More importantly, though: I also need to sometimes leave some fields explicitly uninitialized, which @kwdef doesn’t seem to support – or maybe it does, but how? I tried using undef as a default argument, but that didn’t seem to work.

I think it should work with @kwdef if you give a proper default value: undef can be used for objects like arrays, but it doesn’t make sense for an Int… But you can use 0 for example. Sometimes people also declare fields with a::Union{Nothing,Int} = nothing or a::Union{Missing,Int} = missing to have unset values default to nothing or missing.

How? I tried this:

julia> module Tmp
         Base.@kwdef struct Foo
           a::Int = 1
           b::Vector{Int} = undef
         end
       end
Main.Tmp

julia> import .Tmp

julia> Tmp.Foo()
ERROR: MethodError: Cannot `convert` an object of type UndefInitializer to an object of type Vector{Int64}
Closest candidates are:
  convert(::Type{T}, ::AbstractArray) where T<:Array at array.jl:532
  convert(::Type{T}, ::LinearAlgebra.Factorization) where T<:AbstractArray at /Users/julia/buildbot/worker/package_macos64/build/usr/share/julia/stdlib/v1.6/LinearAlgebra/src/factorization.jl:58
  convert(::Type{T}, ::T) where T<:AbstractArray at abstractarray.jl:14
  ...
Stacktrace:
 [1] Main.Tmp.Foo(a::Int64, b::UndefInitializer)
   @ Main.Tmp ./REPL[2]:3
 [2] Main.Tmp.Foo(; a::Int64, b::UndefInitializer)
   @ Main.Tmp ./util.jl:453
 [3] Main.Tmp.Foo()
   @ Main.Tmp ./util.jl:453
 [4] top-level scope
   @ REPL[4]:1

mine would be something like this

mutable struct Foo
  x::Vector{Int}
  y::Vector{Int}
  z::Vector{Int}
  # ... here more members might come
  Foo(x, y, z) = new(x, y, z)
  Foo() = Foo([1], [2], [3])
  Foo(;x,y,z) = Foo(x,z,y)
end

julia> Foo(z=[1], y=[2], x=[5])
Foo([5], [1], [2])

Ah sorry for the confusion. We’re mixing undef and #undef. The value undef can be used for arrays if you want to initialize the struct field with a pre-allocated array, for example with b::Vector{Int} = Vector{Int}(undef, 3). If you don’t want any value as default, to have the field #undef, then indeed I don’t think you can do that with @kwdef: the generated code always goes through the default constructor with all arguments specified, and there is no value that can be passed to represent #undef.
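A small sketch of the two notions, using only arrays (the variable names are arbitrary): passing undef to an array constructor produces a real array whose contents are uninitialized, whereas #undef is what gets shown for a slot that holds no reference at all.

```julia
# `undef` as a constructor argument: a real array of length 3
# whose Int contents are arbitrary.
v = Vector{Int}(undef, 3)
length(v)

# For element types stored as references, the slots themselves
# are unassigned, and the REPL shows them as #undef.
w = Vector{String}(undef, 2)
isassigned(w, 1)   # the slot holds no reference yet
w[1] = "now set"
isassigned(w, 1)   # the slot is assigned from here on
```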

I assume you deliberately switched up the order here to illustrate why this is dangerous :wink:

Anyway: I do not want a keyword constructor for the users (although that is potentially useful in some cases, of course), but rather I want to be able to write my own constructors in a way that is resilient against changes in the order of members, and also readable. For the user, none of this should be visible.

hehe I like to live dangerously

Hopefully I’d have written some cases in runtests.jl to catch that :slight_smile:

I suspect that there is no good workaround for the lack of keyword arguments in new. We would need new syntax for that.

This?

julia> Base.@kwdef struct Foo
           a::Int = 0
           b::Vector{Int} = Int[]
       end
Foo

julia> Foo(a=1)
Foo(1, Int64[])

I don’t think this is possible in any functionally meaningful sense.

Just for fun, here’s a way to work around “#undef exists but is not really supported”:

struct CanBeUndef{T}
    value::T
    CanBeUndef{T}() where {T} = new()
    CanBeUndef{T}(value) where {T} = new(value)
end
Base.convert(::Type{CanBeUndef{T}}, v::T) where {T} = CanBeUndef{T}(v)

Base.@kwdef struct Foo
    v::CanBeUndef{Vector{Int}} = CanBeUndef{Vector{Int}}()
    x::Int = 0
end

function Base.getproperty(f::Foo, sym::Symbol)
    field = getfield(f, sym)
    return sym === :v ? field.value : field
end

Used like this:

julia> f = Foo(x=1)
Foo(CanBeUndef{Vector{Int64}}(#undef), 1)

julia> f.v
ERROR: UndefRefError: access to undefined reference

julia> f = Foo(v=[1,2,3]);

julia> f.v
3-element Vector{Int64}:
 1
 2
 3

I’m not seriously suggesting to use this :slight_smile:

Did you check if defining the field as v::Union{Nothing, Vector{Int}} incurs a significant performance penalty?

@lmiq no, initializing a vector field to an empty vector is not the same as leaving it undefined.

Perhaps this variation of the example makes it clearer why it may be desirable to leave some members uninitialized:

mutable struct Foo
  x::Vector{Int}
  y::Vector{Int}
  z::Vector{Int}
  a::StructThatTakesOneMegabyteStorage
  b::StructThatTakesOneMinuteToInitialize
  Foo() = new([1], [2], [3])  # do not initialize a and b unless we have to
end

@sijo You are right, using a Union{Nothing, MyType} is indeed a possible solution insofar as it also avoids the initialization problem (and thus any associated memory and/or runtime overhead). It also works with @kwdef, where providing a default value of nothing is possible, and I may resort to that. The downside is that those fields now have three possible states: undef, nothing or an instance of MyType. But I hope this can be resolved by ensuring all constructors always initialize the field, so it never is undef. AFAIK once a field has been initialized, it can never return to the undef state, so I’d hope the compiler is able to deduce that it can remove all undef checks from code accessing the field, replacing them with checks for nothing instead. So I’ll experiment a bit with that to see how the performance impact ends up.
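A minimal sketch of that approach, with hypothetical names throughout: `Expensive` stands in for a type that is costly to build, and `get_b!` is an assumed accessor that constructs it on first use. Every constructor path assigns the field, so it is either nothing or a real instance.

```julia
# Hypothetical stand-in for a type that is costly to construct.
struct Expensive
    data::Vector{Int}
end

Base.@kwdef mutable struct LazyFoo
    x::Vector{Int} = [1]
    b::Union{Nothing,Expensive} = nothing   # always assigned, possibly `nothing`
end

# Assumed helper: build the expensive member on first access only.
function get_b!(foo::LazyFoo)
    b = foo.b
    if b === nothing
        b = Expensive(zeros(Int, 4))
        foo.b = b
    end
    return b
end

lazy = LazyFoo()
first_b = get_b!(lazy)    # constructs the member
second_b = get_b!(lazy)   # reuses the stored member
```

Whether the check against nothing is cheaper than the undef check in practice is something to measure rather than assume.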

True, but from the OP having undef values appeared to be a problem. It is not clear to me why you want to have them, and what possible disadvantage the empty vector may have (allocation of one pointer?).

Edit: I see now that you provided an example there to explain. imho that is better solved by having an empty state for the large structures, or wrapping the field in a one-element vector of the corresponding type and leaving that empty.
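The wrapping idea could look roughly like the following sketch. `Heavy`, `WrappedFoo`, `has_a` and `set_a!` are hypothetical names chosen for the illustration. The field is always initialized, with an empty vector, and holds at most one element once the large member has been built.

```julia
# Hypothetical stand-in for a large member type.
struct Heavy
    payload::Vector{Float64}
end

mutable struct WrappedFoo
    x::Vector{Int}
    a::Vector{Heavy}                  # holds zero or one element
    WrappedFoo() = new([1], Heavy[])  # every field is initialized
end

# Assumed helpers for querying and filling the wrapped member.
has_a(f::WrappedFoo) = !isempty(f.a)

function set_a!(f::WrappedFoo, h::Heavy)
    empty!(f.a)    # keep the "at most one element" invariant
    push!(f.a, h)
    return f
end

wf = WrappedFoo()
before = has_a(wf)
set_a!(wf, Heavy([1.0, 2.0]))
after = has_a(wf)
```

This trades the undef state for one extra small allocation per object and an indirection on access.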