Correct typing of struct field with late initialization

Consider the following mutable struct:

Base.@kwdef mutable struct MyStruct
    data::Union{String, Nothing} = nothing
end

Is this the most efficient way to represent the fact that data may be “empty”, or does the type union with Nothing interfere with performance? The data may be changed some time later (and "" is actually a valid, “non-empty” value).

x = MyStruct()

# maybe, change the value once
x.data = ""

# other stuff that reads from `x.data` multiple times

Additional question: Would changing the struct to be immutable and using Setfield be more efficient (assuming the data is written to at most once, but is being read multiple times)? Like so:

Base.@kwdef struct MyImmutableStruct
    data::Union{String, Nothing} = nothing
end

using Setfield

y = MyImmutableStruct()
@set! y.data = ""

Probably not much: whether or not you use nothing, your code will need to check the value to see whether it is “empty” (meaning you probably cannot avoid the associated branches anyway).
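As a minimal sketch of that point (the helper `describe` is hypothetical, not from the original post), any reader of `data` has to branch on the empty case. With `Union{String, Nothing}`, copying the field into a local and checking `=== nothing` lets the compiler treat the local as a `String` in the other branch, so the remaining cost is mostly the branch itself.

```julia
Base.@kwdef mutable struct MyStruct
    data::Union{String, Nothing} = nothing
end

# Hypothetical reader: code consuming `data` must branch on the "empty" case.
function describe(x::MyStruct)
    d = x.data                  # copy the field to a local; type is Union{Nothing, String}
    if d === nothing            # the branch you cannot avoid
        return "no data"
    else
        return "data of length $(length(d))"   # here d is known to be a String
    end
end

x = MyStruct()
describe(x)     # "no data"
x.data = ""
describe(x)     # "data of length 0": "" is real data, distinct from nothing
```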

Probably (of course, that depends on what you do with the data). But be aware that @set! does not really mutate the value: it is only syntactic sugar for creating a new instance of the immutable object and rebinding the variable to it. That is:

julia> struct A
           x::Int
       end

julia> using Setfield

julia> f(a::A) = @set! a.x = 2
f (generic function with 1 method)

julia> a = A(1)
A(1)

julia> f(a)
A(2)

julia> a
A(1)

julia> a = f(a) # you need to adapt the code to update the variable like this
A(2)

julia> a
A(2)
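To make the "new instance" point concrete without depending on Setfield, here is a rough sketch of what `@set! y.data = ""` amounts to for the struct from the question: build a new `MyImmutableStruct` and rebind the name. The variable `y_old` is only there for illustration; Setfield's actual implementation goes through its own machinery rather than calling the keyword constructor directly.

```julia
Base.@kwdef struct MyImmutableStruct
    data::Union{String, Nothing} = nothing
end

y = MyImmutableStruct()
y_old = y                       # keep a handle on the original instance

# Conceptually what `@set! y.data = ""` does: construct a new value, rebind `y`.
y = MyImmutableStruct(data = "")

y.data        # ""
y_old.data    # nothing: the original instance was never modified
```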


Using an empty String instead of nothing will probably make immediate comparisons faster and reduce the risk of type instability (a Union{String, Nothing} field is essentially a non-concrete field in a struct). However, if an empty string has its own meaning (i.e. it does not always mean "no data"), then this is not possible. It is also hard to know whether this will always be more or less performant: in a larger code base, using nothing may, accidentally, end up being faster, because the nothing is passed around and functions compile specialized variants for the empty case.
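A minimal sketch of the sentinel variant, assuming "" never needs to mean real data (the names `SentinelStruct` and `has_data` are hypothetical). The field type becomes the concrete `String`, whereas `Union{String, Nothing}` is not a concrete type.

```julia
# Hypothetical variant of MyStruct using "" as the "no data" sentinel.
Base.@kwdef mutable struct SentinelStruct
    data::String = ""
end

# Only valid if "" never needs to represent actual data.
has_data(x::SentinelStruct) = !isempty(x.data)

x = SentinelStruct()
has_data(x)          # false
x.data = "abc"
has_data(x)          # true

isconcretetype(fieldtype(SentinelStruct, :data))   # true
isconcretetype(Union{String, Nothing})             # false
```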

An alternative (I do not know whether it is preferable) is to add an extra Bool field, such as no_data, and always check it before reading the data. However, I would only see an advantage to this approach if the data were not a String (or any other non-isbits type) and the struct were immutable, because then the struct could retain the isbitstype property.
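A sketch of the flag approach, using a `Float64` payload as an assumed example of isbits data (the types `MaybeFloat`, `MaybeString` and the helper `getvalue` are hypothetical). With an isbits payload and an immutable struct, the whole struct stays `isbitstype`; with a `String` payload the flag buys nothing in that respect, because `String` is not isbits.

```julia
# Hypothetical immutable struct: isbits payload plus an explicit "has data" flag.
struct MaybeFloat
    value::Float64     # meaningless when has_value is false
    has_value::Bool
end

MaybeFloat() = MaybeFloat(0.0, false)                 # "no data"
MaybeFloat(v::Real) = MaybeFloat(Float64(v), true)    # data present

# Return the payload, or `default` when no data is set.
getvalue(m::MaybeFloat, default) = m.has_value ? m.value : default

# Same idea with a String payload, for comparison.
struct MaybeString
    value::String
    has_value::Bool
end

isbitstype(MaybeFloat)    # true: plain data, can be stored inline
isbitstype(MaybeString)   # false: String is not an isbits type
```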
