`undef` for `isbits` types with custom constructors

Vasily_Pisarev · October 28, 2020, 7:51pm

Let’s say one defines a type with an inner constructor which keeps some invariant:

struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
end

Now, if one creates an uninitialized array of OrdPairs, or a reference, the invariant enforced by the custom constructor (ordering in the case above) does not necessarily hold:

julia> a = Vector{OrdPair{Int8}}(undef, 10)
10-element Array{OrdPair{Int8},1}:
 OrdPair{Int8}(96, 3)
...

The same issue applies to the proposal to mutate fields of structs stored in arrays. If it becomes a feature, should it be allowed for immutable types with user-provided inner constructors?

Is it a good idea to document that uninitialized arrays ignore the invariants provided by an inner constructor?

Elrod · October 28, 2020, 8:08pm

FWIW, this is already fairly easy to do:

julia> struct OrdPair{T}
           lesser::T
           greater::T
           OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
       end

julia> v = [OrdPair(rand(),rand()) for _ ∈ 1:100];

julia> reinterpret(Float64, v)[1] = 100;

julia> v[1]
OrdPair{Float64}(100.0, 0.22879485125479637)

julia> reinterpret(Tuple{Float64,Float64}, v)[1] = (120.0, -95.0);

julia> v[1]
OrdPair{Float64}(120.0, -95.0)

I’d prefer flexibility over constraints.

benninkrs · October 28, 2020, 11:50pm

My view is that if something is initialized as undef then its contents are garbage and invariants should not be expected to hold.

Vasily_Pisarev · October 29, 2020, 9:55am

The problem is, there’s no way of checking whether an array or a reference holds a valid isbits value or garbage. In contrast, with non-isbits types, one gets #undef which can be checked and throws an error if accessed.

It bothers me a bit that a value taken from an array or a field of a struct may potentially be garbage and break some invariant assumed of the type. I’m wondering if there could be a way to more strictly ensure that a collection or a struct contains only values created by a valid constructor without losing the performance and flexibility.
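The contrast between the two kinds of element types can be seen with `isassigned`. This is only a small illustration; the variable names `refs` and `bits` are made up for the example.

```julia
# Elements of a non-isbits array start out as undefined references.
refs = Vector{String}(undef, 3)
println(isassigned(refs, 1))        # false

# Reading such a slot throws instead of returning garbage.
try
    refs[1]
catch err
    println(err isa UndefRefError)  # true
end

# An isbits array has no such marker: every in-bounds slot counts as
# assigned, whatever bits it happens to contain.
bits = Vector{Int8}(undef, 3)
println(isassigned(bits, 1))        # true
```

So for an isbits eltype, `isassigned` only says the index is in bounds; it cannot tell a constructed value from leftover memory.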

Tamas_Papp · October 29, 2020, 12:53pm

Sticking to a few style rules helps; e.g. if a function creates an uninitialized array, the array should not leave that function without being fully initialized.
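A minimal sketch of that rule, using the OrdPair type from the first post. The function name `sorted_pairs` is hypothetical and only serves the illustration.

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
end

# Hypothetical helper: the undef array is allocated and completely
# filled inside the same function, so callers never see garbage.
function sorted_pairs(xs::Vector{T}, ys::Vector{T}) where T
    out = Vector{OrdPair{T}}(undef, length(xs))
    # eachindex with several arrays throws if their indices differ
    for i in eachindex(xs, ys, out)
        out[i] = OrdPair(xs[i], ys[i])  # goes through the inner constructor
    end
    return out
end

ps = sorted_pairs([3, 1, 7], [2, 5, 4])
println(all(p -> p.lesser <= p.greater, ps))  # true
```

Every element written to `out` comes from the inner constructor, so the ordering invariant holds for the returned array.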

benninkrs · October 29, 2020, 11:35pm

But if the data is garbage, the struct has no use; so then why would you care whether the invariant is satisfied?

jling · October 30, 2020, 1:08am

edit:
there is not

Mason · October 30, 2020, 1:21am

Sufficiently devious users have other ways of bypassing your inner constructor without needing to resort to mutable memory like with undef or reinterpret:

julia> macro new(T, args...)
           (esc ∘ Expr)(:new, T, args...)
       end
@new (macro with 1 method)

julia> struct OrdPair{T}
           lesser::T
           greater::T
           OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
       end

julia> OrdPair(2, 1)
OrdPair{Int64}(1, 2)

julia> @new(OrdPair{Int}, 2, 1)
OrdPair{Int64}(2, 1)

Fundamentally, Julia is just a very hackable language whose internals are exposed to developers whenever possible. This is part of its charm.

Just document that if users abuse the language’s internals to bypass the inner constructor then it may cause issues.

sijo · October 30, 2020, 11:41am

It wouldn’t hurt of course, but I would hope it is unnecessary: The whole point of undef is to defer initialization for performance reasons. This means the data will be correctly initialized (with your constructor) but later, and using it before that point is a bug. So undef must be used responsibly. It’s like @inbounds: the user is responsible for using it correctly, otherwise it can give undefined behavior.

Tamas_Papp · October 30, 2020, 11:54am

I don’t think that this needs to be documented any more explicitly. It should be implicitly understood that if you break anything by using internal constructs in the devdocs, then you get to keep both pieces.

Vasily_Pisarev · October 30, 2020, 1:18pm

A macro like @inbounds is what I was thinking of as a possibility.
Say, undef_init(T, dims), by default, creates an Array{Union{Missing, T}} filled with missing.
@unsafe_init undef_init(T, dims) ignores the safeguard.
But that changes the eltype of the output depending on the presence of the macro, which isn’t great.

Frankly, I posted the question to check whether this issue with uninitialized arrays is a necessary evil or I am just not seeing the possibilities to make it better.
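The eltype problem can be sketched with two plain functions standing in for the idea. Neither `undef_init` nor `unsafe_undef_init` exists in Julia; both are hypothetical, and the macro form is replaced here by a second function to keep the sketch short.

```julia
# Hypothetical safe default: widen the eltype and fill with `missing`.
undef_init(::Type{T}, dims::Int...) where {T} =
    fill!(Array{Union{Missing, T}}(undef, dims...), missing)

# Hypothetical opt-out: plain uninitialized memory, as with `undef` today.
unsafe_undef_init(::Type{T}, dims::Int...) where {T} = Array{T}(undef, dims...)

safe = undef_init(Int8, 4)
raw = unsafe_undef_init(Int8, 4)

println(eltype(safe))          # Union{Missing, Int8}
println(all(ismissing, safe))  # true
println(eltype(raw))           # Int8
```

The two results have different element types, so code downstream would have to handle both.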

simeonschaub · October 30, 2020, 1:46pm

If you need the safeguard, you should just use fill or list comprehensions directly. There is no point in having the undef array constructor call the object’s constructor, because that won’t be any faster than using any other method to fill the array with sensible values, so you might as well just not use undef at all. As others already pointed out, if you use the undef constructor anywhere in your code, you should always guarantee that an element is written to the array before anyone tries to read that element from the array.
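For instance, with the OrdPair type from the first post, both approaches might look like this (the variable names `filled` and `built` are made up for the example):

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
end

# Every slot holds the same, properly constructed value.
filled = fill(OrdPair(Int8(0), Int8(0)), 10)

# Every slot is produced by the inner constructor.
built = [OrdPair(Int8(i), Int8(10 - i)) for i in 1:10]

println(all(p -> p.lesser <= p.greater, filled))  # true
println(all(p -> p.lesser <= p.greater, built))   # true
```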

sijo · October 30, 2020, 2:20pm

I think there’s an unfortunate confusion that undef should return missing values, because that’s what you get from Array{Union{Missing, Int}}(undef,3). But that’s wrong: missing ≠ undefined.

Array{Union{Missing, Int}}(undef,3) doesn’t return missing because it’s the right thing to do, but because it (unfortunately) has to do some initialization for the union type, and it’s cheapest to just initialize everything to zero. The fact that the “zero value” is missing is an implementation detail, we should not rely on it (and it’s not always true!). See https://github.com/JuliaLang/julia/pull/31091 for details.
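If an array of missing values is what is actually wanted, it can be requested explicitly instead of relying on what undef happens to produce. A small sketch, with made-up variable names:

```julia
# Explicitly ask for `missing` in every slot.
a = Vector{Union{Missing, Int}}(missing, 3)

# Equivalent spelling: allocate with undef, then overwrite every slot.
b = fill!(Vector{Union{Missing, Int}}(undef, 3), missing)

println(all(ismissing, a))  # true
println(all(ismissing, b))  # true
```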

Vasily_Pisarev · October 30, 2020, 3:00pm

So, fundamentally it comes down to the fact that we occasionally want to allocate memory not yet associated with any object.

It could’ve been prohibited (like it is in functional languages, I guess), but then lots of core types would have to be implemented in a lower-level language to be performant. So, Julia chooses performance and leaves correctness to the programmers. Makes sense.

Tamas_Papp · October 30, 2020, 3:10pm

This is a misunderstanding. When using Array{T}(undef, ...), the programmer is telling Julia that they are willing to deal with an array that has total garbage, in exchange for a speed improvement (which, TBH, is minor in a lot of cases) and/or not having to come up with a “representative” value.

The language is not choosing anything here, it is just allowing it.

Mason · October 30, 2020, 8:09pm

This is a bit of a semantic game, but surely the language does choose to allow the programmer to do unsafe things in Julia.

This is a choice that almost every language makes (including functional ones). It’s just a question of how many hoops the language decides the programmer must jump through to do those unsafe things.

Julia strikes about a middle balance in the landscape of possibilities here I’d say.

foobar_lv2 · October 30, 2020, 8:44pm

Vasily_Pisarev wrote: “It could’ve been prohibited (like it is in functional languages, I guess)”

It’s not so much about functional/imperative as about memory-safety. Some languages, e.g. Java, Python, Scala, and Haskell, make it very hard to write unsafe code; some, e.g. C/C++, make it very hard to write safe code; and some, like Julia, make all code easy.

This is probably a nightmare for large corporate projects that need to work with tons of very junior / outsourced developers and have a large security surface on mountains of business logic and legacy code. In such cases, language limitations are very useful (e.g. private fields, or constructors enforcing invariants).

Julia’s design philosophy is more like: a limitation is only imposed if it helps codegen, not to protect junior devs from mistakes.

Afaiu the only justification for keeping inner constructors (as opposed to factory methods) is: If you have a non-isbits type, then codegen wants to know which reference fields are potentially uninitialized:

julia> mutable struct foo
           a
           foo(x) = new(x)
           foo() = new()
       end

julia> mutable struct bar
           a
           bar(x) = new(x)
       end

julia> function kill(x)
           ptr = reinterpret(Ptr{Int}, pointer_from_objref(x))
           unsafe_store!(ptr, 0)
           nothing
       end

julia> access(x)=x.a

julia> f=foo(1); kill(f); access(f)
ERROR: UndefRefError: access to undefined reference

julia> b=bar(1); kill(b); access(b)
signal (11): Segmentation fault

The primary real-world use of inner constructors that I have seen in Julia is to establish ownership of external resources in wrapper objects: the inner constructor registers finalizer code that frees the external resource once the wrapper object has been reclaimed. Since this doesn’t establish useful invariants that the compiler understands, I am not happy about such unforced use of this rather obscure language feature, but I guess that’s a style question.
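A rough sketch of that wrapper pattern, assuming a buffer obtained from `Libc.malloc` as the external resource. The type name `MallocBuffer` and its fields are hypothetical.

```julia
# Hypothetical wrapper that owns a malloc'ed buffer.
mutable struct MallocBuffer
    ptr::Ptr{Cvoid}
    nbytes::Int
    function MallocBuffer(nbytes::Integer)
        ptr = Libc.malloc(nbytes)
        ptr == C_NULL && throw(OutOfMemoryError())
        buf = new(ptr, nbytes)
        # Free the external memory once the wrapper has been reclaimed.
        finalizer(buf) do b
            Libc.free(b.ptr)
        end
        return buf
    end
end

h = MallocBuffer(16)
println(h.nbytes)  # 16
```

Because the only constructor is the inner one, every MallocBuffer built through normal means has a finalizer attached; the same registration could also be done in an ordinary factory function, which is the style question raised above.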

Tamas_Papp · October 31, 2020, 8:46am

I am not sure about this; AFAIK not a whole lot of languages even face this choice.

The issue is, broadly, the following: for a given bits type T, the user designates some bit patterns as “valid” (technically, the inner constructor verifies this).

Given a mutable container of eltype T, the language cannot readily produce a “valid” instance (that may be an intractable problem in general), so the user would have to provide a way to generate a “valid” instance via a designated API. In generic code this quickly becomes cumbersome. This instance may just be overwritten in a typical application.
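To make the idea of a designated API concrete, here is one possible shape of it. Both `valid_instance` and `valid_vector` are hypothetical names; nothing like them exists in Base.

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
end

# Hypothetical API: each type with an invariant supplies one cheap valid value.
valid_instance(::Type{OrdPair{T}}) where {T} = OrdPair(zero(T), zero(T))

# Hypothetical generic allocator that never exposes garbage.
valid_vector(::Type{E}, n::Integer) where {E} = fill(valid_instance(E), n)

v = valid_vector(OrdPair{Float64}, 5)
println(all(p -> p.lesser <= p.greater, v))  # true
```

Every type used this way needs its own `valid_instance` method, which is the burden on generic code described above, and the placeholder values are typically overwritten anyway.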
