Let’s say one defines a type with an inner constructor which keeps some invariant:
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
end
Now, if one creates an uninitialized array of OrdPairs, or a reference, the invariant enforced by the custom constructor (ordering in the case above) does not necessarily hold:
julia> a = Vector{OrdPair{Int8}}(undef, 10)
10-element Array{OrdPair{Int8},1}:
OrdPair{Int8}(96, 3)
...
The same issue applies to the proposal to mutate fields of structs stored in arrays. If it becomes a feature, should it be allowed for immutable types with user-provided inner constructors?
Is it a good idea to document that uninitialized arrays ignore the invariants provided by an inner constructor?
The problem is, there’s no way of checking whether an array or a reference holds a valid isbits value or garbage. In contrast, with non-isbits types, one gets #undef which can be checked and throws an error if accessed.
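To make the asymmetry concrete, here is a small self-contained snippet (redefining OrdPair so it runs standalone) contrasting the two cases:

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where {T} = new{T}(minmax(x, y)...)
end

# isbits eltype: the array is backed by raw memory, so reading an
# "uninitialized" element silently returns whatever bits were there.
a = Vector{OrdPair{Int8}}(undef, 3)
garbage = a[1]  # no error, but possibly garbage.lesser > garbage.greater

# non-isbits eltype: slots start as #undef and every read is checked.
b = Vector{Vector{Int}}(undef, 3)
isassigned(b, 1)  # false
caught = try; b[1]; false; catch err; err isa UndefRefError; end  # true
```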
It bothers me a bit that a value taken from an array or a field of a struct may potentially be garbage and break some invariant assumed of the type. I’m wondering if there could be a way to more strictly ensure that a collection or a struct contains only values created by a valid constructor without losing the performance and flexibility.
Sufficiently devious users have other ways of bypassing your inner constructor without needing to resort to mutable memory like with undef or reinterpret:
julia> macro new(T, args...)
           (esc ∘ Expr)(:new, T, args...)
       end
@new (macro with 1 method)
julia> struct OrdPair{T}
           lesser::T
           greater::T
           OrdPair(x::T, y::T) where T = new{T}(minmax(x, y)...)
       end
julia> OrdPair(2, 1)
OrdPair{Int64}(1, 2)
julia> @new(OrdPair{Int}, 2, 1)
OrdPair{Int64}(2, 1)
Fundamentally, Julia is just a very hackable language whose internals are exposed to developers whenever possible. This is part of its charm.
Just document that if users abuse the language’s internals to bypass the inner constructor then it may cause issues.
It wouldn’t hurt of course, but I would hope it is unnecessary: The whole point of undef is to defer initialization for performance reasons. This means the data will be correctly initialized (with your constructor) but later, and using it before that point is a bug. So undef must be used responsibly. It’s like @inbounds: the user is responsible for using it correctly, otherwise it can give undefined behavior.
I don’t think that this needs to be documented any more explicitly. It should be implicitly understood that if you break anything by using internal constructs in the devdocs, then you get to keep both pieces.
A macro like @inbounds is what I was thinking of as a possibility.
Say, undef_init(T, dims), by default, creates Array{Union{Missing, T}} filled with missing. @unsafe_init undef_init(T, dims) ignores the safeguard.
But then the eltype of the output changes depending on the presence of the macro, which isn’t great.
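For reference, undef_init and @unsafe_init are hypothetical names from this post, not part of Base; the safe default could look roughly like this:

```julia
# Hypothetical safeguarded allocator: widen the eltype to Union{Missing,T}
# and fill with missing, so every read returns something meaningful
# instead of garbage bits. (undef_init is a made-up name, not Base API.)
undef_init(::Type{T}, dims::Integer...) where {T} =
    fill!(Array{Union{Missing,T}}(undef, dims...), missing)

v = undef_init(Int, 4)
eltype(v)          # Union{Missing, Int64}
all(ismissing, v)  # true
```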
Frankly, I posted the question to check whether this issue with uninitialized arrays is a necessary evil or whether I’m just not seeing the possibilities to make it better.
If you need the safeguard, you should just use fill or list comprehensions directly. There is no point in having the undef array constructor call the object’s constructor, because that won’t be any faster than using any other method to fill the array with sensible values, so you might as well just not use undef at all. As others already pointed out, if you use the undef constructor anywhere in your code, you should always guarantee that an element is written to the array before anyone tries to read that element from the array.
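Concretely (reusing the OrdPair type from the top of the thread, redefined here so the snippet is standalone), both fill and a comprehension route every element through the inner constructor:

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where {T} = new{T}(minmax(x, y)...)
end

# Every element goes through the inner constructor, so the invariant holds:
a = fill(OrdPair(0, 0), 10)                          # one shared valid value
b = [OrdPair(rand(Int8), rand(Int8)) for _ in 1:10]  # per-element construction

all(p -> p.lesser <= p.greater, b)  # true
```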
I think there’s an unfortunate confusion that undef should return missing values, because that’s what you get from Array{Union{Missing, Int}}(undef,3). But that’s wrong: missing ≠ undefined.
Array{Union{Missing, Int}}(undef,3) doesn’t return missing because it’s the right thing to do, but because it (unfortunately) has to do some initialization for the union type, and it’s cheapest to just initialize everything to zero. The fact that the “zero value” is missing is an implementation detail, we should not rely on it (and it’s not always true!). See https://github.com/JuliaLang/julia/pull/31091 for details.
So, fundamentally it goes down to the fact that we occasionally want to allocate memory not yet associated to any object.
It could’ve been prohibited (like it is in functional languages, I guess), but then lots of core types would have to be implemented in a lower-level language to be performant. So, Julia chooses performance and leaves correctness to the programmers. Makes sense.
This is a misunderstanding. When using Array{T}(undef, ...), the programmer is telling Julia that they are willing to deal with an array that has total garbage, in exchange for a speed improvement (which, TBH, is minor in a lot of cases) and/or not having to come up with a “representative” value.
The language is not choosing anything here, it is just allowing it.
This is a bit of a semantic game, but surely the language does choose to allow the programmer to do unsafe things in julia.
This is a choice that almost every language makes (including functional ones). It’s just a question of how many hoops the language decides the programmer must jump through to do those unsafe things.
Julia strikes about a middle balance in the landscape of possibilities here I’d say.
It could’ve been prohibited (like it is in functional languages, I guess)
It’s not so much about functional/imperative as about memory-safety. Some languages like e.g. java, python, scala, haskell, make it very hard to write unsafe code; some languages like e.g. C/C++ make it very hard to write safe code; and some, like julia make all code easy.
This is probably a nightmare for large corporate projects that need to work with tons of very junior / outsourced developers and have a large security surface on mountains of business logic and legacy code. In such cases, language limitations are very useful (e.g. private fields, or constructors enforcing invariants).
Julia design philosophy is more like: A limitation is only imposed if it helps codegen, not to protect junior devs from mistakes.
Afaiu the only justification for keeping inner constructors (as opposed to factory methods) is: If you have a non-isbits type, then codegen wants to know which reference fields are potentially uninitialized:
julia> mutable struct foo
           a
           foo(x) = new(x)
           foo() = new()
       end

julia> mutable struct bar
           a
           bar(x) = new(x)
       end

julia> function kill(x)
           ptr = reinterpret(Ptr{Int}, pointer_from_objref(x))
           unsafe_store!(ptr, 0)
           nothing
       end

julia> access(x) = x.a
julia> f=foo(1); kill(f); access(f)
ERROR: UndefRefError: access to undefined reference
julia> b=bar(1); kill(b); access(b)
signal (11): Segmentation fault
The primary real-world use of inner constructors that I have seen in julia is to establish ownership of external resources in wrapper objects: The inner constructor establishes finalizer code that frees the external resource once the wrapper object has been reclaimed. Since this doesn’t establish useful invariants that the compiler understands, I am not happy about such unforced use of this rather obscure language feature – but I guess that’s a style question.
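A minimal sketch of that resource-ownership pattern, using a malloc’d buffer as a stand-in for the external resource: the inner constructor is the only place a handle gets wrapped, and it attaches the finalizer immediately, so no wrapper can exist without its cleanup registered.

```julia
# Sketch of the wrapper-owns-resource pattern via an inner constructor.
mutable struct Buffer
    ptr::Ptr{UInt8}
    len::Int
    function Buffer(len::Int)
        ptr = Libc.malloc(len)
        ptr == C_NULL && throw(OutOfMemoryError())
        buf = new(Ptr{UInt8}(ptr), len)
        finalizer(buf) do b
            Libc.free(b.ptr)  # runs once the GC reclaims the wrapper
        end
        return buf
    end
end

buf = Buffer(16)
```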
I am not sure about this — AFAIK not a whole lot of languages even face this choice.
The issue is, broadly, the following: for a given bits type T, the user designates some bit patterns as “valid” (technically, the inner constructor verifies this).
Given a mutable container of eltype T, the language cannot readily produce a “valid” instance (that may be an intractable problem in general), so the user would have to provide a way to generate a “valid” instance via a designated API. In generic code this quickly becomes cumbersome, and in a typical application that instance would just be overwritten anyway.
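Such a designated API could be sketched as a user-extensible function (validinit and safe_vector are hypothetical names, not Base API; OrdPair redefined so the snippet runs standalone):

```julia
struct OrdPair{T}
    lesser::T
    greater::T
    OrdPair(x::T, y::T) where {T} = new{T}(minmax(x, y)...)
end

# Hypothetical hook: a user-provided method returning a known-valid
# placeholder instance for a type.
validinit(::Type{T}) where {T} = error("no valid placeholder defined for $T")
validinit(::Type{OrdPair{T}}) where {T} = OrdPair(zero(T), zero(T))

# Generic code could then allocate a pre-initialized buffer instead of undef:
safe_vector(::Type{T}, n::Integer) where {T} = fill(validinit(T), n)

v = safe_vector(OrdPair{Int8}, 5)  # every element satisfies the invariant
```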