Undef consistency: x == undef vs isassigned(x)

julia> a = resize!([], 1)
1-element Array{Any,1}:
 #undef

julia> b = [undef]
1-element Array{UndefInitializer,1}:
 array initializer with undefined values

So, I would assume that if I can

julia> b[1] == undef
true

That I should be able to do the same on a. However, that is not the case.

julia> a[1] == undef
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex(::Array{Any,1}, ::Int64) at ./array.jl:731
 [2] top-level scope at none:0

I am told that in this case, I must use isassigned instead of a typical == test for undef.

julia> isassigned(a, 1)
false

julia> isassigned(b, 1)
true

Why is there this apparent inconsistency in usage when an array is created with an undef vs when one is extended?

Thank you
rv

undef is not itself an undefined value, but rather a special array initializer that flags it can initialize with undefined values.

(I edited your post to improve the formatting — use triple backticks ``` for code blocks)

Or, another way of putting this: Julia (unlike, say, Javascript) does not have a defined undefined value that you can compare undefined things against. Which seems reasonable to me, because having a defined undefined seems…weird?

Thanks for reformatting. Is there a way I can preview my posts for cleanliness before posting (like in GitHub) so that you don’t have to cleanup behind me :slight_smile: ?

I understand the definition of undef - but I am asking a different question… For instance, let’s say,

julia> c = vcat(a, b...)
2-element Array{Any,1}:
 #undef                                    
    array initializer with undefined values

From a user’s perspective, there are two undefined entries… one is #undef, and the other is UndefInitializer . But, from that perspective, the elements are treated differently.

julia> c[1]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex(::Array{Any,1}, ::Int64) at ./array.jl:731
 [2] top-level scope at none:0

julia> c[2]
array initializer with undefined values


julia> typeof(c[1])
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex(::Array{Any,1}, ::Int64) at ./array.jl:731
 [2] top-level scope at none:0

julia> typeof(c[2])
UndefInitializer

To defensively program, testing with isassigned for every element seems more expensive than testing for undef (it might not really be… I’m just trying to understand here)

julia> @code_llvm c[2] == undef

; Function ==
; Location: operators.jl:83
define i8 @"julia_==_35430"() {
top:
  ret i8 1
}

julia> @code_llvm isassigned(c, 1)

; Function isassigned
; Location: array.jl:204
define i8 @julia_isassigned_35332(%jl_value_t addrspace(10)* nonnull align 16 dereferenceable(40), i64) {
top:
; Location: array.jl:205
; Function -; {
; Location: int.jl:803
; Function -; {
; Location: int.jl:52
  %2 = add i64 %1, -1
;}}
; Location: array.jl:206
; Function length; {
; Location: array.jl:199
  %3 = addrspacecast %jl_value_t addrspace(10)* %0 to %jl_value_t addrspace(11)*
  %4 = bitcast %jl_value_t addrspace(11)* %3 to %jl_array_t addrspace(11)*
  %5 = getelementptr inbounds %jl_array_t, %jl_array_t addrspace(11)* %4, i64 0, i32 1
  %6 = load i64, i64 addrspace(11)* %5, align 8
;}
; Function <; {
; Location: int.jl:427
  %7 = icmp ult i64 %2, %6
;}
  br i1 %7, label %L17, label %L16

L16:                                              ; preds = %top
  ret i8 0

L17:                                              ; preds = %top
; Location: array.jl:207
  %8 = bitcast %jl_value_t addrspace(11)* %3 to %jl_value_t* addrspace(13)* addrspace(11)*
  %9 = load %jl_value_t* addrspace(13)*, %jl_value_t* addrspace(13)* addrspace(11)* %8, align 8
  %10 = getelementptr %jl_value_t*, %jl_value_t* addrspace(13)* %9, i64 %2
  %11 = load %jl_value_t*, %jl_value_t* addrspace(13)* %10, align 8
  %12 = icmp ne %jl_value_t* %11, null
; Function ==; {
; Location: promotion.jl:350
; Function ==; {
; Location: promotion.jl:425
  %13 = zext i1 %12 to i8
;}}
  ret i8 %13
}

This is equivalent to

@code_llvm undef == undef

and probably not what you wanted to look at.

Probably not! What is the correct way to get the @code_llvm for that case?

Depends on exactly what you want, but

f(c) = c[2] == undef
@code_llvm f(c)

is what I imagine you were looking for, though to have it be comparable, to your other one, you probably want to include the indexing calculation:

f(c, i) = c[i] == undef
@code_llvm f(c, 2)

No. There’s one undefined entry. undef is not undefined, period. Having it in an array is nothing different from any other types. It’s not undefined in the same sense you wouldn’t say a nothing entry is undefined.

It has been a long time since this come up but no, don’t use code_llvm or code_native unless you understand exactly what each instruction is about. Counting the number of lines, even when you use them correctly, does not tell you how fast things are.

1 Like

isassigned is basically never slower than indexing.

Depending on what you mean by “defensive program” (or what you are defending against) you obviously need to do different test, though basically none of them have anything specific to undef. Things that you may want to defend against include,

  1. Undefined entries. This is exactly what isassigned is testing. In a valid program, if isassigned(a, i) is true, a a[i] immediate following it should not raise an exception. (and before anyone mentions it, data race is undefined behavior).

  2. Entries of wrong type. Unless there’s a good reason, you usually should just put constraint on the eltype of the array. If your array type is Vector{Int}, no entry will have a type other than Int and nothing will be of value undef.

    If you need to accept an array with an abstract eltype, you usually either don’t care about the type, or you want to filter out some specific types and you should do just that, still with nothing specific to undef.

Instead of looking so deep into the internals with @code_llvm, you might be more interested in peeling back a few less layers and look at Meta.@lower.

The syntax c[2] == undef is just a function call with two arguments. You can equivalently write it as ==(c[2], undef). The syntax c[2] is also just a function call: it’s equivalent to getindex(c, 2). All arguments are evaluated before getting passed to a function. So c[2] == undef is really doing something like x = getindex(c, 2); ==(x, undef).

This is precisely what Meta.@lower shows, using a special intermediate representation where %X is a variable name:

julia> Meta.@lower c[2] == undef
:($(Expr(:thunk, CodeInfo(
 1 ─ %1 = (Base.getindex)(c, 2)
 │   %2 = %1 == undef
 └──      return %2
))))

Finally, those special lines that print as #undef entries aren’t values at all in Julia. They’re undefined, so you can never assign them to anything or even work with them at all. They’ll just error as soon as you try to access them. Which, of course, Julia needs to do before it’s able to use the == function.

The undef constant, on the other hand, is a value. In fact, it’s just a convenient short-hand name for UndefInitializer()… and that’s just defined with an empty struct X end definition. That’s it. We just give it a meaning by adding an Array method that dispatches on its type to return uninitialized memory. We don’t prevent you from passing this thing around, so you can store it in arrays or dictionaries or anywhere else, just like any other thing in Julia.

Instead of defensively putting isassigned around all your indexing calls, you can instead just never ask for undefined memory in the first place. Instead of Array, you can use fill or comprehensions. And instead of using resize! to grow vectors, use append! or push! as appropriate. You can also check out using the special missing singleton as a first-class way to work with missing values.

3 Likes

Thank you for your detailed explanation.

Just fyi - I don’t ask these questions because I do them. I ask these questions because they are, in my reading of the documentation, unanswered.

I generally go with the presumption that the goal of the forum is to help people understand the design or usage choices (which you precisely did your response) as opposed to making them feel stupid for asking questions that others here might instinctively know the answer to. The former attracts more people to contribute while latter leads to people just avoiding such forums and going about their way contributing to other things where they are made to feel welcome.

My sum total experience with julia is about 4 days, give or take a couple… and I found that while I was trying to work on #28415 that doing a unique(...) on a resized Array{Any,1} was barfing and that led to my set of experiments to understand the nature of undef - and posed the question having gone through the documentation and not finding adequate clarity in it.

This was really specifically an issue that I encountered only with Array{Any,1} being resized. I could not cause #undef to occur in other cases of a specific type.

Anyway, thank you for the response.

2 Likes

FWIW, I do feel that the current way undef works is a bit of a wart on the language (issue at More explicit representation of undef-ability of fields · Issue #25519 · JuliaLang/julia · GitHub), so perhaps we’ll be able to do something about it in the 2.0 timeframe.

1 Like

That would be a welcome usability change.

If nothing else, a design change that could be considered is - an expansion caused by a resize!(Array{Any,1}, N) could cause undefs to be added, instead of #undef. That would be intuitive, imo, since an expansion caused new undefined data to be added, instead of new #undef entries that one cannot really work with.

No, that would be completely wrong. undef is just a magic token to pass to the array constructor to have it produce #undef. It has no other semantics. In particular, you can have an #undef entry for any Array{Foo} if Foo is a mutable struct, but you cannot have an undef entry in such an array.

1 Like

Good point.

So what do you suggest ?

What I suggested in the issue above. Have an uninhabited Undef type. And avoid #undef as much as possible for semantics (use Union{T, Nothing} instead, since that’s fast now).