Why does arrayref throw?

Hi, I’m confused!

This code

function test(a::Vector{Vector{Int}})
    return @inbounds a[1]
end

turns into the typed code

test(a::Vector{Vector{Int64}})
Body::Vector{Int64} (!c,+e,!n,+t,+s,?m,+i)
2 1 ─ %1 = Base.arrayref(false, a, 1)::Vector{Int64}                                                                                                          │╻ getindex
  └──      return %1

which in turn turns into the LLVM code

define nonnull {}* @julia_test_895({}* noundef nonnull align 16 dereferenceable(40) %0) #0 {
top:
; ┌ @ essentials.jl:13 within `getindex`
   %1 = bitcast {}* %0 to {}***
   %2 = load {}**, {}*** %1, align 8
   %3 = load {}*, {}** %2, align 8
   %.not = icmp eq {}* %3, null
   br i1 %.not, label %fail, label %pass

fail:                                             ; preds = %top
   call void @ijl_throw({}* inttoptr (i64 140720565982192 to {}*))
   unreachable

pass:                                             ; preds = %top
; └
  ret {}* %3
}

Why is this check needed? Since 1 is inbounds, a[1] is a Vector{Int}, no?

Welcome! That’s not a throw from a bounds check, it’s a throw from an undefined reference (essentially a null pointer):

julia> function test(a::Vector{Vector{Int}})
           return @inbounds a[1]
       end
test (generic function with 1 method)

julia> a = Vector{Vector{Int}}(undef, 1)
1-element Vector{Vector{Int64}}:
 #undef

julia> test(a)
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex
   @ ./array.jl:924 [inlined]
 [2] test(a::Vector{Vector{Int64}})
   @ Main ./REPL[7]:2
 [3] top-level scope
   @ REPL[9]:1

You’d see two branches (and two throws) if there was a bounds check.

6 Likes

Thanks! And thanks for the answer!

So, the type of a is really something like Vector{Union{Vector{Int},undef}}
It almost seems to me then, that Vector{Vector{Int}} is just a poor version of Vector{Union{Vector{Int},Nothing}}.

Can I somehow tell Julia that no element in a is undef?

Currently I’m trying to read the LLVM code for a (larger) program. For this program the LLVM code is bloated with a lot of error handling, and I would like to get rid of the noise. Any suggestion on how to do this? Do I have to compile Julia myself with a modified arrayref?

No, undef is not a type at all. Vector{Int} itself is distantly allocated and accessed by a pointer; that pointer is what is stored in Vector{Vector{Int}}. For memory safety, you’d check if that pointer is pointing to a valid instance. I haven’t heard of an easy way to elide defined reference checks in the style of @inbounds because branching checks prevent SIMD, but you can’t SIMD references anyway.

1 Like

You can get rid of the checks by doing

function test(a::Vector{Vector{Int}})
    return unsafe_pointer_to_objref(unsafe_load(Ptr{Ptr{Int}}(pointer(a,1))))
end
a = [[1], [2]]
@code_llvm debuginfo=:none test(a)
define nonnull {}* @julia_test_295({}* noundef nonnull align 16 dereferenceable(40) %0) #0 {
top:
  %1 = bitcast {}* %0 to i64**
  %2 = load i64*, i64** %1, align 8
  %3 = load i64, i64* %2, align 1
  %4 = inttoptr i64 %3 to {}*
  ret {}* %4
}

Note that this will almost certainly segfault if looked at funny.

7 Likes

Hi! I know that undef is not a type. What I mean is that, from my reading of the type, I expected to be able to do a[1] to get an element of type Vector{Int}, and that this would never fail (unless 1 was out of bounds).

As @mbauman explained, this is wrong, because (at the user level) it is possible to create a vector a::Vector{Vector{Int}} where a[1] is not of type Vector{Int}, but instead of a special undef value (in this case reading the element throws an error).

Therefore, Vector{Vector{Int}} seems to me almost like Vector{Union{Vector{Int},Nothing}} with addition that an error is thrown when reading nothing.

Very good! Thank you!

It’s not a value, either, it’s really an absence of one. A value, or rather an instance, can be stored and passed into functions, and a type specifies its instances. Granted, undefined references aren’t a void in spacetime and are implemented by patterns of bits like anything else, but they are neither instances nor types on the language level, not even the instance nothing.

1 Like

On my phone right now, but an assume(isdefined would invoke fewer nasal demons and not break if anyone ever wants to do read barriers.

6 Likes

This intrigued me, so I looked up how to do it:

assume_unreachable() = Core.Intrinsics.llvmcall("unreachable", Cvoid, Tuple{})

function assume_condition(c::Bool)
  # This could also be implemented using LLVM's `llvm.assume`
  # intrinsic directly:
  # https://llvm.org/docs/LangRef.html#llvm-assume-intrinsic
  c || assume_unreachable()
  nothing
end

function test0(a)
  @inbounds a[1]
end

function test1(a)
  assume_condition(isassigned(a, 1))
  @inbounds a[1]
end
julia> a = [Int[]]
1-element Vector{Vector{Int64}}:
 []

julia> @code_llvm test0(a)
; Function Signature: test0(Array{Array{Int64, 1}, 1})
;  @ REPL[3]:1 within `test0`
define nonnull {}* @julia_test0_9569({}* noundef nonnull align 16 dereferenceable(40) %"a::Array") #0 {
top:
;  @ REPL[3]:2 within `test0`
; ┌ @ essentials.jl:13 within `getindex`
   %0 = bitcast {}* %"a::Array" to {}***
   %.data1 = load {}**, {}*** %0, align 8
   %.ref = load {}*, {}** %.data1, align 8
   %.not = icmp eq {}* %.ref, null
   br i1 %.not, label %fail, label %pass

fail:                                             ; preds = %top
   %jl_undefref_exception = load {}*, {}** @jl_undefref_exception, align 8
   call void @ijl_throw({}* %jl_undefref_exception)
   unreachable

pass:                                             ; preds = %top
; └
  ret {}* %.ref
}

julia> @code_llvm test1(a)
; Function Signature: test1(Array{Array{Int64, 1}, 1})
;  @ REPL[4]:1 within `test1`
define nonnull {}* @julia_test1_9742({}* noundef nonnull align 16 dereferenceable(40) %"a::Array") #0 {
top:
;  @ REPL[4]:2 within `test1`
; ┌ @ array.jl:268 within `isassigned`
; │┌ @ abstractarray.jl:684 within `checkbounds`
; ││┌ @ abstractarray.jl:386 within `eachindex`
; │││┌ @ abstractarray.jl:134 within `axes1`
; ││││┌ @ abstractarray.jl:98 within `axes`
; │││││┌ @ array.jl:191 within `size`
        %0 = bitcast {}* %"a::Array" to { i8*, i64, i16, i16, i32 }*
        %.length_ptr = getelementptr inbounds { i8*, i64, i16, i16, i32 }, { i8*, i64, i16, i16, i32 }* %0, i64 0, i32 1
        %.length = load i64, i64* %.length_ptr, align 8
; ││└└└└
; ││┌ @ abstractarray.jl:760 within `checkindex`
; │││┌ @ int.jl:513 within `<`
      %.not = icmp ne i64 %.length, 0
; │└└└
   call void @llvm.assume(i1 %.not)
; │ @ array.jl:270 within `isassigned`
   %1 = bitcast {}* %"a::Array" to {}***
   %.data5 = load {}**, {}*** %1, align 8
   %array_slot = load atomic {}*, {}** %.data5 unordered, align 8
; └
;  @ REPL[4]:3 within `test1`
  ret {}* %array_slot
}

I’m not sure whether test1, which uses llvm.assume, compiles to better code, though. For some reason there’s an atomic load, which seems suboptimal. Why is that? EDIT: actually, the atomic load seems to be part of isassigned? It’s a bit disappointing that the isassigned call isn’t optimized away, actually?

3 Likes

Is there really a concept of “absent values” in Julia?

Reading up on undef in Vector{T}(undef, n), it does not say that the elements are absent, it just says that the vector is not initialized (Arrays · The Julia Language). This aligns with my naive interpretation from experience in other languages.

For example

julia> a = Vector{Float64}(undef, 2)
2-element Vector{Float64}:
 1.141496569762e-311
 1.14149656977e-311

julia> a[1]
1.141496569762e-311

Here there is no check to test if a[1] is absent.

Also, that a vector of type Vector{Int} lives on the heap does not mean it can be “absent”. See for example

julia> f(a::Vector{Int}) = @inbounds a[1]
f (generic function with 1 method)

julia> a = [1]
1-element Vector{Int64}:
 1

julia> @code_llvm f(a)
;  @ REPL[19]:1 within `f`
; Function Attrs: uwtable
define i64 @julia_f_464({}* noundef nonnull align 16 dereferenceable(40) %0) #0 {
top:
; ┌ @ essentials.jl:13 within `getindex`
   %1 = bitcast {}* %0 to i64**
   %2 = load i64*, i64** %1, align 8
   %3 = load i64, i64* %2, align 8
; └
  ret i64 %3
}

There is no check to test if a is undefined or absent.

So, I still feel that the type Vector{Vector{Int}} is confusing. It more like Vector{Union{Vector{Int}, null}} (with the addition that accessing a null trows an error). And, since Julia cannot be sure that an element in a vector of type Vector{Vector{Int}} has type Vector{Int}, it uses runtime checks whenever an element is accessed.

I think it would be more natural to have the invariant that an element in a vector of type Vector{Vector{Int}} is always a Vector{Int} (banning construction like Vector{Vector{Int}}(undef, 2)). And also to guarantee that accessing an element in the vector will never throw an exception.

There’s nothing special about this type, except that the element type, Vector{Int}, is mutable. Values of mutable types are actually references, so, specifically, they may be undefined references. In my opinion, the Manual should be improved to explain this better. It’s currently discussed a tiny bit here and here.

No, the undefined reference is of known type, take this code, for example:

mutable struct M end

struct ContainsUndefinedRef
  v::M

  # `new()` is what's called *incomplete initialization*
  ContainsUndefinedRef() = new()
end

The field v is known to be of type M, but M is mutable, so it may just be an undefined reference. Due to the ContainsUndefinedRef constructor, the field will contain an undefined reference when the value is first constructed. So code like ContainsUndefinedRef().v will throw.

the problem with removing runtime checks is that you will then default it you try to access undef elements (since you are dereferencing a null pointer)

1 Like

Yes.

That’s because isbits instances have no pointers to check. Current implementation of uninitialized elements of such types e.g. Int is just the preexisting bits in memory, so indexing undefined elements does result in an instance. However, doing so is undefined behavior, not something the language guarantees and not something with any practical use. It’s unlikely to change, it’s just cheaper this way.

Could say the same for @inbounds, though there is a clearer performance benefit in that case. Would eliding checks for a defined reference make any difference next to the dereferencing?

Since this topic seems to move towards general discussion of undefined references, I think this talk by Tony Hoare will be quite interesting to folks involved:

2 Likes

This does not seem right to me. See the example I gave in the post you are replying to. If you have some vector a::Vector{Int}, then you know this is not undefined (even though you only hold a reference). When passing a to a function, the function does not have to check if it is undefined before accessing it. On the other hand, if you attempt to dig something of type Vector{Int} up from within some other object, then this may fail (in which case Julia throws an error).

So, something of type Vector{Int} is actually a vector, it can not be undefined. And if a::Vector{Vector{Int}} is of positive length, then a[1] is also always a vector – unless Julia threw an error.

I don’t like exceptions, and I think its icky that I have to worry about exceptions from something as basic as reading an element in a vector. It would be nice with a “safe array” type for arrays that are fully initialized.

Thanks for the pointers to the manual!
(I’m not trusted to post too many links, so had to edit the quote).

Can I read about it somewhere?

If accessing an undef (“absent”) element is supposed to result in an exception, then this should of course be done also in the inplace storage case (Julia would maintain some additional data about absentness of the elements in the vector).

Conversely, if accessing an undef element is “undefined behavior”, then why does it have to throw in the non-inplace case? I would prefer it to segfault.

1 Like

I agree that having the possibility that an array entry or field is undefined isn’t the nicest. However, you need a way to allocate an array or object and initialize it later. How would you propose doing that? Keep in mind that you need something that is reasonably simple, general and flexible.

The approach we went with was to allow creating an uninitialized array or object but then very carefully not make the undefined value first class—any access to it is an immediate error and you cannot make something undefined after it has been defined. I’ve never seen the possibility of undef entries in a reference array end up being a performance issue. Yes, you have to do a null check, but that’s a very predictable branch and gets dwarfed by the indirect memory load that follows when you access the referenced object.

I’m curious what alternative approach you might have in mind if any.

7 Likes

It isn’t undefined behavior, it’s fully defined. The behavior is:

  • For reference types, an uninitialized array element or field is undef which is an error to access in any way aside from checking if it is undef or not.
  • For value types, the array or field will contain arbitrary data interpreted as that type. This should not be used and may have values that cannot be constructed normally.

This is not like C “undefined behavior” at all. In particular, arbitrariness doesn’t make it undefined. C’s malloc function behaves exactly like this and certainly isn’t undefined.

then why does it have to throw in the non-inplace case?

In the reference case what value would you have it give you a reference to? Is there some Vector{Int} value it would be reasonable to fill every Vector{Vector{Int}} with before it is initialized? What special value should we use for arbitrary user-defined types? Should we require every type to provide a method to get a dummy value?

I would prefer it to segfault.

You can consider UndefRefError to be a segfault.

2 Likes

This came up before, with (due to your statement) now conflicting interpretations:

Which is it?

To be clear, I don’t think the term “undefined behavior” is particularly helpful in Julia, because we don’t have a defining standard like C that we could be violating.

2 Likes