Is accessing an `undef` array undefined behavior?

From the Rust vs Julia in scientific computation blog post:

v = Vec{Int64}(undef, 3)

What would happen if you forget to set an undef (undefined) value and read it by mistake? One possible output of v is:

139974349418624
139974300721328
2181038337
Welcome to the area of undefined behavior with uninitialized data :scream:

Is this undefined behavior? As far as I can tell, udefined behavior means the compiler is allowed to do anything. Not so here. Accessing v must return a valid Int64, the vector simply contains arbitrary valid numbers.

2 Likes

Maybe youā€™re thinking of unspecified behavior there? Although I canā€™t quickly find a place in the docs that says ā€œundefined behaviorā€ verbatim, it is called undefined content, and in other languages, indexing an uninitialized array is considered undefined behavior because it depends on fundamentally unpredictable RAM state, not program state like RNGs do.

In any case, it wouldnā€™t happen in good practice, the point of undef is to save work before initializing the data right afterward prior to access. I suggested 0-length vectors and sizehint! for interactive contexts where there isnā€™t source code to be written properly before evaluation.

No, Iā€™m thinking of undefined behavior. It has a specific definition (look at the wikioedia link in the blog post). This does not seem to fit the definition.

I have to jump in the car nowā€¦

(If indexing into an undef array was undefined behavior, then the compiler would be allowed to send an email or play a tune, or ā€˜make demons come out of your noseā€™). In fact, it will return a valid integer.

2 Likes

Reading uninitialized values is undefined behavior because you can not predict what the value is going to be as @Benny explains:

If you for example read an uninitialized value and use it for indexing a big array, then some of the times you will get unpredictable wrong results because you probably wanted another index. And sometimes, the program crashes because the value that you did read was an invalid index.

To make it more concrete, imagine having a Vector with length >= 255. Now you use an uninitialized UInt8 value to index that vector. In 1 out of 256 possible cases, your program will crash because you index with 0 (in Julia).

Such problems are very hard to debug because they are mostly silent producer of random, incorrect results and sometimes they are even hidden ā€œtime bombsā€.

1 Like

And I adressed this. It does not seem to fit the definition of undefined behavior, which includes

In the C community, undefined behavior may be humorously referred to as "nasal demons ", after a comp.std.c post that explained undefined behavior as allowing the compiler to do anything it chooses, even ā€œto make demons fly out of your noseā€.

You similarly cannot predict what the value will be if you generate a random value, or read the time from a clock.

6 Likes

This is not true. If this was true, getting keyboard input from the user or reading from a file from the disk would also be ā€œundefinedā€ behavior because the contents are unpredictable and outside the scope of the program.

Undefined behavior means the compiler is free to replace an action with any other action it chooses. This is useful because a ā€œproperā€ program should never run into the situation, so the transformations should be harmless and can often benefit other parts of the code. Reading uninitialized (or external) values leads to unpredictable behavior, including producing segfaults or other errors, but it need-not be undefined from the compilerā€™s perspective.

EDIT: the above poster beat me to it.

5 Likes

@DNF @mikmoore We agree on the possible consequences of reading an uninitialized value. We just donā€™t agree on how we call it and I think that it is a definition problem. Rust and C++ call it undefined behavior to read an uninitialized value.

Rust reference:
MaybeUninit in std::mem - Rust (in the second example)

C++ reference:
Undefined behavior - cppreference.com (under ā€œuninitialized scalarā€)

2 Likes

yes, but Julia does not, because in Julia it is not undefined behavior even if in Rust/C++ it is.

ā€œundefined behaviorā€ is not defined by ā€œthe list of things that cause undefined behavior in languages X and Yā€

I could write a new language FooLang in which integer overflow is UB; that wouldnā€™t make overflow UB in other existing languages

1 Like

Uninitialized scalar value, it seems. The values inside an undef Array of Ints or Floats (perhaps all bitstypes perhaps?), arenā€™t really undefined, they are just ā€˜randomā€™, in the sense that they contain arbitrary values. Thatā€™s different.

Itā€™s not just nitpicking, accusing Julia of being sloppy about undefined behavior sounds sinister and dangerous, and gives an impression that itā€™s much worse than the reality.

4 Likes

ā€œundefined behaviorā€ in C is quite a different thing from what people in this thread seem to think. C is a language with a standard, which defines certain things to behave in a certain way. In the particular case of reading from some allocated memory, as far as Iā€™m aware, the standard only defines what should happen for initialized memory. It doesnā€™t say anything about what should happen if the memory is uninitialized, and hence these sorts of things are commonly referred to as ā€œundefined behaviorā€ (because itā€™s not defined in the C standard).

To be more accurate, it ought to be referred to as ā€œinplementation (of the C compiler) dependent behaviorā€, because any given compiler is free to define any semantics for ā€œundefined behaviorā€ it wants, because any resulting implementation still conforms to the C standard.

In the terms of julia, this doesnā€™t really mean much at all - julia doesnā€™t have a standard, so there isnā€™t any defined behavior in the first place - in a sense, the implementation of the julia compiler currently in the repo on github is the standard.

What does this imply for undef and arrays ā€œinitializedā€ to that? Well, thereā€™s multiple points there, depending on the element type of the array - for mutable/non-isbits types, you get an array filled with the special #undef values, because you need some value to signify that there isnā€™t any valid instance at that index in the array (yet). The language also actually prevents you from trying to retrieve such an #undef ā€œvalueā€ - it must, because there isnā€™t anything there to retrieve.

For immutable isbitstypes, the story is a bit different, because they donā€™t have any pointers inside of them - they are solely defined by their bitpattern. For some types, e.g. Int or Float64, all possible bitpatterns have a valid interpretation. For some others, e.g. a type wrapping an even integer, not all bitpatterns are valid instances, but which exactly are valid depends on the type in question. Generally, it would be preferrable to always construct arrays with valid instances, but that either means having a known sentinel value for each type that can be used for this (also known as the ā€œbillion dollar mistakeā€), force users to allocate arrays exclusively via something like fill, taking a function that constructs each instance, or accept that Vector{T}(undef, ...) is a low level tool thatā€™s a bit dangerous to use. Whether reading from such an array is undefined behavior or not is beside the point - you get some bitpattern back, thereā€™s just no guarantee that it observed the checks that might have been used in a proper construction of objects of that type. So from my POV, if anything, assuming that the objects of an uninitialized array are valid instances and that you can use them is unsafe (and perhaps undefined in the context of that type - but certainly not undefined in terms of julia as a whole, because you do get some bitpattern back, itā€™s just not guaranteed to be valid for your type).


Most importantly though, no matter if this instance is ā€œundefined behaviorā€ or not - the compiler is certainly not allowed to do whatever it wants when it encounters ā€œundefined behaviorā€ (insofar that even exists in julia). That would be quite a bit crazy.

11 Likes

You beat me to it, so hereā€™s just the other Rustinomonicon link that spells it out, short enough Iā€™ll repost:

All runtime-allocated memory in a Rust program begins its life as uninitialized. In this state the value of the memory is an indeterminate pile of bits that may or may not even reflect a valid state for the type that is supposed to inhabit that location of memory. Attempting to interpret this memory as a value of any type will cause Undefined Behavior. Do Not Do This. Rust provides mechanisms to work with uninitialized memory in checked (safe) and unchecked (unsafe) ways.

So it seems like Rust is considering this undefined behavior, so I think itā€™s fair that Mo8it was referring to it as so to Rustaceans on a Rust blog. The ā€œmay not reflect a valid stateā€ is particularly interesting because Sukera cautioned me just yesterday about reinterpreting an array of bytes (with the proper field padding) to a structure; it does not go through any constructor method that may have checks for valid fields, so arguably it doesnā€™t return an instance of the type. Simple example:

julia> struct NeverZero
         x::Int
         NeverZero(x) = x == 0 ? new(1) : new(x)
       end

julia> NeverZero(0), NeverZero(-7)
(NeverZero(1), NeverZero(-7))

julia> Vector{NeverZero}(undef, 2)
2-element Vector{NeverZero}:
 NeverZero(0)
 NeverZero(0)

Unsafe, definitely. Undefined, well the arguments here are interesting. I think thereā€™s too much stock being put into ā€œnasal demonsā€, that phrase is rooted in a joke; undefined behavior doesnā€™t have to be inconsistent on the same machine or as wild as sending spam emails. So the question is if Juliaā€™s standard defines this behavior. This section Incomplete Initialization addresses the issue most fully as far as I know, hereā€™s an excerpt:

However, not all object fields are references. Julia considers some types to be ā€œplain dataā€, meaning all of their data is self-contained and does not reference other objects. The plain data types consist of primitive types (e.g. Int ) and immutable structs of other plain data types. The initial contents of a plain data type is undefined: ā€¦ Arrays of plain data types exhibit the same behavior.

I believe this is a deliberate suggestion that the language standard does not guarantee what accessing uninitialized (isbits) ā€œplain dataā€ will do, itā€™s undefined content and behavior. The provided code example just happens to show the behavior of a current implementation is an unpredictable, possibly invalid value conforming to the structure; I donā€™t expect it to always be this way, a handful of other code examples show implementation-dependent details, like what typeof(1//2) is. Iā€™m only guessing, but this could also be why the name undef was chosen to begin with.

4 Likes

I donā€™t think itā€™s that crazy.
Code relying on the values of uninitialized memory is almost certainly wrong code.
The advantage to calling it ā€œundefined behaviorā€ is that then you can legally do anything, even throw an error with a helpful message if someone started Julia in a debug mode.
Because itā€™s undefined behavior to do this, or to overflow signed integers, compilers and sanitizers can actually make some types of problems easier to find.

In Julia, signed integer overflow is defined, so we could not validly make it throw an error, even in a debug mode.

3 Likes

ā€œUndefined behaviourā€ means violating the assumptions of the compiler (or the language in general). Like with most assumptions in that programmers make, when you violate them, there is no guarantees about how the program behaves.

An example: How does Base.searchsortedlast behave when the array is not sorted? Does it throw an error? Does it loop indefinitely? Who knows! This is not specified, this is not tested, and this has not been a design criterion when the code was created. This is undefined behaviour.

There are ways to violate the assumptions of the Julia compiler, i.e. to trigger undefined behaviour in Julia. The only ones that comes to mind are:

  • Accessing out-of-bounds memory by misusing @inbounds
  • Misusing Base.@assume_effects.
  • Misusing unsafe_* functions like unsafe_load

Accessing unintialized memory is not undefined behaviour, because the compiler has taken it into account: It is designed to take this situation into account.
Julia has very little undefined behavior - the only examples I can think of requires the user to explicitly opt-in to dangerous behaviour.

The original posts refer to how Rust is better because it doesnā€™t expose undefined behaviour. This is wrong. Like Julia, Rust allows users to opt-in to code that can lead to undefined behaviour. It does this using unsafe.

You can still violate assumptions about user code or libraries - in both Rust and Julia. Similarly, you can still access variables in both languages that have been initialized with placeholder values. In neither language, you can access truly uninitialized values.

A bigger question is: Is undefined behaviour that bad? Of course itā€™s worse than correctly-behaving code. But is it worse than other kinds of bug? In my view, itā€™s particularly bad because itā€™s a kind of bug that can lead to silent miscompilations and therefore silently wrong answers. However, logic bugs, including accessing placeholder values are worse, because they more reliably produces silently wrong answers. Therefore, I would generally take a program that reads OOB over a program that reads the wrong value.

1 Like

Right, the question from a langauge theoretic perspective though is what exactly should be done? If we take the interpretation that a compiler can ā€œacceptā€ or ā€œrejectā€ programs, if it encounters undefined behavior and exploiting that for some other behavior or optimization, it means ā€œacceptingā€ the program as valid under the language, while throwing a (compile time) error is ā€œrejectingā€ the program as invalid for the language the compiler (attempts to) implement. Whether thatā€™s a false positive or a false negative is the undefined part. No compiler/interpreter is going to get that right 100% of the time, but Iā€™d argue that really a good compiler ought to reject as many of these programs exhibiting ā€œundefined behaviorā€ as possible.

However, that doesnā€™t give it license to do ā€œanything it wantsā€. The compiler encountering ā€œundefined behaviorā€ and deciding to format all of my harddrives is pretty obviously a malicious compiler that does things not related to the task of accepting or rejecting the language at all (keep in mind that a language defining some term to mean ā€œdelete all harddrivesā€ and a compiler executing that is perfectly fine - that is defined behavior after all!). Thatā€™s the part Iā€™m referring to as ā€œquite a bit crazyā€.

1 Like

Iā€™d be wary of such claims - there are lots of what we call ā€œunderspecified/underdocumented functionsā€ in Base, like the behavior of sin(Inf). In terms of ā€œwhat has been definedā€, this is also undefined behavior (though the error currently resulting from that may not be unexpected at all).

Of course, that technically means that any unspecified/undocumented behavior in any language is undefined behavior, but thatā€™s not very useful of a definition when applied to too-large concepts :wink:

2 Likes

You are right, but in one note in the ā€œFearless concurrencyā€ section I clarify:

This blog post is about safe Rust, which means Rust without using unsafe.

Yes, but in your blog, you claim that reading values that are filled with placeholder values are undefined behaviour. What Iā€™m saying is that

  • What you say is undefined behaviour is not. This is not a semantic debate, by the way. You get well-specified behaviour when accessing uninitialized values in Julia: If itā€™s a bitstype you get a value of the right type with arbitrary bit content. If itā€™s not a bitstype, Julia throws an exception.

  • Undefined behaviour is accessible in both Rust and Julia by opting into it explicitly

Rust has way, way more way of running into undefined behaviour than Julia, because itā€™s a lower-level language. On the other hand, it has a single unsafe keyword you can grep for in code.

2 Likes

As I linked and quoted earlier, the docs says the content is undefined, so it seems like the opposite of promising a value. It does however say that reading uninitialized references throws errors.

1 Like

This is a very interesting discussion!

I did add the following note to the blog post:

I call it undefined behavior from the Rust point of view. There is a discussion about whether Julia would consider it undefined. However, that does not change the consequences of reading uninitialized memory.

4 Likes

Iā€™m curious, in what way do you think the consequences of reading Julia-style undef values are worse than reading Rust-style uninitialized values, i.e. values initialized with a placeholder value? In my view, the latter is much worse because it leads to hard-to-detect correctness bugs that the language or its tooling canā€™t help you deal with.