What does #undef mean?

I tried this:

julia> Dict(:x=>1,2=>3,0=>1).keys
16-element Vector{Any}:
   0
 #undef
 #undef
 #undef
   2
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
    :x

julia> typeof(ans)
Vector{Any} (alias for Array{Any, 1})

julia> Dict(:x=>1,2=>3,0=>1).keys[1]
0

julia> Dict(:x=>1,2=>3,0=>1).keys[2]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex(A::Vector{Any}, i1::Int64)
   @ Base ./array.jl:924
 [2] top-level scope
   @ REPL[3]:1

I guess this vector holds the hash table's keys, but I can’t figure out what #undef is. I expected it to be missing or nothing.

  1. What does #undef actually mean?
  2. Can I construct a vector like Dict(:x=>1,2=>3,0=>1).keys directly?
  3. Can I create a standalone #undef?
  4. What is the best way to fix the code below? I think the code is semantically meaningful, but it throws an error because of #undef. Will Julia handle this case in the future?
julia> for i in Dict(:x=>1,2=>3,0=>1).keys 
           print(i)
       end
0ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex
   @ ./array.jl:924 [inlined]
 [2] iterate
   @ ./array.jl:898 [inlined]
 [3] top-level scope
   @ ./REPL[9]:3

Lots to unpack here, let’s start with 1 & 2:

#undef simply stands for a reference that hasn’t been specified yet. See the following example:

julia> x = Vector{Int}(undef, 3)
3-element Vector{Int64}:
 139777823454816
 139777979780016
               0

julia> y = Vector{String}(undef, 3)
3-element Vector{String}:
 #undef
 #undef
 #undef

Initializing a vector with undef means that we don’t specify its content yet.

  • In the case of an Int eltype, the vector knows the size (in bytes) of its elements and “reserves” it, but since we haven’t written anything it still contains whatever was there in memory before. Hence the seemingly random numbers, which may be different if I do it again.
  • But in the case of a String eltype, the vector doesn’t know the size of its elements, so it stores a reference to each one instead. Until a reference has been set, it is displayed as #undef.
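
To make the difference between the two element types concrete, here is a minimal sketch using only Base functions (isbitstype and isassigned). The variable names x and y follow the example above; nothing here reads the uninitialized Int values, since they are arbitrary.

```julia
# Int is a "bits type": its values are stored inline in the array.
x = Vector{Int}(undef, 3)
# String is not a bits type, so the array stores references to strings.
y = Vector{String}(undef, 3)

println(isbitstype(Int))      # true
println(isbitstype(String))   # false

# isassigned(array, i) reports whether slot i holds a value,
# without throwing an UndefRefError.
println(isassigned(x, 1))     # true: any bit pattern is a valid Int
println(isassigned(y, 1))     # false: the reference is still unset

y[1] = "hello"
println(isassigned(y, 1))     # true
println(isassigned(y, 2))     # false
```

Checking isassigned first is one way to walk over a partially filled array without hitting the error.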

Regarding 4, it is generally considered bad practice to access struct fields directly. Especially in the standard library, there are usually “getters” / “setters” that can help you do that cleanly. And sure enough:

julia> for i in keys(Dict(:x=>1,2=>3,0=>1)) 
           println(i)
       end
0
2
x

The reason why your 3-element Dict already has a vector of keys of length 16 probably has to do with efficiency: to achieve an amortized cost of O(1), you don’t want to extend your storage every time you insert a new element, so instead you plan for slightly larger than you need.
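
If the number of entries is roughly known in advance, the public function sizehint! lets you request that spare capacity yourself instead of relying on the default growth. This is only a sketch: the figure of 1000 entries is an arbitrary assumption, and how much storage is actually reserved is an implementation detail.

```julia
d = Dict{Int,Int}()

# Ask for room for about 1000 entries up front, so the table
# should not need to grow while the loop below fills it.
sizehint!(d, 1_000)

for i in 1:1_000
    d[i] = i^2
end

println(length(d))   # 1000
println(d[10])       # 100
```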

Finally, I’m not sure about 3, so I’ll let more qualified people answer.

Going a stage deeper, “undefined memory” is not the same as “memory with no defined value”.

Thanks, that’s what I meant with “more qualified people” :wink:

I think I fall into the “people who remember reading something once and just searched for it again” bracket

Thanks for your answer.

I thought of #undef as the memory area that a null pointer points to in C, and of Dict(:x=>1,2=>3,0=>1).keys above as a null pointer. Is that right?
I phrased my question badly before. What I actually wanted to ask was “Can we define a variable which has the value #undef?”.

I guess an isolated one can’t be defined. But Dict(:x=>1,2=>3,0=>1).keys is indeed a variable which has the value #undef, or a pointer (or a reference).

I understand now. Any name that I haven’t assigned is the isolated one I meant, like

julia> a
ERROR: UndefVarError: a not defined

The value of this a is #undef.

Well, no. #undef is a marker that tells the Julia runtime that an access to an array element or a struct field is invalid. It is not a value, because you cannot assign it to a variable, check its type, etc.
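
A small sketch of that point, assuming only Base: Ref{String}() creates a mutable container whose single field starts out unset, which makes it a convenient stand-in for a struct with an uninitialized field. Note that undef (the argument passed to array constructors) is an ordinary value, unlike the #undef marker shown in the display.

```julia
# `undef` is a real object that can be assigned and inspected.
println(typeof(undef))        # UndefInitializer

# A Ref created without an argument has nothing stored in it yet.
r = Ref{String}()
println(isassigned(r))        # false

# Reading the unset slot throws; no "#undef value" is ever returned.
err = try
    r[]
catch e
    e
end
println(err isa UndefRefError)  # true

r[] = "now set"
println(isassigned(r))        # true
```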

In your specific case, you shouldn’t use Dict(...).keys. Think of it as an undocumented feature that can change in any version however the devs like. The proper API for accessing dictionaries is via functions: keys(dict), values(dict), dict[:x] (which is internally translated into the function call getindex(dict, :x)), etc.
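
For reference, a short sketch of those public functions applied to the dictionary from the question. The key :missing_key is a made-up example of a key that is absent.

```julia
d = Dict(:x => 1, 2 => 3, 0 => 1)

println(length(d))        # 3
println(haskey(d, :x))    # true
println(d[:x])            # 1, same as getindex(d, :x)

# get returns a fallback instead of throwing when the key is absent.
println(get(d, :missing_key, nothing))   # nothing

# Iterating the Dict yields key => value pairs; the order is unspecified.
for (k, v) in d
    println(k, " => ", v)
end

# collect(keys(d)) gives a plain vector with exactly the stored keys.
ks = collect(keys(d))
println(length(ks))       # 3
```

None of these ever expose the empty internal slots, so no #undef can show up.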

This means that the variable itself is not defined, which is different from it being defined and having value #undef.

Yes, I would paraphrase it the same way. #undef is not a value but a placeholder. It means this place (a slot of an array or a field of a struct) is not defined. That’s why I said that a name like a which hasn’t been assigned is similar.

Maybe there is a small difference between UndefRefError and UndefVarError, but I don’t think it’s important. In u=Vector(undef, 2), u[1] itself is also not defined. The Ref just means that the vector u has a slot at this place.

If we define u=Vector(undef, 2), we have defined u but haven’t defined u[1] and u[2]. However, to show a vector like

2-element Vector{Any}:
xxx
xxx

we must have a placeholder at xxx, so there is a #undef. Even though u[1] is undefined, it is related to the defined vector u, so the error is an UndefRefError.

I don’t think I fully agree. With UndefRef, u or u[1] are both defined, but the values they refer to are not. With UndefVar it is the name (not the value) which is undef.

There’s a difference between a name and a value that you are not completely catching on to. (Leaving aside that u[1] is syntax and not just an identifier.)
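
The two cases can be told apart with Base tools, as in this rough sketch: @isdefined asks whether a name is bound, while isassigned asks whether a slot of an existing array holds a value. The name never_assigned is hypothetical and is deliberately left unbound.

```julia
u = Vector{Any}(undef, 2)

# The name `u` is bound; the name `never_assigned` is not.
println(@isdefined(u))                # true
println(@isdefined(never_assigned))   # false

# The slot u[1] exists (u has length 2), but it holds no value yet.
println(isassigned(u, 1))             # false

# Looking up an unbound name fails with UndefVarError.
name_err = try
    never_assigned
catch e
    e
end

# Reading an unset slot fails with UndefRefError.
slot_err = try
    u[1]
catch e
    e
end

println(name_err isa UndefVarError)   # true
println(slot_err isa UndefRefError)   # true
```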

Sorry for my late reply. I think I understand and agree with what you said, but there is something I want to add.

Let me clarify some distinct concepts as I see them:

  • value: some data saved in memory
  • pointer (in C): the memory address of the value
  • reference: an abstraction of a pointer
  • name: a symbol (or a string or an identifier) bound to a reference

It’s a little ambiguous that we say

There are different levels. For a, it’s totally undefined. For u, it is certainly defined on every level.

For u[1], it doesn’t have a value, and since we regard it as syntax it doesn’t have a name either. Judging by the error message “undefined reference” and the Julia source code, there is no reference or pointer pointing to u[1].

STATIC_INLINE jl_value_t *undefref_check(jl_datatype_t *dt, jl_value_t *v) JL_NOTSAFEPOINT
{
    if (dt->layout->first_ptr >= 0) {
        jl_value_t *nullp = ((jl_value_t**)v)[dt->layout->first_ptr];
        if (__unlikely(nullp == NULL))
            return NULL;
    }
    return v;
}

That’s why I said u[1] is similar to a. Of course, this is just a discussion about meaning. I know they differ semantically.