Subsetting uninitialized arrays

You can subset an uninitialized vector, e.g.:

julia> v = Vector{Any}(undef, 4)
4-element Array{Any,1}:
 #undef
 #undef
 #undef
 #undef

julia> v[:]
4-element Array{Any,1}:
 #undef
 #undef
 #undef
 #undef

But not a matrix:

julia> m = Matrix{Any}(undef, 2, 2)
2×2 Array{Any,2}:
 #undef  #undef
 #undef  #undef

julia> m[:,:]
ERROR: UndefRefError: access to undefined reference

But you can do it if you drop a dimension:

julia> m[:]
4-element Array{Any,1}:
 #undef
 #undef
 #undef
 #undef

My question is whether this is intended, and if not, whether it should be fixed. Thank you.


Oh interesting. The reason for the two behaviors is that there are two implementations:

  1. An optimized implementation just for Array that avoids the scalar getindex and just copies the chunk of memory
  2. The generic implementation for all AbstractArray that just uses repeated scalar getindex to move elements to a new array one at a time.

When you hit (1), you can effectively select #undef elements, but when those methods don’t exist you hit (2) and end up dereferencing an #undef.
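
The two paths can be mimicked in miniature. `bulk_copy` and `elementwise_copy` below are hypothetical stand-ins, not Base’s actual methods: the first copies a contiguous range with `copyto!`, which for two `Array`s moves the stored slots directly (so unset slots come along without being read); the second reads each element with scalar indexing, as the generic path does.

```julia
# Hypothetical stand-in for path (1): copy a contiguous range of an Array in bulk.
# copyto! between two Arrays moves the stored slots directly, so #undef slots
# are carried over without ever being dereferenced.
function bulk_copy(src::Vector, r::UnitRange{Int})
    dest = similar(src, length(r))
    copyto!(dest, 1, src, first(r), length(r))
    return dest
end

# Hypothetical stand-in for path (2): move elements one at a time with scalar
# getindex. Reading an #undef slot throws UndefRefError.
function elementwise_copy(src::AbstractVector, inds::AbstractVector{<:Integer})
    dest = similar(src, length(inds))
    for (k, i) in enumerate(inds)
        dest[k] = src[i]
    end
    return dest
end

v = Vector{Any}(undef, 4)
b = bulk_copy(v, 1:2)            # works; both slots of b are still #undef
# elementwise_copy(v, [1, 2])    # throws UndefRefError
```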

An even more compelling set of examples is:

julia> v = Vector{Any}(undef, 4);

julia> v[1:2]
2-element Array{Any,1}:
 #undef
 #undef

julia> v[[1,2]]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex at ./array.jl:728 [inlined]
 [2] macro expansion at ./multidimensional.jl:699 [inlined]
 # ...

julia> v[1:1:2]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
 [1] getindex at ./array.jl:728 [inlined]
 [2] getindex(::Array{Any,1}, ::StepRange{Int64,Int64}) at ./array.jl:753
 [3] top-level scope at REPL[19]:1

I suppose those optimizations could/should check to see if the elements are defined — or perhaps only apply for concrete element types — but unfortunately that would then destroy the optimization!


I would rather expect the default (unoptimized) method to check if the value is defined and only copy it when it is defined.
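
A minimal sketch of that suggestion for vectors, assuming the array type supports `isassigned`. `copy_assigned` is a hypothetical helper, not a Base function: it copies element by element like the generic path, but skips slots that are not assigned, leaving the matching destination slot as #undef.

```julia
# Hypothetical helper: element-by-element copy that never reads an #undef slot.
function copy_assigned(src::AbstractVector, inds::AbstractVector{<:Integer})
    # For a non-bits eltype such as Any, the new vector starts out all #undef.
    dest = similar(src, length(inds))
    for (k, i) in enumerate(inds)
        if isassigned(src, i)
            dest[k] = src[i]     # only dereference slots that hold a value
        end                      # otherwise dest[k] simply stays #undef
    end
    return dest
end

v = Vector{Any}(undef, 4)
v[2] = "two"
w = copy_assigned(v, [1, 2])     # w[1] is #undef, w[2] == "two"
```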

We don’t have an efficient way to generically ask whether a given index is #undef, unfortunately. The current implementation leaves a bit to be desired.
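
To illustrate why a generic check is costly: without a type-specific way to inspect storage, the only portable test is to attempt the read and catch the error, as in the hypothetical `generic_isassigned` below. I believe Base’s fallback `isassigned` for `AbstractArray` works along similar lines, but treat that as an assumption; `Array` itself can inspect its slots directly.

```julia
# Hypothetical generic definedness check: try the read, treat the errors that
# signal "no value here" as false, and rethrow anything else.
function generic_isassigned(A::AbstractArray, i::Integer)
    try
        A[i]             # may throw UndefRefError (or BoundsError)
        return true
    catch err
        (err isa UndefRefError || err isa BoundsError) && return false
        rethrow()
    end
end

v = Vector{Any}(undef, 3)
v[1] = :a
generic_isassigned(v, 1)   # true
generic_isassigned(v, 2)   # false, via a caught UndefRefError
```

Paying for exception handling on every element is exactly the kind of cost an optimized indexing method tries to avoid.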


Thank you.

I think that for #undef, it is perfectly fine to leave the consequences of accessing it undefined, which allows the current status quo or various similar optimizations in the future.

IMO the first thing you do with an array of #undef is to fill it, before anything else. They should not be passed around uninitialized to functions where elements can potentially be accessed. If initialization is deferred for some reason, Union{Nothing,...} is usually a much better choice.
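
For example, with `String` and `Int` as arbitrary stand-ins for the element type, a deferred-initialization buffer can start out filled with `nothing` instead of #undef, so every indexing path, optimized or generic, can read it safely:

```julia
# Slots start as `nothing` instead of #undef; String is just an example eltype.
v = Vector{Union{Nothing,String}}(nothing, 4)
v[2] = "two"
w = v[[1, 2]]                      # generic indexing path; no UndefRefError possible

m = Matrix{Union{Nothing,Int}}(nothing, 2, 2)
mm = m[:, :]                       # also fine: every slot holds a value

# Before use, find the slots that were never filled.
unfilled = findall(isnothing, v)   # indices 1, 3 and 4
```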


I really appreciate that @bkamins goes through and finds behaviors like this with such a fine-toothed comb. It’s typically not for its own sake or because he wants some particular behavior; rather, he’s working on DataFrames and is very carefully considering how its behaviors match base Julia’s.


Yes, I also like learning about these corner cases. For DataFrame, I would suggest that either

  1. consequences are undefined when it is initialized with arguments that contain #undef, or

  2. test for this in the constructor (for concrete bits types, isassigned is a no-op and thus costless).

(1) is fast and consistent with the rest of Julia; (2) is excessively user-friendly.
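
Option (2) could look roughly like the following. `check_all_assigned` is a hypothetical helper, not part of DataFrames.jl; a constructor would call it on each input column. For bits element types `isassigned` returns `true` for every in-bounds index, so the loop finds nothing to reject there and should be cheap.

```julia
# Hypothetical validation for option (2): reject a column that still has #undef slots.
function check_all_assigned(col::AbstractVector)
    for i in eachindex(col)
        isassigned(col, i) ||
            throw(ArgumentError("column contains an #undef entry at index $i"))
    end
    return col
end

check_all_assigned([1, 2, 3])               # bits eltype: always passes
check_all_assigned(Any[1, "a", :b])         # fully initialized: passes
# check_all_assigned(Vector{Any}(undef, 2)) # throws ArgumentError
```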
