You can subset an uninitialized vector, e.g.:
julia> v = Vector{Any}(undef, 4)
4-element Array{Any,1}:
#undef
#undef
#undef
#undef
julia> v[:]
4-element Array{Any,1}:
#undef
#undef
#undef
#undef
But not a matrix:
julia> m = Matrix{Any}(undef, 2, 2)
2×2 Array{Any,2}:
#undef #undef
#undef #undef
julia> m[:,:]
ERROR: UndefRefError: access to undefined reference
But you can do it if you drop a dimension by indexing with a single colon:
julia> m[:]
4-element Array{Any,1}:
#undef
#undef
#undef
#undef
My question is whether this is intended and, if not, whether it should be fixed. Thank you.
Oh interesting. The reason for the two behaviors is that there are two implementations:
1. An optimized implementation just for Array that avoids the scalar getindex and just copies the chunk of memory.
2. The generic implementation for all AbstractArray types that just uses repeated scalar getindex to move elements to a new array one at a time.
When you hit (1) you can effectively select #undef elements, but when those methods don’t exist you hit (2) and end up dereferencing an #undef.
An even more compelling set of examples is:
julia> v = Vector{Any}(undef, 4);
julia> v[1:2]
2-element Array{Any,1}:
#undef
#undef
julia> v[[1,2]]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex at ./array.jl:728 [inlined]
[2] macro expansion at ./multidimensional.jl:699 [inlined]
# ...
julia> v[1:1:2]
ERROR: UndefRefError: access to undefined reference
Stacktrace:
[1] getindex at ./array.jl:728 [inlined]
[2] getindex(::Array{Any,1}, ::StepRange{Int64,Int64}) at ./array.jl:753
[3] top-level scope at REPL[19]:1
I suppose those optimizations could/should check to see if the elements are defined — or perhaps only apply for concrete element types — but unfortunately that would then destroy the optimization!
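To make the two paths concrete, here is a minimal sketch that imitates them by hand rather than calling the actual Base methods. It assumes that copy on an Array moves the underlying memory without reading individual elements, and it uses an explicit loop (scalar_copy, a made-up name) as a stand-in for the generic fallback.

```julia
v = Vector{Any}(undef, 4)

# Path (1), imitated: a bulk copy never dereferences individual slots,
# so the #undef entries are carried over untouched.
w = copy(v)
bulk_ok = !any(i -> isassigned(w, i), eachindex(w))

# Path (2), imitated: element-by-element copying has to read src[i],
# and reading an unassigned slot throws UndefRefError.
# scalar_copy is a hypothetical helper, not a Base function.
function scalar_copy(src)
    dest = similar(src)
    for i in eachindex(src)
        dest[i] = src[i]
    end
    return dest
end

scalar_failed = try
    scalar_copy(v)
    false
catch err
    err isa UndefRefError
end
```

Here bulk_ok and scalar_failed should both end up true, which mirrors the v[1:2] versus v[[1,2]] difference above.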
I would rather expect the default (unoptimized) method to check if the value is defined and only copy it when it is defined.
We don’t have an efficient way to generically ask if a given index is #undef, unfortunately. The current implementation leaves a bit to be desired…
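For illustration, a checked version of the generic path could look roughly like the sketch below. getindex_keepundef is a hypothetical name, not something in Base, and the per-element isassigned call is exactly the cost in question: it is cheap for Array, but I would not assume the same for arbitrary AbstractArray types.

```julia
# Hypothetical helper, not part of Base: index a vector while tolerating #undef.
function getindex_keepundef(A::AbstractVector, inds::AbstractVector{<:Integer})
    # isassigned returns false for out-of-bounds indices instead of throwing,
    # so check bounds explicitly first.
    checkbounds(A, inds)
    # For non-bits element types the new slots start out unassigned.
    out = similar(A, length(inds))
    for (k, i) in enumerate(inds)
        if isassigned(A, i)   # copy only the slots that hold a value
            out[k] = A[i]
        end
    end
    return out
end

v = Vector{Any}(undef, 4)
v[2] = "x"
r = getindex_keepundef(v, [1, 2])   # slot 1 stays #undef, slot 2 is "x"
```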
I think that for #undef, it is perfectly fine to leave the consequences of access undefined, which would allow the current status quo or various similar optimizations in the future.
IMO the first thing you do with an array of #undef is to fill it, before anything else. They should not be passed around uninitialized to functions where elements can potentially be accessed. If initialization is deferred for some reason, Union{Nothing,...} is usually a much better choice.
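A small sketch of the Union{Nothing,...} alternative, with String picked arbitrarily as the payload type. Every slot holds a real value (nothing) from the start, so it does not matter which indexing method gets dispatched to.

```julia
# Use `nothing` as an explicit "not filled yet" marker instead of #undef.
v = Vector{Union{Nothing,String}}(nothing, 4)

v[2] = "ready"

# Every slot is assigned, so all three indexing styles behave the same.
a = v[1:2]
b = v[[1, 2]]
c = v[1:1:2]
```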
I really appreciate that @bkamins goes through and finds behaviors like this with such a fine-tooth comb. It’s typically not because that’s what he does or because he wants some behavior, but rather because he’s working on DataFrames and is very carefully considering how its behaviors match base Julia’s.
Yes, I also like learning about these corner cases. For DataFrame, I would suggest that either
1. consequences are undefined when the arguments it was initialized with have #undef in them, or
2. test for this in the constructor (for concrete bits types, isassigned is a no-op and thus costless).
(1) is fast and consistent with the rest of Julia, (2) is excessively user-friendly.
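A rough sketch of what the check in (2) might look like for a single column. has_undef is a hypothetical name and this is not how DataFrames actually does anything; it only shows the isassigned scan with a shortcut for bits types.

```julia
# Hypothetical check, not the actual DataFrames implementation.
function has_undef(col::AbstractVector)
    # Bits types can never hold an unassigned reference, so skip the scan.
    isbitstype(eltype(col)) && return false
    return !all(i -> isassigned(col, i), eachindex(col))
end

has_undef(Vector{Int}(undef, 3))   # false: Int is a bits type
has_undef(Vector{Any}(undef, 3))   # true: all slots are #undef
has_undef(Any[1, "a", nothing])    # false: every slot is assigned
```

A constructor could call something like this once per column and throw an error if it returns true.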