What about an `undefs` function in `Base`?

It would be nice to have an undefs function that operates the same way as zeros and ones. The syntax for initializing an array of zeros or ones is more concise than the syntax for an uninitialized array, but it is less efficient. I know use cases can differ, but I’m simply talking about initializing an array that will be filled in its entirety later on. zeros and ones are much more intuitive to new users who don’t fully understand Julia’s type syntax. For this reason I suspect many novice Julia users use zeros and ones when they should really be allocating an uninitialized array. To rectify this I propose adding an undefs function to Base, unless there is a good reason not to.

e.g.

zeros(Float32,10,10)
ones(Int8,10,10)

# vs
Array{Int16}(undef,10,10)
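
For concreteness, here is a minimal sketch of what such a function could look like. The name undefs and its methods are hypothetical (nothing like this exists in Base); the sketch only forwards to the existing Array{T}(undef, dims...) constructor and assumes Float64 as the default element type, as zeros and ones do.

```julia
# Hypothetical helper, NOT part of Base: mirrors the zeros/ones call signature.
undefs(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)
undefs(::Type{T}, dims::Tuple{Vararg{Integer}}) where {T} = Array{T}(undef, dims)
undefs(dims::Integer...) = undefs(Float64, dims...)  # assumed default eltype

A = undefs(Int16, 10, 10)  # contents are arbitrary until written
fill!(A, 0)                # every element must be assigned before it is read
```

Because undefs is a new function name, defining it outside Base does not extend any existing Base function.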

Here’s some discussion from 2020: “Simpler syntax for creating uninitialized arrays” (JuliaLang/julia issue #34775).

Basically, it was considered, but there wasn’t consensus to put it in (the way of many things in a carefully-designed language).


OK, it looks like this has been discussed at length… and there was even a PR for this. We should let sleeping dogs lie.

Maybe so. But after skimming the PR, I still don’t see many (good) arguments against it. It seems that the PR was closed (by the author himself) mainly because he realized that a calloc-based zeros could make an undefs function obsolete. Even if one agrees with this, we still aren’t using calloc for zeros (see this PR). I know that @mkitti (the author of the PR) has in the meantime created ArrayAllocators.jl, which provides a calloc interface. But I don’t see that the question of an undefs in Base was resolved in this PR.
The linked Discourse thread has a few more arguments but also doesn’t really seem to have come to a conclusion (it just ended after a few exchanges).

FWIW, personally, I’m with @StefanKarpinski on this (from JuliaLang/julia issue #34775, “Simpler syntax for creating uninitialized arrays”):

One thing we could do is:

  • make Array{T}(zeros, dims...) etc. work
  • make undef(T, dims...) and undef(dims...) work

That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.


I believe that there’s an argument that novice users should probably not use undef arrays, that it is an advanced concept used for achieving micro optimization, and easy to misuse. Not sure if that’s right or not, but in many cases you can use the very convenient similar function.
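
As a quick illustration of the similar route mentioned above, this sketch uses only the documented Base.similar methods; the variable names are arbitrary.

```julia
x = rand(3, 4)

y = similar(x)             # same element type and size as x, uninitialized
z = similar(x, Float32)    # same size, different element type
w = similar(x, Int, 2, 5)  # different element type and size

# As with Array{T}(undef, ...), the contents are arbitrary until written.
y .= 2 .* x
```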


I’m not sure I agree with those arguments. undef is a more advanced concept, but people new to Julia coming from other languages will likely have a clear understanding of pre-allocating, and an undefs function would be no more dangerous than similar while having the same syntax as ones and zeros, which are heavily used for pre-allocating. It only takes using undefs or similar once to realize the dangers. Also, depending on one’s workflow (e.g. large arrays), I would argue that using undefs in place of zeros or ones can result in a significant speedup. Having new users discover speedups through the use of undefs and become accustomed to working with undef arrays would be a good thing.


julia> @btime zeros(10000,10000)
  186.795 ms (2 allocations: 762.94 MiB)
10000×10000 Matrix{Float64}

julia> @btime Array{Float64}(undef,10000,10000)
  6.972 μs (2 allocations: 762.94 MiB)
10000×10000 Matrix{Float64}

Although I’d argue that if those speedups make a noticeable difference to the runtime of your program, you should probably reconsider your algorithm and think about reusing buffers (which is probably why @DNF calls it a micro optimization: if you’re calling zeros often enough for that millisecond difference to matter, there’s likely a bigger issue with the algorithm).


One thing is that undefs might lie to you, and not allocate the memory at all until you try and write to it.
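
One rough way to look for this effect is to time the allocation separately from the first and second full writes. This is only a sketch: alloc_then_touch is a hypothetical helper, and whether memory is actually committed lazily depends on the operating system and allocator, so the numbers may or may not show a gap on a given machine.

```julia
# Hypothetical helper for illustration; timings depend on OS, allocator, and hardware.
function alloc_then_touch(n)
    t_alloc  = @elapsed A = Array{Float64}(undef, n, n)  # request memory only
    t_first  = @elapsed fill!(A, 1.0)                    # first write to every element
    t_second = @elapsed fill!(A, 2.0)                    # later write to the same memory
    return (; t_alloc, t_first, t_second)
end

println(alloc_then_touch(2_000))
```

If the first write is noticeably slower than the second, part of the cost that zeros pays up front has simply been deferred rather than avoided.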


Fair point re. speed. I think using undef is still best practice in many cases.

I would also advocate for undefs. I am teaching scientific computing in Julia at university level, and I prefer to familiarize students with some basic optimization concepts. I find it strange that a language which considers itself designed for this makes it more obtuse (obscure) to allocate an array and do nothing than to allocate an array and fill it. It is not a big deal, but it’s awkward.

But it is very straightforward to do today:

Vector{T}(undef, n)
similar(X)

It is not an undue burden to learn this, I think.


It is more obtuse (obscure) than zeros or ones; that is the point being made.


Yes, but only a bit. And it is also a more obscure subject (I presume you meant ‘obscure’).


BTW, I tried to make an implementation. It’s type piracy, not sure if there are any possible repercussions to using it:

(::UndefInitializer)(::Type{T}, siz::NTuple{N, Integer}) where {T, N} = Array{T, N}(undef, siz)
(::UndefInitializer)(::Type{T}, siz::Integer...) where {T} = undef(T, siz)
(::UndefInitializer)(siz...) = undef(Float64, siz...)

Examples:

julia> undef(Int, (2,3))
2×3 Matrix{Int64}:
 1  0  0
 1  0  0

julia> undef(Int, 2,3)
2×3 Matrix{Int64}:
 2688559218696  2688559218696  2688559218696
 2688559218696  2688559218696  2688559218928

julia> undef(2,3)
2×3 Matrix{Float64}:
 0.0  0.0  0.0
 0.0  0.0  0.0

julia> undef(String, 2,3)
2×3 Matrix{String}:
 #undef  #undef  #undef
 #undef  #undef  #undef

This seems a bit odd, perhaps:

julia> undef()
0-dimensional Array{Float64, 0}:
0.0

It makes sense, but it’s not completely obvious that the default type should be Float64, unlike with zeros and ones.


I think the function name should be undefs as it fits with the zeros and ones family. This would also avoid Type piracy.


Why is that? If I want to iterate over an array and store the results in the same sized array, what is the best practice?

do_stuff(val) = string(val)
input_array = [1, 2, 3]
# Broadcast obviously works for such a small thing
output_vec = do_stuff.(input_array)

# But for more complicated processes, the best paradigm I know is
output_array = Vector{String}(undef, length(input_array))
for i in eachindex(input_array)
    # more logic
    output_array[i] = do_stuff(input_array[i])
end

What should I do besides this?

There’s nothing wrong with your code, as far as I can tell. Alternatively, you could use map with a do block, or write a more complicated function and broadcast it.
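
A sketch of the map alternative, reusing the do_stuff example from the question; map allocates the output and infers its element type, so no undef array is needed.

```julia
do_stuff(val) = string(val)
input_array = [1, 2, 3]

# The do block becomes the function passed to map; its last expression is the result.
output_array = map(input_array) do val
    # more logic can go here
    do_stuff(val)
end
```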

I have nothing against people using undef, I also do.


If there’s going to be a new name, I think I would prefer allocate, or something like that.

Perhaps things would be more neat and tidy if there were just allocate and fill. Then there wouldn’t be the confusion between zeros and zero, ones and one.
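
To illustrate the idea: fill already exists in Base and covers what zeros and ones do, while allocate below is a purely hypothetical name sketched on top of the existing undef constructor.

```julia
# fill is in Base and can express both zeros and ones.
Z = fill(0.0, 2, 3)        # equal to zeros(2, 3)
O = fill(one(Int8), 2, 3)  # equal to ones(Int8, 2, 3)

# Hypothetical: no `allocate` function exists in Base.
allocate(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

buf = allocate(Float64, 2, 3)  # uninitialized; must be written before it is read
buf .= Z .+ O
```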


My personal preference would be to use undefs, as the ones and zeros convention has already been decided on. The further we get from undefs, the less useful it will be compared with the existing syntax for defining undef arrays.


One problem that I haven’t seen mentioned here with the suggested undefs is that it does not generalize beyond Array. zeros and ones have the same problem and before Julia 1.0 there were even plans to remove zeros and ones (see e.g. JuliaLang/julia#24444, JuliaLang/julia#25507) rather than adding more similar functions.

Julia and its packages define hundreds of different array types, and there is no consistent way to initialize them, not even to get a zero array. For example, given FooArray, how do you instantiate one filled with zeros? A good guess is probably something like FooArray(zeros(n)), but i) that doesn’t always work, and ii) it also feels wrong to have to construct an Array first. It is true that in many cases there will be an Array backing the FooArray, but IMO that should be allocated and managed internally and hidden from the user. StaticArrays.jl has “solved” this with macros (@SVector zeros(3) etc.), which don’t actually call the Array-specific zeros function, but just like zeros this does not generalize.

The current Array{T}(undef, n) implementation came from the idea of having a general array “constructor interface” of the form ArrayType{T}(initializer, size, ...), but only undef, nothing and missing exist as initializers now. There were attempts to also have Array{T}(zeros, size) and Array{T}(one(s), size) as replacements or generalizations of zeros and ones, see JuliaLang/julia#24389.
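
The three initializers that exist today can be used as follows. This sketch relies only on the documented Base constructors; for nothing and missing the element type has to be able to hold the initializer value.

```julia
a = Array{Float64}(undef, 2, 3)                  # uninitialized contents
b = Array{Union{Nothing, Int}}(nothing, 3)       # every entry is nothing
c = Array{Union{Missing, Float64}}(missing, 2, 2)  # every entry is missing

# The same form works with the Vector and Matrix aliases.
v = Vector{Union{Missing, Int}}(missing, 4)
```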

Much discussion around this can be found in JuliaLang/julia#24595 and linked issues/PRs.
