It would be nice to have an undefs function that operates the same way as zeros and ones. The syntax for initializing an array of zeros or ones is more concise than the syntax for initializing an array of undefs, but it is less efficient. I know use cases can differ, but I’m simply talking about initializing an array that will be filled in its entirety later on. zeros and ones are much more intuitive to new users who don’t fully understand Julia’s type declarations. For this reason I suspect many novice Julia users use zeros and ones when they should really be reserving an array of undefs. To rectify this I propose adding an undefs function to Base unless there is a good reason not to.
e.g.
zeros(Float32,10,10)
ones(Int8,10,10)
# vs
Array{Int16}(undef,10,10)
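For concreteness, here is a minimal sketch of what the proposed function could look like. The name undefs and the Float64 default are assumptions taken from the proposal and from how zeros and ones behave; nothing like this exists in Base.

```julia
# Hypothetical helper, NOT part of Base: mirrors the call shape of zeros/ones.
undefs(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

# Assumed default element type, chosen to match zeros(10, 10) and ones(10, 10).
undefs(dims::Integer...) = undefs(Float64, dims...)

A = undefs(Int16, 10, 10)   # contents are arbitrary until written
fill!(A, 0)                 # every element must be assigned before it is read
B = undefs(3)               # Vector{Float64} of length 3, uninitialized
```

The sketch only forwards to the existing Array{T}(undef, dims...) constructor, so it adds convenience but no new behavior.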
Maybe so. But after having quick-read the PR, I still don’t see many (good) arguments against it. It seems that the PR was closed (by the author himself) mainly because he realized that a calloc-based zeros could make an undefs function obsolete. Even if one agrees with this, we still aren’t using calloc for zeros (see this PR). I know that @mkitti (the author of the PR) has in the meantime created ArrayAllocators.jl, which provides a calloc interface. But I don’t see that the question of an undefs in Base was resolved in this PR.
The linked Discourse thread has a few more arguments but also doesn’t really seem to have come to a conclusion (it just ended after a few exchanges).
That way we round out the collection of convenience constructors in a way that can always be expressed in terms of the fuller Container{Eltype}(initializer, dims...) form.
I believe that there’s an argument that novice users should probably not use undef arrays, that it is an advanced concept used for achieving micro optimization, and easy to misuse. Not sure if that’s right or not, but in many cases you can use the very convenient similar function.
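As a quick illustration of the similar function mentioned above (the array x and its sizes are made up for the example):

```julia
x = rand(Float32, 4, 3)

y = similar(x)            # same element type and size as x, uninitialized
z = similar(x, Int, 5)    # element type and size overridden, still uninitialized

y .= 2 .* x               # fill every element before reading from y
```

Like an undef array, the result of similar holds arbitrary values until it is written to.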
I’m not sure I agree with those arguments. undef is a more advanced concept, but people new to Julia who come from other languages will likely have a clear understanding of pre-allocating, and an undefs function would be no more dangerous than similar while having the same syntax as ones and zeros, which are heavily used for pre-allocating. It only takes using undefs or similar once to realize the dangers. Also, depending on one’s workflow (e.g. large arrays), I would argue that using undefs in place of zeros or ones can result in a significant speedup. Having new users discover speedups through undefs right away and become accustomed to working with undef arrays would be a good thing.
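A rough way to check the claimed difference on your own machine is sketched below. The array size is an arbitrary choice, @elapsed is a coarse single-shot timer, and the numbers will vary by system, so no expected output is given.

```julia
n = 2_000   # arbitrary size: a 2000×2000 Float64 array is about 32 MB

# Warm up both calls so compilation time is not included in the measurement.
zeros(Float64, 2, 2)
Array{Float64}(undef, 2, 2)

t_zeros = @elapsed zeros(Float64, n, n)          # allocate and write zeros
t_undef = @elapsed Array{Float64}(undef, n, n)   # allocate only

println("zeros: ", t_zeros, " s, undef: ", t_undef, " s")
```

For more reliable numbers, a dedicated benchmarking package would be a better fit than a single @elapsed call.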
Although I’d argue that if those speedups make a noticeable difference to the runtime of your program, you should probably reconsider your algorithm and think about reusing buffers (which is probably why @DNF calls it a micro optimization: if you’re calling zeros often enough for that millisecond difference to matter, there’s likely a bigger issue with the algorithm).
I would also advocate for undefs. I am teaching scientific computing in Julia at university level and I prefer to familiarize students with some basic optimization concepts. I find it strange that a language which considers itself designed for this makes it more obscure to allocate an array and do nothing than to allocate an array and fill it. It is not a big deal, but it’s awkward.
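The buffer-reuse idea can be sketched as follows. The function scaled! and the data are hypothetical, chosen only to show one allocation being reused across iterations.

```julia
# Hypothetical kernel: overwrites every element of `out`,
# so whatever `out` held before the call never matters.
function scaled!(out::AbstractVector, x::AbstractVector, a)
    for i in eachindex(out, x)
        out[i] = a * x[i]
    end
    return out
end

x = collect(1.0:5.0)
buf = similar(x)          # allocated once, uninitialized

for a in 1:3
    scaled!(buf, x, a)    # the same buffer is reused, no new array per iteration
end
```

With this pattern the cost of the initial allocation, zeroed or not, is paid once rather than on every iteration.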
Why is that? If I want to iterate over an array and store the results in the same sized array, what is the best practice?
do_stuff(val) = string(val)
input_array = [1, 2, 3]
# Broadcast obviously works for such a small thing
output_array = do_stuff.(input_array)
# But for more complicated processes, the best paradigm I know is
output_array = Vector{String}(undef, length(input_array))
for i in eachindex(input_array)
    # more logic
    output_array[i] = do_stuff(input_array[i])
end
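For comparison, here is a sketch of two ways to write the same loop without touching undef. Whether these stay readable for more complicated logic is a judgment call; the names via_map and via_comprehension are made up for the example.

```julia
do_stuff(val) = string(val)
input_array = [1, 2, 3]

# map with a do-block leaves room for multi-line logic per element
via_map = map(input_array) do val
    # more logic could go here
    do_stuff(val)
end

# a comprehension does the same and picks the element type from the results
via_comprehension = [do_stuff(val) for val in input_array]
```

Both forms allocate the output themselves, so no element is ever left uninitialized.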
If there’s going to be a new name, I think I would prefer allocate, or something like that.
Perhaps things would be more neat and tidy if there were just allocate and fill. Then there wouldn’t be the confusion between zeros and zero, ones and one.
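To make the zeros/zero and ones/one distinction concrete, here is a small sketch using only functions that exist in Base (the matrix A is made up for the example):

```julia
A = [1.0 2.0; 3.0 4.0]

Z1 = zeros(2, 2)   # new 2×2 Float64 array of zeros, built from dimensions
Z2 = zero(A)       # additive identity of A: zeros with A's size and eltype
O1 = ones(2, 2)    # 2×2 array with every element equal to 1.0
O2 = one(A)        # multiplicative identity of A: the 2×2 identity matrix

# fill covers both plural forms under a single name
F0 = fill(0.0, 2, 2)
F1 = fill(1.0, 2, 2)
```

Note that ones(2, 2) and one(A) give different matrices, which is one source of the confusion.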
My personal preference would be to use undefs, as the ones and zeros convention has already been decided on. The further we get from undefs, the less useful it will be compared with the existing syntax for defining undef arrays.
One problem that I haven’t seen mentioned here with the suggested undefs is that it does not generalize beyond Array. zeros and ones have the same problem, and before Julia 1.0 there were even plans to remove zeros and ones (see e.g. JuliaLang/julia#24444, JuliaLang/julia#25507) rather than add more similar functions.
Julia and its packages define hundreds of different array types and there is no consistent way to initialize them, not even for getting a zero-array. For example, given FooArray, how do you instantiate one filled with zeros? A good guess is probably something like FooArray(zeros(n)), but i) that doesn’t always work, and ii) it also feels wrong to have to construct an Array first. It is true that in many cases there will be an Array backing the FooArray, but that should then IMO be allocated and managed internally and hidden from the user. StaticArrays.jl has “solved” this with macros (@SVector zeros(3) etc.) which don’t actually call the Array-specific zeros function, but just like zeros this does not generalize.
The current Array{T}(undef, n) implementation came from the idea of having a general array “constructor interface” of the form ArrayType{T}(initializer, size, ...), but only undef, nothing and missing exist as initializers now. There were attempts to also have Array{T}(zeros, size) and Array{T}(one(s), size) as replacements or generalizations of zeros and ones, see JuliaLang/julia#24389.
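The three initializers that exist today can be sketched like this (variable names are made up for the example):

```julia
# undef: memory is allocated but not initialized
a_undef = Array{Float64}(undef, 3)

# nothing: the element type must be able to hold `nothing`
a_nothing = Array{Union{Nothing, Int}}(nothing, 3)

# missing: the element type must be able to hold `missing`
a_missing = Array{Union{Missing, Float64}}(missing, 2, 2)

# the same undef form works for other array types in Base, e.g. BitArray
bits = BitArray(undef, 2, 3)
```

There is no corresponding Array{T}(zeros, dims) form in Base; that was the generalization attempted in the PR referenced above.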
Much discussion around this can be found in JuliaLang/julia#24595 and linked issues/PRs.