@fredrikekre I’m all in for general, consistent constructors as well. But why not have both? That seems to be (or have been?) Stefan’s position as well (see the comment I linked above).
More specifically,
we made the decision for 1.0 to keep zeros and ones and they won’t go away any time soon (in 2.0 at the earliest and even then I doubt it). In this context, undef(s) is missing.
while I like generality, Array is arguably special (because it is our standard / default array type). I would say it’s pragmatic to have convenient syntax (zeros, ones) for allocating regular arrays. (Convenient + general would be best of course, but I wouldn’t want to only trade convenience for generality.)
If you have an instance fooarr::FooArray, then creating a new, zero-filled one is just zero(fooarr). Creating it from the type and a size specification is not as simple, I guess (though maybe fill!(similar(fooarr), false)?). Yikes, that makes no sense.
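For a plain Array, the two spellings mentioned here can be compared directly. This is only a sketch using Base functions; the variable names are made up for illustration, and other array types may behave differently.

```julia
# Start from an existing instance (here an ordinary Matrix{Float32}).
A = rand(Float32, 2, 3)

# Zero-filled array with the same element type and shape as A.
Z1 = zero(A)

# Same result in two steps: allocate uninitialized storage, then fill it.
# `false` is converted to the element type, giving 0.0f0 here.
Z2 = fill!(similar(A), false)

# `similar` also accepts a different element type and size.
Z3 = fill!(similar(A, Int, 4), false)
```

The fill!(similar(...)) form still needs an instance to start from, which is the "bootstrap" problem raised in this thread.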
But for StaticArrays you don’t need the macro, but can write zero(SVector{3, Float64}).
Exactly, you have to “bootstrap” the process by creating the first instance somewhere.
I know, but this is also an example of an AbstractArray-implementation-specific construction. Ideally that would have been spelled SVector{Float64}(zero, 3). My point is that right now you kind of have to look up how you create a zero-array for every new array type you use.
I actually think fill ‘fills’ that rôle (i.e. at least as convenient as zeros/ones and more general—but still not general enough). Not that I think zeros will disappear, I just think it’s redundant and makes people miss/forget zero, which is too bad.
Or perhaps you already have an instance available anyway. But, yes, I agree in general.
This might be an argument against that name. undef is fundamentally different from zero/false/one/true. The latter are values, while undef isn’t. It might be a good idea to choose a contrasting name.
It does seem rather silly, doesn’t it, that setting up an array requires knowledge of two completely disjoint approaches:
zeros(2, 2) # works
Array{Float64}(zeros, 2, 2) # doesn't work
ones(2, 2) # works
Array{Float64}(ones, 2, 2) # doesn't work
rand(2, 2) # works
Array{Float64}(rand, 2, 2) # doesn't work
undef(2, 2) # doesn't work
Array{Float64}(undef, 2, 2) # works
The moment you begin to think about optimizing, and you want to initialize an array without defining its values, you have to learn about:
The element type (usually Float64 anyway)
The array type Array
Some obscure UndefInitializer() object
Type parameterization curly braces (!?!)
It’s a rather jarring step function in what you need to know.
‘Jarring’ is exactly the right way to talk about it, I think.
When I was learning Julia, and I was already familiar with zeros() and ones(), my first attempt at creating uninitialised arrays with Array{T}(undef, shape...) had me thinking: ‘gosh, I’m in the weeds here, surely I’m doing this the wrong way or unidiomatically’.
Coming from Python/NumPy, I was initially reaching for something like numpy.empty(), which unfortunately already has a different meaning in Julia.
undefs(), as a convenience function, just makes sense to me (including its name, which seems wholly consistent with zeros(), ones(), falses(), trues(), etc.).
The reason for my contrariness about the name undef, is that I don’t think it is like zeros and ones at all.
If there were an allocate/fill pair of functions, that would be much nicer, imo. zeros and ones cannot be removed, of course, but it could be natural to use them less and less.
It seems a bit messy and inconsistent to use ones for making an array of 1s, but fill to make one of 2s. (or even worse, 2 .* ones() which I often see!)
What we have now are several functions that allocate arrays, and fill them with values. And there are different functions for different values. And yet, undefs would be an odd man out because it doesn’t fill, it just allocates, though its name gives the impression it will fill arrays with a ‘special’ undef value. This whole process decomposes nicely into the concepts ‘allocate’ and ‘fill’. And we already have fill/fill!.
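The allocate/fill decomposition can be sketched in a few lines. Note that `allocate` is a hypothetical name used only for illustration here; it is not a Base function, while fill, fill! and ones are.

```julia
# Hypothetical helper: allocate uninitialized storage, nothing more.
allocate(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

# Allocate, then fill in place with an arbitrary value.
A = fill!(allocate(Float64, 2, 2), 2.0)

# Base's existing one-step spelling of the same thing.
B = fill(2.0, 2, 2)

# The pattern criticized above: builds an array of ones,
# then a second array for the product.
C = 2 .* ones(2, 2)
```

All three give the same matrix; the difference is in how many concepts and how many temporary arrays are involved.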
I’m not sure creating a new package really solves the accessibility issue that is being discussed here. A new function undefs is not targeted at experienced practitioners who will seek out a package that defines undefs. The goal here is simply to further reduce barriers to entry and to new concepts, even if only by a small amount. This is exactly why functions like zeros, ones, trues, falses exist. Even though we all know how to allocate arrays using incredibly powerful and generic constructors, we all still started by using the convenience functions zeros, ones, trues, falses. And now that we know how to use constructors… we still use the functions zeros, ones, trues, falses for convenience. People learn concepts in increments, and constructors can be an awkward, if not challenging, concept early on. An undefs function just helps the language meet users where they are early on and provides convenience thereafter. Julia is a powerful language that provides multiple ways/functions to achieve the same task. I really don’t see a downside, but maybe I’m missing something.
Edit: Somehow I missed that a package was actually created… I suspect you did this as motivation and not a solution… well done.
But fill!/fill+allocate (or some similar name) are as convenient, more powerful and general, and simpler to remember, than a growing number of special-case functions.
I mean, why not a new function, twos, to help avoid the unfortunate 2 .* ones() pattern?
You are advocating a change to Base. If you want to make the case for it, the best way is to start with a package and show that it is indispensable.
Once upon a time, I made a similar argument. We can keep discussing this, but discussing it alone will not change anything. Now I’ve created a package to move the question along. I’ve also begun registering it.
As others have mentioned, some think that zeros, ones, trues, and falses should not exist in Base. It places an emphasis on Array where perhaps it really should not be special to the user.
You have made the argument that undefs should be as accessible to a novice user as zeros, ones, trues, and falses. However, undefs is dangerous. It looks like it might act like zeros, and more than one user on this forum has mistaken undef initialization as filling the array with zeros. You’ve also made the performance argument.
As Carsten Bauer mentioned, there is a better solution than undefs for the novice user, one where they can obtain nearly the same performance. This is calloc, a standard C function. On some operating systems, there are also more specialized methods in the same vein. You can find my initial explorations with this here:
I have since packaged this into ArrayAllocators.jl. Let me offer a brief demonstration.
Rather than making undefs more accessible, perhaps we should focus on making zeros “faster” in more cases and defer the eager usage to the fill! syntax.
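To make the calloc idea concrete without relying on any package, here is a minimal sketch built only on Base (Libc.calloc and unsafe_wrap). The name `calloc_zeros` is hypothetical, and this is not how ArrayAllocators.jl is implemented. It assumes an isbits element type whose zero value is represented by all-zero bytes, which holds for the standard integer and floating-point types.

```julia
# Hypothetical helper: a zero-filled Array backed by calloc'd memory.
# Assumes zero(T) is all-zero bits, which is true for Int, Float64, etc.
function calloc_zeros(::Type{T}, dims::Int...) where {T}
    isbitstype(T) || throw(ArgumentError("calloc_zeros needs an isbits element type"))
    n = prod(dims)
    # calloc returns zeroed memory; the OS can often provide such pages lazily.
    ptr = Ptr{T}(Libc.calloc(n, sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true lets Julia free the buffer when the array is garbage collected.
    return unsafe_wrap(Array, ptr, dims; own = true)
end

Zc = calloc_zeros(Float64, 3, 4)
```

Unlike an undef array, the result is safe to read immediately, which is the point being made about novice users.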
There is a minimalist philosophy attached to Base. As the application programming interface (API) of Base expands, it becomes harder to modify or change that API due to guarantees of backwards compatibility. For many, Base and the standard library are already too large. A minimalist Base, however, is easy to customize with packages.
One reason to add new functionality to Base is that the functionality cannot be added via a package. In particular, it may not be possible to add the functionality without committing type piracy. Type piracy occurs when we attempt to extend methods for types where we “own” neither the method nor the type. In the case of Undefs.jl we do not commit type piracy because we have created a new function, undefs. You might think that ArrayAllocators.zeros looks like piracy, but in fact it does not extend Base.zeros.
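The distinction can be sketched with two small modules. Both module names and their contents are hypothetical and written only for illustration; they are not the actual source of Undefs.jl or ArrayAllocators.jl.

```julia
# A package-style module that owns a brand-new function: no piracy.
module UndefsSketch
export undefs
undefs(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)
undefs(dims::Integer...) = undefs(Float64, dims...)
end

# A module with its own function that happens to be named `zeros`.
# It never imports or extends Base.zeros, so it is a separate function.
module AllocSketch
zeros(::Type{T}, dims::Integer...) where {T} = fill!(Array{T}(undef, dims...), zero(T))
end

using .UndefsSketch

U = undefs(Int, 3)             # uninitialized Vector{Int}; contents are arbitrary
Z = AllocSketch.zeros(Int, 3)  # must be called with the module prefix
```

Piracy would instead look like adding a method to Base.zeros for argument types from Base, which changes behavior for every other user of that function.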
In summary, undefs is quite distinct from zeros and other methods because it can lead to novices easily writing incorrect code. Rather, we should consider alternatives such as calloc and perhaps changing the implementation of zeros. These alternatives and other implementations may provide similar performance while also being correct and safe. Given the controversy, the path forward is to create a package to implement the proposed functionality.
Thank you for the detailed explanation. I found the conversation here very helpful and I agree that a different implementation of zeros could offer a better solution than a new undefs function.
I’m unclear about what “that” refers to. I suspect you mean calloc.
calloc could work for arrays of mutable types, but it would depend on implementation details of how arrays of mutable types work. If we assume those details, then yes, we can make it work.
Below I assume that a Vector{Vector{Int}} is actually an array of pointers to Vector{Int} and that #undef here actually means those pointers are C_NULL. I happen to know the internal details, so let me demonstrate.
The underlying memory representation of Vector{Vector{Int}} is an array of pointers.
#undef actually means those pointers are C_NULL (in other words, 0)
When set, the pointers point to a jl_array structure which has 48 bytes of metadata before the array contents.
Practically, what I need to know to make calloc work for an array of mutable types is how much memory to allocate. At the moment, that is eight bytes per element on a 64-bit platform. We would need sizeof(Vector{Int}) == 8. Currently this errors.
To succinctly answer your question, calloc does work because #undef is actually C_NULL under the hood, which is just Ptr{Nothing}(0). This requires internal knowledge, which may not be a good idea to use.
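The observable side of this can be checked with public Base functions, without touching the pointers themselves. The sketch below only shows that elements of an undef array of a mutable type start out unassigned; that they are stored as null pointers is the internal detail described above and is not verified here. Variable names are made up for illustration.

```julia
# Array of mutable elements: every slot starts as #undef.
v = Vector{Vector{Int}}(undef, 3)
unassigned_before = [isassigned(v, i) for i in eachindex(v)]

# Assigning one slot leaves the others unassigned.
v[2] = [1, 2, 3]
assigned_after = [isassigned(v, i) for i in eachindex(v)]

# Reading an unassigned slot throws instead of returning garbage.
threw = try
    v[1]
    false
catch err
    err isa UndefRefError
end

# For isbits elements there is no #undef state: slots hold arbitrary bits.
w = Vector{Int}(undef, 3)
bits_assigned = all(i -> isassigned(w, i), eachindex(w))
```

This is why undef is comparatively safe for arrays of mutable types (a read fails loudly) and more hazardous for numeric arrays (a read silently returns whatever was in memory).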
I mean, is this a catchall solution for all arrays of any conceivable type, mutable or immutable?
Even if it is, the name zeros doesn’t really seem apt, except for numerical arrays. There could still be an interest in an allocation function for generally typed arrays.
My question is partly also in response to this remark:
I’m actually in favor of an undefs function, I just think it should be called allocate (or something).