What about an `undefs` function in `Base`?

@fredrikekre I’m all in for general, consistent constructors as well. But why not have both? That seems to be (or to have been?) Stefan’s position as well (see the comment I linked above).

More specifically,

  1. we made the decision for 1.0 to keep zeros and ones and they won’t go away any time soon (in 2.0 at the earliest and even then I doubt it). In this context, undef(s) is missing.
  2. while I like generality, Array is arguably special (because it is our standard / default array type). I would say it’s pragmatic to have convenient syntax (zeros, ones) for allocating regular arrays. (Convenient + general would be best of course, but I wouldn’t want to only trade convenience for generality.)

If you have an instance fooarr::FooArray, then creating a new, zero-filled one is just zero(fooarr). Creating it from the type and a size specification is not as simple, I guess (though maybe fill!(similar(fooarr), false)?) Yikes, that makes no sense.
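
To make the instance-versus-type distinction concrete, here is a minimal sketch that uses a plain Matrix from Base in place of the hypothetical FooArray. The variable names are only for illustration.

```julia
A = [1.5 2.5; 3.5 4.5]          # an existing instance

Z1 = zero(A)                    # same type and size as A, filled with zeros
Z2 = fill!(similar(A), 0)       # allocate uninitialized storage, then fill it

# Starting from a type and a size, the spelling depends on the array type.
# For Array, the convenience function is zeros:
Z3 = zeros(Float64, 2, 2)
```

All three results compare equal here; the point is only that the first two need an instance to start from, while the third needs array-type-specific knowledge.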

But for StaticArrays you don’t need the macro, but can write zero(SVector{3, Float64}).


Exactly, you have to “bootstrap” the process by creating the first instance somewhere.

I know, but this is also an example of an AbstractArray-implementation-specific construction. Ideally that would have been spelled SVector{Float64}(zero, 3). My point is that right now you kind of have to look up how you create a zero-array for every new array type you use.


I actually think fill ‘fills’ that rôle (i.e. at least as convenient as zeros/ones and more general—but still not general enough). Not that I think zeros will disappear, I just think it’s redundant and makes people miss/forget zero, which is too bad.
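
As a rough illustration of that point, the following sketch compares fill with the special-case functions, using only Base. The variable names are arbitrary.

```julia
a = fill(0.0, 2, 2)      # same contents as zeros(2, 2)
b = fill(2.0, 3)         # one allocation, unlike 2 .* ones(3)
c = fill("", 2)          # also works for non-numeric element types
```

Note that fill still produces an Array, which is why it is more general than zeros/ones but does not address other array types.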

Or perhaps you already have an instance available anyway. But, yes, I agree in general.


undefs would also fit with the trues and falses family, so I don’t see adding it as causing confusion, only convenience.


This might be an argument against that name. undef is fundamentally different from zero/false/one/true. The latter are values, while undef isn’t. It might be a good idea to choose a contrasting name.

I think changing the name away from undefs would reduce its pedagogical value


It does seem rather silly, doesn’t it, that setting up an array requires knowledge of two completely disjoint approaches:

zeros(2, 2)                 # works
Array{Float64}(zeros, 2, 2) # doesn't work

ones(2, 2)                  # works
Array{Float64}(ones, 2, 2)  # doesn't work

rand(2, 2)                  # works
Array{Float64}(rand, 2, 2)  # doesn't work

undef(2, 2)                 # doesn't work
Array{Float64}(undef, 2, 2) # works

The moment you begin to think about optimizing, and you want to initialize an array without defining its values, you have to learn about:

  1. The element type (usually Float64 anyway)
  2. The array type Array
  3. Some obscure UndefInitializer() object
  4. Type parameterization curly braces (!?!)

It’s a rather jarring step function in what you need to know.
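
For readers who have not met these pieces yet, here is a small sketch of how they fit together in Base. The variable names are just for illustration.

```julia
# `undef` is the singleton instance of the type UndefInitializer
same = undef === UndefInitializer()

# Array{T}(undef, dims...) allocates storage without initializing its contents
A = Array{Float64}(undef, 2, 2)

# Matrix{T} is an alias for Array{T, 2}, so this is an equivalent spelling
B = Matrix{Float64}(undef, 2, 2)
```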


‘Jarring’ is exactly the right way to talk about it, I think.

When I was learning Julia, and I was already familiar with zeros() and ones(), my first attempt at creating uninitialised arrays with Array{T}(undef, shape...) had me thinking: ‘gosh, I’m in the weeds here, surely I’m doing this the wrong way or unidiomatically’.

Coming from Python/NumPy, I was initially reaching for something like numpy.empty(), which unfortunately already has a different meaning in Julia.

undefs(), as a convenience function, just makes sense to me (including its name, which seems wholly consistent with zeros(), ones(), falses(), trues(), etc.).
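
A minimal sketch of what such a convenience function could look like follows. The name my_undefs is hypothetical and chosen to avoid suggesting that this matches any existing package or anything in Base; the Float64 default is an assumption that mirrors zeros and ones.

```julia
# Hypothetical convenience wrapper around the undef constructor
my_undefs(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

# Default to Float64, mirroring what zeros(dims...) and ones(dims...) do
my_undefs(dims::Integer...) = my_undefs(Float64, dims...)

A = my_undefs(3, 4)      # 3×4 Matrix{Float64} with arbitrary contents
B = my_undefs(Int, 2)    # 2-element Vector{Int} with arbitrary contents
```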


You can create an empty vector like this: Int[].

The reason for my contrariness about the name undef, is that I don’t think it is like zeros and ones at all.

If there were an allocate/fill pair of functions, that would be much nicer, imo. zeros and ones cannot be removed, of course, but it could be natural to use them less and less.

It seems a bit messy and inconsistent to use ones for making an array of 1s, but fill to make one of 2s 🤷 (or even worse, 2 .* ones(), which I often see!)

What we have now are several functions that allocate arrays, and fill them with values. And there are different functions for different values. And yet, undefs would be an odd man out because it doesn’t fill, it just allocates, though its name gives the impression it will fill arrays with a ‘special’ undef value. This whole process decomposes nicely into the concepts ‘allocate’ and ‘fill’. And we already have fill/fill!.
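
The decomposition can be sketched as follows. The function allocate is hypothetical (it is only a name floated in this discussion and does not exist in Base); fill! is the existing Base function.

```julia
# Hypothetical `allocate`: reserve storage and nothing else
allocate(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)

twos  = fill!(allocate(Float64, 2, 2), 2)   # allocate, then fill with 2.0
flags = fill!(allocate(Bool, 3), false)     # allocate, then fill with false
buf   = allocate(Int, 4)                    # allocate only; contents are arbitrary
```

With this split, the value being filled is an ordinary argument, so no new function is needed per value.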


A bit off topic now, but as per the docs, numpy.empty() creates an uninitialised array of a given shape.

It doesn’t create an empty array? That’s not an ideal name, then.

I’m not sure I quite follow why empty is a better name than allocate for a function that allocates non-empty arrays😉


Technically, there is nothing stopping anyone from creating a package that exports a method called undefs. Presenting Undefs.jl:

using Pkg
pkg"add https://github.com/mkitti/Undefs.jl"

julia> using Undefs

julia> undefs(5, 15)
5×15 Matrix{Float64}:
 6.94453e-310  6.94453e-310  6.94453e-310  6.94453e-310  6.94453e-310  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 6.94453e-310  6.94453e-310  6.94452e-310  6.94453e-310  6.94453e-310  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 6.94453e-310  6.94453e-310  6.94453e-310  6.94453e-310  0.0           0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 6.94453e-310  6.94453e-310  6.94453e-310  6.94453e-310  0.0           0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 6.94453e-310  6.94453e-310  6.94453e-310  6.94453e-310  0.0           0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> undefs(Int, 4, 5)
4×5 Matrix{Int64}:
 33060         0           8   401296946   401296946
     1  57934833           0  1673715905  1673715905
  1000       255           0   401296946   401296946
  1000      4096  1673715905  1673715905           0

I’m not sure creating a new package really solves the accessibility issue that is being discussed here. A new function undefs is not targeted at experienced practitioners who will seek out a package that defines undefs. The goal here is simply to further reduce barriers to entry and to new concepts, even if only by a small amount. This is exactly why functions like zeros, ones, trues, and falses exist. Even though we all know how to allocate arrays using incredibly powerful and generic constructors, we all still started by using the convenience functions zeros, ones, trues, and falses. And now that we know how to use constructors… we still use zeros, ones, trues, and falses for convenience. People learn concepts in increments, and constructors can be an awkward, if not challenging, concept early on. An undefs function just helps the language meet users where they are early on and provides convenience thereafter. Julia is a powerful language that provides multiple ways/functions to achieve the same task. I really don’t see a downside, but maybe I’m missing something.

Edit: Somehow I missed that a package was actually created… I suspect you did this as motivation and not a solution… well done.

julia> using Undefs

julia> undefs(10,10)
10×10 Matrix{Float64}:
 2.3048e-314  2.3048e-314   2.30519e-314  2.3048e-314  2.3048e-314   2.29972e-314  2.29972e-314  2.30519e-314  2.30519e-314  2.30471e-314
 ⋮                                                                   ⋮                                                       
 2.3048e-314  2.29956e-314  2.29956e-314  2.3048e-314  2.29972e-314  2.29972e-314  2.3048e-314   2.47011e-314  2.30519e-314  5.0e-324

But fill!/fill+allocate (or some similar name) are as convenient, more powerful and general, and simpler to remember, than a growing number of special-case functions.

I mean, why not a new function, two, to help avoid the unfortunate 2 .* ones() pattern?


You are advocating a change to Base. If you want to make the case for it, the best way is to start with a package and show that it is indispensable.

Once upon a time, I made a similar argument. We can keep discussing this, but discussing it alone will not change anything. Now I’ve created a package to move the question along. I’ve also begun registering it.

As others have mentioned, some think that zeros, ones, trues, and falses should not exist in Base. It places an emphasis on Array where perhaps it really should not be special to the user.

You have made the argument that undefs should be as accessible to a novice user as zeros, ones, trues, and falses. However, undefs is dangerous. It looks like it might act like zeros, and more than one user on this forum has mistaken undef initialization as filling the array with zeros. You’ve also made the performance argument.
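
To illustrate the hazard, here is a small sketch of the only safe way to use an undef-initialized array of numbers: write every element before reading any of them. The function name squares is just an example.

```julia
# Write every element before reading any of them.
function squares(n::Integer)
    out = Vector{Int}(undef, n)   # contents are arbitrary at this point
    for i in eachindex(out)
        out[i] = i^2              # every slot is assigned here
    end
    return out
end

safe = zeros(Int, 3)              # guaranteed to contain only zeros
```

Code that instead accumulates into the undef array (for example with `out[i] += x`) may appear to work when the memory happens to contain zeros and then fail unpredictably later.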

As Carsten Bauer mentioned, there is a better solution than undefs for the novice user, one that gives nearly the same performance. This is calloc, a standard C function. On some operating systems, there are also more specialized methods in the same vein. You can find my initial explorations with this here:

I have since packaged this into ArrayAllocators.jl. Let me offer a brief demonstration.

julia> using ArrayAllocators

julia> @time zeros(Int, 1024, 1024);
  0.011977 seconds (2 allocations: 8.000 MiB)

julia> @time Array{Int}(undef, 1024, 1024);
  0.000023 seconds (2 allocations: 8.000 MiB)

julia> @time fill!(Array{Int}(undef, 1024, 1024), 0);
  0.006183 seconds (2 allocations: 8.000 MiB)

julia> @time ArrayAllocators.zeros(Int, 1024, 1024);
  0.000175 seconds (3 allocations: 8.000 MiB)

julia> sum(ArrayAllocators.zeros(Int, 1024, 1024))
0

julia> @time Array{Int}(calloc, 1024, 1024);
  0.000026 seconds (3 allocations: 8.000 MiB)

Rather than making undefs more accessible, perhaps we should focus on making zeros “faster” in more cases and defer the eager usage to the fill! syntax.
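
For readers who want to see the idea without installing anything, here is a rough Base-only sketch of a calloc-backed zeroed array. The helper calloc_zeros is hypothetical and is not how ArrayAllocators.jl is implemented; it is restricted to isbits element types, for which all-zero bytes are assumed to be a valid zero value (true for the standard integer and floating-point types).

```julia
# Hypothetical helper: a zeroed Array backed by memory from the C library's calloc.
function calloc_zeros(::Type{T}, dims::Integer...) where {T}
    isbitstype(T) || throw(ArgumentError("this sketch only supports isbits element types"))
    n = prod(dims)
    ptr = Ptr{T}(Libc.calloc(n, sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = true asks Julia to call free on the pointer once the array is collected
    return unsafe_wrap(Array, ptr, map(Int, dims); own = true)
end

A = calloc_zeros(Int, 1024, 1024)
```

Unlike an undef-initialized array, every element of A is guaranteed to read as zero.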

There is a minimalist philosophy attached to Base. As the application programming interface (API) of Base expands, it becomes harder to modify or change that API due to guarantees of backwards compatibility. For many, Base and the standard library are already too large. A minimalist Base, however, is easy to customize with packages.

One reason to add new functionality to Base is that the functionality cannot be added via a package. In particular, it may not be possible to add the functionality without committing type piracy. Type piracy occurs when we attempt to extend a function with methods for types where we “own” neither the function nor the type. In the case of Undefs.jl we do not commit type piracy because we have created a new function, undefs. You might think that ArrayAllocators.zeros looks like piracy, but in fact it does not extend Base.zeros.
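
The distinction can be sketched with a toy module. The module name MyAllocators and its contents are hypothetical and only illustrate the mechanism; they do not reproduce ArrayAllocators.jl.

```julia
module MyAllocators
    # A new function that happens to share a name with Base.zeros.
    # It is defined here without importing Base.zeros, so it adds no
    # methods to the Base function and involves no type piracy.
    zeros(::Type{T}, dims::Integer...) where {T} =
        fill!(Array{T}(undef, dims...), zero(T))
end

# By contrast, a definition like the one below (left commented out) would be
# piracy: this code owns neither Base.zeros nor the String type.
# Base.zeros(x::String) = ...

A = MyAllocators.zeros(Int, 2, 2)
```

Callers must write MyAllocators.zeros explicitly, which is what keeps the two functions separate.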

In summary, undefs is quite distinct from zeros and other methods because it can easily lead novices to write incorrect code. Rather, we should consider alternatives such as calloc and perhaps changing the implementation of zeros. These alternatives and other implementations may provide similar performance while also being correct and safe. Given the controversy, the path forward is to create a package to implement the proposed functionality.


Thank you for the detailed explanation. I found the conversation here very helpful and I agree that a different implementation of zeros could offer a better solution than a new undefs function.

Would that work for all kinds of data types? zeros really only makes sense for numerical data. What about allocations of uninitialized data in general?


I’m unclear about what “that” refers to. I suspect you mean calloc.

calloc could work for arrays of mutable types, but it would depend on implementation details of how arrays of mutable types work. If we assume those details, then yes, we can make it work.

Below I assume that a Vector{Vector{Int}} is actually an array of pointers to Vector{Int} and that #undef here actually means those pointers are C_NULL. I happen to know the internal details, so let me demonstrate.

julia> ptr = Ptr{Vector{Int}}(Libc.calloc(16, 8))
Ptr{Vector{Int64}} @0x0000000003fbcdb0

julia> A = unsafe_wrap(Vector{Vector{Int}}, ptr, 16)
16-element Vector{Vector{Int64}}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef

julia> unsafe_load(Ptr{Ptr{Nothing}}(ptr), 1)
Ptr{Nothing} @0x0000000000000000

julia> A[1] = [5]
1-element Vector{Int64}:
 5

julia> unsafe_load(Ptr{Ptr{Nothing}}(ptr), 1)
Ptr{Nothing} @0x00007f22915f7f90

julia> unsafe_load(unsafe_load(Ptr{Ptr{Int}}(ptr), 1), 7)
5

julia> A[1] .= 6
1-element Vector{Int64}:
 6

julia> unsafe_load(unsafe_load(Ptr{Ptr{Int}}(ptr), 1), 7)
6

For the grand finale, I will reset the first element to #undef.

julia> unsafe_store!(Ptr{Ptr{Nothing}}(ptr), C_NULL)
Ptr{Ptr{Nothing}} @0x00000000044aa700

julia> A
16-element Vector{Vector{Int64}}:
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef
 #undef

The knowledge I’m using above is:

  1. The underlying memory representation of Vector{Vector{Int}} is an array of pointers.
  2. #undef actually means those pointers are C_NULL (in other words, 0)
  3. When set, the pointers point to a jl_array structure which has 48 bytes of metadata before the array contents.

Practically, what I need to know to make calloc work for an array of mutable types is how much memory to allocate. At the moment, that is eight bytes per element on a 64-bit platform. We would need sizeof(Vector{Int}) == 8, but currently that call errors.

To succinctly answer your question, calloc does work because #undef is actually C_NULL under the hood, which is just Ptr{Nothing}(0). This requires internal knowledge, which may not be a good idea to use.
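
For comparison, the same #undef state can be observed through the public API, without relying on any of the internals used above. This is a minimal sketch; the variable names are arbitrary.

```julia
A = Vector{Vector{Int}}(undef, 3)   # elements are references, all unassigned

before = isassigned(A, 1)           # false: slot 1 prints as #undef
A[1] = [5]
after = isassigned(A, 1)            # true: slot 1 now refers to a vector

# Vector{Int} is a mutable type, so it is not isbits
not_isbits = !isbitstype(Vector{Int})
```

Reading an unassigned slot such as A[2] throws an UndefRefError, so for element types like this an undef-initialized array fails loudly rather than silently returning garbage.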

I mean, is this a catchall solution for all arrays of any conceivable type, mutable or immutable?

Even if it is, the name zeros doesn’t really seem apt, except for numerical arrays. There could still be an interest in an allocation function for generally typed arrays.

My question is partly also in response to this remark:

I’m actually in favor of an undefs function, I just think it should be called allocate (or something).
