What about an `undefs` function in `Base`?

uniment · January 15, 2023, 1:37am

It seems we’re in agreement: we don’t want perfect newbies to be allocating uninitialized arrays, so it makes sense to have a mild hurdle, but Array{Float64}(undef, dims...) is too big a hurdle and is the wrong litmus test—knowledge of Julia’s parametric type system is largely orthogonal to whether someone will fill in their arrays properly.

From here, it seems to be mostly a debate between allocate and undefs.

The trade-offs I see for allocate vs undefs:

Pros for allocate:

Name is consistent as a counterpart to fill
Name makes perfectly clear that we are not filling the array with anything, but merely allocating it. This is conceptually pure.

Pros for undefs:

Argument signature undefs([T,] dims...) is more consistent with zeros, ones, and rand than with fill
If it’s ever decided that zero and one should become array initializers, e.g. Array{Float64}(zero, dims...) then it will have perfect consistency with the use of undef.

Overall, I’m leaning toward allocate. It’s more verbose, and it’s a departure from the style of zeros and ones, but that’s a good thing—it’s a healthy reminder that what we’re doing shouldn’t be taken lightly. It also feels like part of a natural progression for departing from the mathematical expression which called for zeros and ones, toward a programmatically optimized expression using allocate and fill!.
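To make that progression concrete, here is a minimal sketch in current Julia. Since `allocate` does not exist in Base, the explicit `Vector{Float64}(undef, n)` constructor stands in for it, and both function names are made up for illustration.

```julia
# "Mathematical" style: start from zeros and accumulate into it.
function squares_zeros(n)
    out = zeros(n)                    # allocates and writes 0.0 everywhere
    for i in 1:n
        out[i] += i^2
    end
    return out
end

# "Optimized" style: allocate without initializing, then write every slot.
function squares_undef(n)
    out = Vector{Float64}(undef, n)   # contents are arbitrary at this point
    for i in 1:n
        out[i] = i^2                  # every element is assigned before it is read
    end
    return out
end
```

Both return the same values; the second version is only correct because the loop assigns every element, which is exactly the responsibility the longer name is meant to signal.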

Sidenote: I know Stefan has expressed some dissatisfaction with zeros and ones as not being generic enough, but I disagree with that sentiment. I see it as an artifact of Julia’s strong heritage in numerical computing, which is a heritage to be proud of.

Certainly better than JavaScript’s heritage which causes this type of artifact:

javascript> [typeof([]), typeof({}), typeof([] + {}), typeof([] * {})]
['object', 'object', 'string', 'number']

John_Gibson · January 15, 2023, 1:39am

Better to call it undefs to be parallel to ones and zeros. The name fill is already a bit off-kilter, as it both allocates and fills. A function named allocate would further muddy the waters.

mkitti · January 15, 2023, 4:01am

If we were to have an allocate method, it really should take an optional, if not required, array type as well. We really want to avoid the imitation situation that NumPy got itself into by not being extensible.

allocate(Array, Int, 4, 5)

The mechanism should be extendable to other array types.

allocate(OffsetArray, Float64, 2:5)
allocate(CuArray, UInt8, (128, 256))

In fact, we have precedent for such a function in unsafe_wrap.
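For reference, a small sketch of that existing pattern: `unsafe_wrap` takes the array type as its first argument, followed by a pointer and the dimensions. The variable names are illustrative only, and the result is copied out so that nothing depends on the pointer after the `GC.@preserve` block ends.

```julia
buf = collect(1.0:4.0)                 # memory owned by an ordinary Vector

wrapped = GC.@preserve buf begin
    # Array type first, then the pointer and the requested dimensions.
    A = unsafe_wrap(Array, pointer(buf), (2, 2))
    copy(A)                            # copy so the result does not alias the raw pointer
end
```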

Why not just call the method Array?

Array(Int32, 16, 32)

Now it’s time for the history lesson. Back in the day in Julia 0.3 you could construct Array via the following syntax:

Array(type, dims)

https://docs.julialang.org/en/v0.3/stdlib/arrays/#Base.Array

The next step was the curly braces in Julia 0.5:

Array{Int}(dims)

https://docs.julialang.org/en/v0.5/stdlib/arrays/#Base.Array

undef and the other array initializers came last.

Array{Int32}(undef, 4, 20)

https://docs.julialang.org/en/v0.7/base/arrays/#Core.Array-Tuple{UndefInitializer,Any}

The other array initializers are nothing and missing, by the way.
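A short sketch of those two initializers in use. Unlike undef, they do fill the array, and the element type has to admit `Nothing` or `Missing` respectively.

```julia
# Every element is set to `nothing`; the eltype must include Nothing.
A = Array{Union{Nothing, Int}}(nothing, 2, 3)

# Every element is set to `missing`; the eltype must include Missing.
B = Vector{Union{Missing, Float64}}(missing, 4)
```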

postscript:
Also note that we have similar, which comes quite close. You can create an undef array without any curly braces.

julia> similar([], Int, 5, 3)
5×3 Matrix{Int64}:
 139988021321736  139988021321736  139988021321736
 139988021321736  139988021321736  139988021321736
 139988021321736  139988021321736  139988021321736
 139988021321736  139988021321736  139988021321736
 139988021321736  139988021321736  139988021321736

julia> similar([], Vector, 2, 3)
2×3 Matrix{Vector}:
 #undef  #undef  #undef
 #undef  #undef  #undef

DNF · January 15, 2023, 8:05am

That was my main point: I think it should deliberately avoid that parallel.

Also, undefs is just a really weird name.

uniment · January 15, 2023, 8:19am

Are we missing anything?

julia> allocate(::Type{A}, ::Type{T}, dims...) where {A,T} = A{T}(undef, dims...)
       allocate(::Type{T}, dims...) where T = allocate(Array, T, dims...)
       allocate(dims...) = allocate(Float64, dims...)
allocate (generic function with 3 methods)

julia> allocate(2, 2)
2×2 Matrix{Float64}:
 5.0e-324  7.13625e-312
 5.0e-324  7.13625e-312

julia> allocate(Bool, 2, 2)
2×2 Matrix{Bool}:
 0  0
 1  1

julia> using CUDA

julia> allocate(CuArray, Float64, 2, 2)
2×2 CuArray{Float64, 2, CUDA.Mem.DeviceBuffer}:
 0.0  0.0
 0.0  0.0

mkitti · January 15, 2023, 9:10am

You should be able to accept dims as a single tuple.

julia> A = Array{Int8}(undef, (2,3,4));

julia> B = Array{Int16}(undef, size(A));

See this issue for other thoughts:

github.com/JuliaLang/julia: design of array constructors (issue opened 06:57PM - 13 Nov 17 UTC by Sacha0; labels: arrays, design)

## Intro

This post is an attempt to consolidate/review/analyze the several ongoing, disjoint conversations on array construction. The common heart of these discussions is that the existing patchwork of array construction methods leaves something to be desired. A more coherent model that is pleasant to use, adequately general and flexible, largely consistent both within and across array types, and consists of orthogonal, composable parts would be fantastic. Some particular design objectives / razors might include:

- Given an unfamiliar array type, you should have a reasonable sense of how to construct an instance with desired contents without manual/method/code sleuthing.
- Reading an incantation that constructs an array of unfamiliar type, you should be able to largely deduce the array's type and contents without manual/method/code sleuthing.
- The general tools for array construction should be discoverable, and writing *common operations via those general tools should be sufficiently pleasant and concise that: (1) pressure to write ad hoc convenience methods does not escalate to the point where such methods proliferate; and (2) antipatterns for array construction do not emerge to avoid (or in ignorance of) the general tools. (*Common operations include constructing: (1) an array uniformly initialized from a value; (2) an array filled from an iterable, or from a similar object defining the array's contents such as `I`; (3) one array from another; and (4) an uninitialized array.)

Via a tour of the relevant issues and with the above in mind, let's explore the design space.

## Tour of the issues

Let's start with...

- #16029, "Merge collect() into Vector()". The crux: Some array types have constructor methods that, given (solely) a tuple or series of integer arguments specifying shape, produce an uninitialized instance of the given shape. Due to method signature collisions, those constructor methods prevent such array types from supporting construction from certain iterables (particularly tuples and integers). Illustration with `Vector`: `Vector(x)` should be able to construct a `Vector` from an arbitrary `HasLength` iterable `x` (as with e.g. `Vector(1:4)`, which intuitively yields `[1, 2, 3, 4]`). But this cannot work for tuples now, as e.g. `Vector{Float64}((2,))` instead constructs an uninitialized `Vector{Float64}` of length two.

The prevailing idea for fixing the preceding issue is to: (1) deprecate uninitialized-array constructors that accept (solely) a tuple or series of integer arguments as shape, removing the method signature collision; and (2) replace those uninitialized-array constructors with something else. Two broad replacement proposals exist:

1. Introduce a new generic function, say (*modulo spelling) `blah(T, shape...)` for type `T` and tuple or series of integers `shape`, that returns an uninitialized `Array` with element type `T` and shape `shape`. This approach is an extension of the existing collection of `Array` convenience constructors inherited from other languages including `ones`, `zeros`, `eye`, `rand`, and `randn`. (* **Please note that `blah` is merely a short placeholder for whatever name comes out of the relevant ongoing bikeshed. The eventual name is not important here :).**)
2. Introduce `Array{T}(blah, shape...)` constructors where `blah` signals that the caller does not care what the return's contents are. These constructors would be specific instances of a more general model that extends and unifies the existing constructor model. That more general model is discussed further below.

### The first proposal

The first proposal leads us to...

- #11557, "Functions that return arrays with eltype as input should use container type instead?". The crux: `ones`, `zeros`, `eye`, `rand`, `randn`, and the proposed `blah(T, shape...)` all produce `Array`s. How do we generalize these functions to array types broadly? Two approaches exist:

The de facto approach is introduction of ad hoc perturbations on these function names for each new array type: Devise an obscure prefix associated with your array type, and introduce `*ones`, `*zeros`, `*eye`, `*rand`, `*randn`, and hypothetically `*blah` functions with `*` your prefix. This approach fails all three razors above: Failing the first razor, to construct an instance of an array type that follows this approach, you have to discover that the array type takes this approach, figure out the associated prefix, and then hope the methods you find do what you expect. Failing the second razor, when you encounter the unfamiliar `bones` function in code, you might guess that function either carries out spooky divination rituals, or constructs a `b` full of `one`s (whatever `b` refers to). Along similar lines, does `spones` populate all entries in a sparse matrix with ones, or only some set of stored/nonzero entries (and if so which)? Failing the third razor, the very nature of this approach is proliferation of ad hoc convenience functions and is itself an antipattern. On the other hand, this approach's upside is that it sometimes involves a bit less typing (though often also not, see below). Nonetheless, this approach is fraught.

So what's the other approach? #11557 started off by discussing that other approach: `ones`, `zeros`, `eye`, `rand`, and `randn` typically accept a result element type as either first or second argument, for example `ones(Int, (3, 3))` and `rand(MersenneTwister(), Int, (3, 3))`. That argument could instead be an array type, for example `ones(MyArray{Int}, (3, 3))` and `rand(MersenneTwister(), MyArray{Int}, (3, 3))`. This approach is enormously better than the last: It could mostly pass the first and second razors above. But it nonetheless fails the third razor, and exhibits other shortcomings (mostly inherited from the existing convenience constructors). Let's look at some of those shortcomings:

- Default element type ambiguity: When element type isn't specified, for example as in `eye(MyArray, (3, 3))` or `ones(MyArray, (3, 3))`, what should the returned array's element type be? Should that default element type be consistent across array types, or allowed to vary? At present these functions yield `Float64` by default, which is a reasonable (useful) choice when running on modern CPUs. But other defaults may be more appropriate for array types associated with other hardware or applications, for example `Float16` or `Float32` for array types / contexts associated with GPUs. And one could also argue that `Int` is a more canonical type independent of context, or that `Bool` usually provides better promotion behavior, and so on. (This shortcoming to some degree violates the second razor.)
- `ones` (and, to lesser degree, `eye`) element type ambiguity: As #24444 highlights, whether `ones(MyArray{T}, shape...)` returns element type `T`'s multiplicative identity (`one(T)`) or additive generator (`oneunit(T)`) is ambiguous. Of course one or the other can be chosen and documented. But choosing `one`, `ones(MyArray{T}, shape...)` can no longer consistently return a `MyArray{T}`, as for some types `typeof(one(T))` does not coincide with `T` (e.g. `one(1meter) == 1 != 1meter`). And as demonstrated in #24444, with either choice some subset of users' expectations will be violated and use cases unsatisfied, creating pressure for ad hoc solutions or additional value-names. `eye(MyArray{T}, shape...)`'s element type should less ambiguously be `one(T)`, which mitigates the latter issue but runs into the former. (This shortcoming to some degree violates both the first and second razors.)
- `zeros` element type ambiguity: Prior to #16116, whether `one(T)` returned a multiplicative identity or additive generator for `T` was ambiguous. #20268 resolved this ambiguity by introducing `oneunit(T)` as the additive generator for `T` and affirming `one(T)` as a multiplicative identity. `zero` suffers from a similar issue, though likely less important in practice: Is `zero(T)` the additive identity or a sort of multiplicative zero for `T`? To illustrate, is `3meters * zero(1meters)` `0meters^2` or `0meters`? Consequently, `zeros` suffers from an ambiguity analogous to that described above for `ones`.
- Handling values without an associated function: To construct a `MyArray` of `1`s, you call `ones(MyArray{Int}, (3, 3))`. To construct a `MyArray` of `0`s, you call `zeros(MyArray{Int}, (3, 3))`. To construct a `MyArray` containing the identity matrix, you call `eye(MyArray{Int}, (3, 3))`. Great so far. But how do you construct a `MyArray` of `2`s, or `-1`s, or containing `I/2`? If you are used to these convenience constructors, perhaps you respectively call `2*ones(MyArray{Int}, (3, 3))`, `-ones(MyArray{Int}, (3, 3))`, and `eye(MyArray{Int}, (3, 3))/2`. Or in the first two cases perhaps you call `fill!(blah(MyArray{Int}, (3, 3)), [2|-1])` for mutable and `fill!`-supporting `MyArray`, limiting your code's scope. If you want to avoid generating a temporary, you probably use the `fill!` incantation. But these incantations are less pleasant than `ones` or `zeros`, so perhaps you give your common values names: `twos(MyArray{Int}, (3, 3))`. And to avoid the temporary in the `eye` call, perhaps you roll a `halfeye(MyArray{Int}, (3, 3))` function to avoid allocating the temporary. Overall, antipatterns emerge and ad hoc functions proliferate. And as demonstrated in https://github.com/JuliaLang/julia/issues/24444#issuecomment-343261511 and discussed elsewhere, this issue bears out in practice and is widespread. (This shortcoming violates the third razor.)
- Two disjoint, incongruous, and overlapping models are necessary: To construct an array from another array, or from an iterable or similar content specifier, you have to switch from these functions to constructors. So users must be familiar with two disjoint, incongruous, and non-orthogonal models.
- Minor type argument position inconsistency: The position of these functions' type argument varies, requiring method sleuthing to figure out the correct signature. Examples: `ones(MyArray{Int}, (3, 3))` versus `rand(RNG, MyArray{Int}, (3, 3))`.

Each of these shortcomings is perhaps acceptable considered in isolation. But considering these shortcomings simultaneously, this approach becomes a shaky foundation on which to build a significant component of the language. In part motivated by these and other considerations, #11557 and concurrent discussion turned to...

### The second proposal

... which is to introduce (modulo spelling of `blah`, please see above) `Array{T}(blah, shape...)` constructors, where `blah` indicates the caller does not care what the return's contents are. These constructors immediately generalize to arbitrary array types as in `MyArray{T}(blah, shape_etc...)`, and would be a specific instance of a more general model that extends the existing constructor model: The existing constructor model allows you to write, for example, `Vector(x)` for `x` any of `1:4`, `Base.OneTo(4)`, or `[1, 2, 3, 4]` (to construct the `Vector{Int}` `[1, 2, 3, 4]`), or similarly `SparseVector(x)` (to build the equivalent `SparseVector`). To the limited degree this presently works broadly, the model is `MyArray[{...}](contentspec)` where `contentspec`, for example some other array, iterable, or similar object, defines the resulting array's contents. The more general extension of this model is `MyArray[{...}](contentspec[, modifierspec...])`. Roughly, `contentspec` defines the result's contents, while `modifierspec...` (if given) provides qualifications, e.g. shape.

What does this look like in practice? For the most part you would use constructors as you do now, with few exceptions. Let's go through the common construction operations mentioned above:

1. (Constructing uninitialized arrays.) To build an uninitialized `MyArray{T}`, where now you write e.g. `MyArray{T}(shape...)`, instead you would write `MyArray{T}(blah, shape...)`. (#24400 explored this possibility for `Array`s, and inevitably became a bikeshed of the spelling of `blah` :).)
2. (Constructing one array from another.) Constructing one array from another, as in e.g. `Vector(x)` or `SparseVector(x)` for `x` being `[1, 2, 3, 4]`, would work just as before.
3. (Constructing an array filled from an iterable, or from a similar object defining the array's contents such as `I`.) What is possible now, for example `Vector(x)` for `x` either `1:4` or `Base.OneTo(4)`, would work as before. But where e.g. `Array[{T,N}](tuple)` now fails or produces an uninitialized array depending on `T`, `N`, and `tuple`, such signatures could work as for any other iterable. And additional possibilities become natural: Constructing `Array`s from `HasShape` generators is one nice example. Another, already on master (#24372), is `Matrix[{T}](I, m, n)` (alternatively `Matrix[{T}](I, (m, n))`), which constructs a `Matrix[{T}]` of shape `(m, n)` containing the identity, and is equivalent to `eye([T, ]m[, n])` with fewer ambiguities.

Great so far. Now what about perhaps the most common operation, i.e. constructing an array uniformly initialized from a value? Under the general model above, this operation should of course roughly be `MyArray[{T}](it, shape...)` where `it` is an iterable repeating the desired value. But this incantation should: (a) be fairly short and pleasant to type, lest ad hoc constructors for particular array types and values proliferate to avoid using the general model; and ideally (b) mesh naturally with convenience constructors for `Array`s. Triage came up with two broad spelling possibilities. The first spelling possibility led to...

- #24389, "constructors for Array from zeros/ones". The crux: Make `ones` and `zeros` iterable, allowing e.g. `MyArray([ones|zeros], shape...)`. At first blush this spelling seems reasonable: It's fairly short/pleasant, satisfying (a). And it ties to the `ones`/`zeros` convenience constructors, somewhat satisfying (b) (caveat being the slightly unnatural reversed identifier ordering as in e.g. `ones(T, shape...)` vs `MyArr{T}(ones, shape...)`). But further consideration reveals that this spelling foists most shortcomings of the first design proposal (that is, the e.g. `ones(Int, ...)` -> `ones(MyArray{Int}, ...)` proposal described above) onto this second design proposal. Specifically, the "Default element type ambiguity", "`ones`/`eye`/`zeros` element type ambiguity", and "handling values without an associated function" shortcomings described above all apply here as well. Sad razors.

The second spelling possibility is `MyArray(Rep(v), shape...)` modulo spelling of `Rep(v)`, where `Rep(v)` is some convenient alias for `Iterators.Repeated(v)` with `v` any desired value. (Another possible spelling of `Rep(v)` discussed in triage is `Fill(v)`, which dovetails beautifully with the `fill` convenience constructor for the same purpose specific to `Array`s. Independent of the iterator's name, this spelling is a clean generalization of `fill` from `Array`s to arrays generally.) In practice this would look like `MyArray(Rep(1), shape...)` (instead of `MyArray{Int}(ones, shape...)`). This spelling possesses some distinct advantages:

- By nature of requiring a value, this spelling suffers from neither the "default element type ambiguity" nor the "`ones`/`eye`/`zeros` element type ambiguity" described above.
- By nature of accommodating any value, this spelling avoids the "handling values without an associated function" issue and the consequent antipatterns and ad hoc method proliferation.
- By nature of requiring and accepting a value, this spelling is frequently more compact and efficient than equivalents with the other spelling: Consider `MyArray(Rep(1.0im), shape...)` versus `im*MyArray{Complex{Float64}}(ones, shape...)`, or `MyArray(Rep(1f0/ℯ), shape...)` versus `MyArray{Float32}(ones, shape...)/ℯ`.
- This spelling is a composition of well-defined, fundamental tools that, once learned, can be deployed to good effect elsewhere. In contrast, the other spelling is ad hoc and a bit of a pun.

Great. With this latter spelling, overall this second proposal appears to satisfy both the broad design objectives and three razors at the top, and avoids the shortcomings of the first proposal.

### What else? Convenience constructors

Convenience constructors are an important part of this discussion, about which there is much to consider. But that topic I will leave for another post. Thanks for reading! :)

jar1 · January 15, 2023, 9:14am

In this API there’s no room to say where the memory comes from, we just pull it out of the sky. If I have a block of memory, I might want to split it up and make arrays and dictionaries out of it, e.g. using allocate!(block, Array, Float64, (2,2)).
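A rough sketch of what carving arrays out of an existing block can look like with tools that already exist in Base (`view`, `reinterpret`, `reshape`). The helper name `carve` and its `offset` keyword are hypothetical, it only handles isbits element types, and it is not the `allocate!` API suggested above.

```julia
# Hypothetical helper: expose `dims` elements of type T stored inside a byte
# buffer, starting at byte offset `offset` (0-based). Not an existing Base function.
function carve(block::Vector{UInt8}, ::Type{T}, dims::Dims; offset::Int = 0) where {T}
    nbytes = sizeof(T) * prod(dims)
    bytes = view(block, offset+1:offset+nbytes)   # bounds-checked slice of the block
    return reshape(reinterpret(T, bytes), dims)   # no copy: writes go to `block`
end

block = zeros(UInt8, 1024)
A = carve(block, Float64, (2, 2))                 # uses bytes 1 to 32
B = carve(block, Int32, (4,); offset = 32)        # uses bytes 33 to 48
A .= 1.5
B .= 7
```

Because `A` and `B` are views, the whole 1024-byte block stays alive for as long as either of them is reachable, and nothing stops two carved regions from overlapping.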

uniment · January 15, 2023, 9:15am

As I’ve written it here, allocate just forwards all responsibility to Array, which handles the rest.

julia> A = allocate(Int8, (2, 3, 4))
2×3×4 Array{Int8, 3}:
[:, :, 1] =
 0  0  0
 0  0  0

[:, :, 2] =
 0   16  -105
 0  -63   -84

[:, :, 3] =
  -6  0  16
 127  0  81

[:, :, 4] =
 -105   -6  0
  -84  127  0

julia> B = allocate(eltype(A), size(A))
       (typeof(A), size(A)) == (typeof(B), size(B))
true

mkitti · January 15, 2023, 9:38am

That’s one direction I thought of going with ArrayAllocators.jl. One could create a BumpAllocator. I still need to think about how to handle garbage collection with that though.

This is a good point, though: the above conflates memory allocation with array construction. ArrayAllocators.allocate is primarily meant for memory allocation. It is invoked behind the scenes after the number of bytes needed is computed.

julia> using ArrayAllocators

julia> ArrayAllocators.allocate(calloc, 1024)
Ptr{Nothing} @0x0000000004270990

julia> ArrayAllocators.allocate(malloc, 1024)
Ptr{Nothing} @0x0000000004271a20

julia> ArrayAllocators.allocate(MemAlign(1024), 1024)
Ptr{Nothing} @0x0000000004277400

mkitti · January 15, 2023, 10:15am

uniment:

julia> allocate(::Type{A}, ::Type{T}, dims...) where {A,T} = A{T}(undef, dims...)
       allocate(::Type{T}, dims...) where T = allocate(Array, T, dims...)
       allocate(dims...) = allocate(Float64, dims...)

One thing I’m not a fan of here is that we are peeling off optional arguments from the beginning of the argument list whereas I usually want to think about optional arguments occurring at the end or as keywords. There is also an ambiguity of what allocate(Array, 5, 3) might do. Does it make an Array{Array} or Array{Float64}? Maybe those would be better as keyword arguments.

allocate(dims...; array_type = Array, element_type = Float64, init = undef) =
    array_type{element_type}(init, dims...)
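A quick sketch of how the keyword form would read at call sites. The function is renamed to the hypothetical `kwallocate` here so that it does not clash with the positional definitions quoted above.

```julia
# Same body as the keyword proposal, under a hypothetical name.
kwallocate(dims...; array_type = Array, element_type = Float64, init = undef) =
    array_type{element_type}(init, dims...)

A = kwallocate(5, 3)                                   # Matrix{Float64}, uninitialized
B = kwallocate(2, 2; element_type = Int)               # Matrix{Int}, uninitialized
C = kwallocate(3; element_type = Union{Missing, Int}, init = missing)
```

With dimensions as the only positional arguments, a leading type can no longer be mistaken for either the container or the element type.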

Sukera · January 15, 2023, 12:27pm

“Better” is a very subjective term here, as has already been explained in the very thread you linked. There is more to the story there than “just” dropping in calloc as a replacement for malloc and being done with it.

This benchmark is misleading because it does not zero the memory until sum is called. Please don’t sweep the big caveat of using calloc under the rug and present it as a pure win in performance - it just isn’t. I’ve already benchmarked this in the other thread extensively.

No - this only “works” for types where a zero bitpattern also happens to coincide with the zero of that type, as I mentioned in the linked thread:

In which case, using calloc by default means the memory is initialized twice, once by the kernel when it gives you the memory that’s filled with 0, and once when julia has to inevitably call zero to initialize the data properly. This will lead to users being confused about why zeros is slower than it needs to be and should thus be avoided.
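One concrete illustration, chosen for this note rather than taken from the linked thread: `Rational{Int}` is a bits type whose zero is `0//1`, so its in-memory representation is not all zero bits.

```julia
z = zero(Rational{Int})                  # 0//1
words = collect(reinterpret(Int, [z]))   # raw numerator and denominator words

# Decoding all-zero memory as a Rational gives numerator 0 and denominator 0,
# which is not the value zero(Rational{Int}) returns.
r = reinterpret(Rational{Int}, zeros(Int, 2))[1]
```

Here `words` holds the numerator 0 followed by the denominator 1, whereas zero-filled memory decodes to a denominator of 0, so zeros would still have to write the proper value after the fact.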

All of these are internal implementation details you cannot rely on. Do not assume these to be true.

The interface for allocation I would prefer to all options presented here looks like this:

"""
Allocate a single object of type `T`, using memory managed by `Allocator`.
Return a `T`; throw an OutOfMemory exception if the allocation fails.
"""
allocate(::Allocator, ::T)

"""
Allocate `n` objects of type `T`, using memory managed by `Allocator`.
Return a collection of `T`; throw an OutOfMemory exception if the allocation fails.
"""
allocate(::Allocator, ::T, n)

This is how Zig does it, and for good reason - you don’t need more. Rust has some more fanciness for deallocation. Both of these can be used in the compiler to hoist allocations to the stack if need be. The second one semantically returns an Array (or a fixed-size equivalent) when passed a default GC.
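A minimal sketch of how those two signatures might look when backed by the ordinary GC. Everything here is an assumption made for illustration: the `Allocator` supertype, the `GCAllocator` name, passing the type as `::Type{T}`, and using a `Ref` as the single-object container.

```julia
abstract type Allocator end             # hypothetical supertype

struct GCAllocator <: Allocator end     # hypothetical: defers to Julia's own GC

# n objects: with the default GC this is simply an uninitialized Vector.
allocate(::GCAllocator, ::Type{T}, n::Integer) where {T} = Vector{T}(undef, n)

# A single object: a Ref is used here as the closest Base container for one slot.
allocate(::GCAllocator, ::Type{T}) where {T} = Ref{T}()

v = allocate(GCAllocator(), Int, 3)     # 3-element Vector{Int}, contents arbitrary
r = allocate(GCAllocator(), Float64)    # one uninitialized Float64 slot
r[] = 2.0
```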

That’s because “splitting the memory up” to create other objects out of it is not a safe operation - if you allocate a large array and use that as the “backing memory” for multiple objects, you’re well on the way to reimplementing a memory manager yourself. For the regular julia GC, it would have to keep that big chunk around as long as even a single other object (that may use only a tiny portion of that memory) is still accessible. You really need the allocator passed in the API to handle that part of the operation for you.

mkitti · January 15, 2023, 3:37pm

Does Base allocation handle this any differently?

julia> struct Foo
           x::Int
           Foo() = new(5)
       end

julia> Foo()
Foo(5)

julia> Array{Foo}(undef, 5)
5-element Vector{Foo}:
 Foo(0)
 Foo(0)
 Foo(0)
 Foo(0)
 Foo(0)

Sukera · January 15, 2023, 4:30pm

I really don’t want to rehash the same conversation again, but undef just happens to give you whatever is in memory. That’s precisely what undef means - there are no constructors involved, ever.

julia> struct Foo
           bar::Int
       end

julia> Base.zero(::Type{Foo}) = Foo(1)

julia> zeros(Foo, 5)
5-element Vector{Foo}:
 Foo(1)
 Foo(1)
 Foo(1)
 Foo(1)
 Foo(1)

julia> while true
           x = Array{Foo}(undef, 5)
           if any(!=(Foo(0)), x)
               display(x)
               break
           end
       end
5-element Vector{Foo}:
 Foo(140600521258544)
 Foo(140600521258608)
 Foo(140600521258672)
 Foo(140600521257344)
 Foo(0)

In contrast, zeros does call zero:

julia> @edit zeros(Foo, (5,))

Opens editor here:

for (fname, felt) in ((:zeros, :zero), (:ones, :one))
    @eval begin
        $fname(dims::DimOrInd...) = $fname(dims)
        $fname(::Type{T}, dims::DimOrInd...) where {T} = $fname(T, dims)
        $fname(dims::Tuple{Vararg{DimOrInd}}) = $fname(Float64, dims)
        $fname(::Type{T}, dims::NTuple{N, Union{Integer, OneTo}}) where {T,N} = $fname(T, map(to_dim, dims))
        function $fname(::Type{T}, dims::NTuple{N, Integer}) where {T,N}
            a = Array{T,N}(undef, dims)
            fill!(a, $felt(T))
            return a
        end
        function $fname(::Type{T}, dims::Tuple{}) where {T}
            a = Array{T}(undef)
            fill!(a, $felt(T))
            return a
        end
    end
end

which defines

function zeros(::Type{T}, dims::NTuple{N, Integer}) where {T,N}
    a = Array{T,N}(undef, dims)
    fill!(a, zero(T))
    return a
end

Eben60 · January 15, 2023, 9:40pm

That similar can be used to create arrays having non-similar dimensions is not exactly intuitive. Still, the form

similar(Float64[], 2,3)

may be easier to remember than

Array{Float64}(undef, 2, 3)

So, thank you

aplavin · January 16, 2023, 12:19pm

similar doesn’t refer to similar dimensions specifically. Its meaning is basically “as similar to the passed array as possible”, optionally specifying the target eltype or dimensions.

sijo · January 16, 2023, 7:17pm

We could add a fill method that takes an additional parameter to specify a pre-allocated array that should be filled. The allocate and fill names would make sense then, and it would not be surprising that fill can also allocate itself as a convenience, if no array is given (i.e. the current behavior).

mcabbott · January 16, 2023, 8:26pm

Note that in addition to methods accepting an instance, similar also has some accepting a type. These are closer to the undefs idea, but they aren’t very flexible:

x = [1 2]  # Array, could be SArray, or CuArray, etc.
similar(x)               # instance
similar(x, 1, 3)         # instance -- change size
similar(x, Float32, 3)   # instance -- change eltype & ndims
similar(typeof(x), 1, 3) # type -- must specify size (unless static)

# These all give an error:
similar(typeof(x), Float32, 1, 3)  # type -- cannot change eltype
similar(typeof(x), 3)  # type -- cannot change ndims
similar(Array, Int, 3)  # type -- cannot use partial type

Defining the last three here would, as a side-effect, give another spelling for Array{Int}(undef, 3). But it would also be useful elsewhere, for handling stranger arrays, where you cannot assume that the first type parameter is the element type. This would allow e.g. stack(Vector{Int}[]) to work (and be type-stable) without seeing an instance.
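To show the intended behavior without adding methods to `Base.similar`, here is a sketch under a hypothetical name, `similar_from_type`. It only covers `Array`, where the first type parameter really is the element type, so it sidesteps the harder case of stranger array types.

```julia
# Hypothetical helper, NOT Base.similar.
# Type (possibly partial) plus an explicit element type and size:
similar_from_type(::Type{<:Array}, ::Type{T}, dims::Integer...) where {T} =
    Array{T}(undef, dims...)

# Type with a known element type, changing only the number of dimensions:
similar_from_type(::Type{<:Array{T}}, dims::Integer...) where {T} =
    Array{T}(undef, dims...)

x = [1 2]
a = similar_from_type(typeof(x), Float32, 1, 3)   # change eltype
b = similar_from_type(typeof(x), 3)               # change ndims
c = similar_from_type(Array, Int, 3)              # partial type
```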

This is fill!(A, val).

DNF · January 16, 2023, 8:27pm

This is what fill! does currently. The trailing ! indicates that it mutates one of its input arguments.
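A small sketch of the difference between the two existing functions:

```julia
A = Vector{Float64}(undef, 3)   # preallocated, contents arbitrary
B = fill!(A, 2.5)               # overwrites A in place and returns that same array
C = fill(2.5, 3)                # allocates a fresh array and fills it
```

`B` is the very same object as `A`, while `C` holds equal values in separately allocated memory.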

uniment · January 19, 2023, 8:44am

Is there a syntax for array comprehensions of different array types, such as CuArray? Maybe such a thing would diminish the need for pre-allocation and for loops to begin with. Something like this, perhaps?

Type{CuArray{Int}}[i*j for i=1:2, j=1:2]

It’s not consistent with the ElType[...] syntax we’re accustomed to, but StaticArrays already broke that expectation with SA[...]. It’s nice that Type{...} doesn’t provide any constructors, so it wouldn’t break anything.
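For comparison, a sketch of what is available today without new syntax. A typed comprehension only picks the element type, but some container types, `BitArray` being one in Base, accept a shaped generator directly. Whether a given package's array type does the same is something to check in its documentation.

```julia
# Typed comprehension: the prefix picks the element type; the container is an Array.
A = Float32[i * j for i in 1:2, j in 1:2]

# A container type that accepts a shaped generator and keeps its 2×2 shape.
B = BitArray(isodd(i * j) for i in 1:2, j in 1:2)
```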

mkitti · January 20, 2023, 3:13am

I created a package ArrayInitializers.jl that provides a set of initializer types that can be passed as the first argument to an array constructor. Since the initializer often contains information about the element type, one can avoid the use of curly braces.

julia> using Pkg

julia> pkg"add https://github.com/mkitti/ArrayInitializers.jl"
...

julia> using ArrayInitializers

julia> Array(zeroinit(Int), 3, 9)
3×9 Matrix{Int64}:
 0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0
 0  0  0  0  0  0  0  0  0

julia> Array(zeroinit(Float64), 3, 9)
3×9 Matrix{Float64}:
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0
 0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0  0.0

julia> Array(oneinit(Float64), 3, 9)
3×9 Matrix{Float64}:
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0
 1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0  1.0

julia> const fives = init(5)
ArrayInitializers.FillArrayInitializer{Int64}(5)

julia> Array(fives, 3)
3-element Vector{Int64}:
 5
 5
 5

julia> Array(undeftype(Int), 3)
3-element Vector{Int64}:
       1
       3
 7105903

This also works with other kinds of arrays such as OffsetArrays or BitArrays:

julia> using OffsetArrays

julia> OffsetArray(fives, 2:5)
4-element OffsetArray(::Vector{Int64}, 2:5) with eltype Int64 with indices 2:5:
 5
 5
 5
 5

julia> BitArray(oneinit, 5)
5-element BitVector:
 1
 1
 1
 1
 1
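For readers curious how an initializer object can carry the element type, here is a stripped-down sketch of the idea. It is not the package's actual implementation: `FillInit` and `construct` are hypothetical names, and a free function is used instead of adding methods to `Array`.

```julia
# Hypothetical initializer type that remembers the fill value (and so its type).
struct FillInit{T}
    value::T
end

# Hypothetical generic constructor; assumes A{T}(undef, dims...) is valid for A.
construct(::Type{A}, init::FillInit{T}, dims::Integer...) where {A<:AbstractArray, T} =
    fill!(A{T}(undef, dims...), init.value)

fives = FillInit(5)
v = construct(Array, fives, 3)             # element type comes from the initializer
m = construct(Array, FillInit(0.5), 2, 2)
```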
