1.0 annoyances and Matlab comparison

Not if the T @liso meant was the array type, not the element type, i.e.:
empty(OffsetArray{String}, dims...)

FWIW, empty is completely different from what we’re discussing here: it creates a collection with zero elements. So another name would be needed (uninitialized, uninit, etc.).

Anyway uninitialized(Array{Int}, dims...) is almost identical to Array{Int}(uninitialized, dims...), so the main question is rather whether we can find something shorter than uninitialized.

When requesting uninitialized data, initializing the types for the union defeats the point. Also, uninitialized arrays only exist for inlinealloc / bitstype data.

That way, you can request a nice 50TB array of uninitialized memory; the kernel will lazily reserve physical memory for you. If you do this with explicit initialization, then you will kill your system.

Something that might make sense, though, is an option for zero-byte initialization, if it were implemented internally via calloc for large arrays (instead of malloc + memset). Regardless, this is probably not good style, since the GC gets very, very unhappy if you give it control over giant chunks of virtual memory exceeding your physical memory. So you need to do the calloc yourself and then unsafe_wrap without handing control to the GC, in order to get a giant vector and a GC that is not panicking about memory usage (not worth the effort, but a fun way to avoid the ccall on push!).
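
A minimal sketch of the "calloc yourself, then unsafe_wrap" approach, using a tiny length instead of terabytes. It relies on Base's `Libc.calloc`, `Libc.free` and `unsafe_wrap`; because `own = false`, the GC never takes ownership of the buffer, so the caller has to free it and must not touch the array afterwards. This is an illustration of the idea, not a recommended pattern.

```julia
# Zero-filled Int buffer from calloc, wrapped as a Julia Vector
# without handing ownership to the GC (own = false).
n = 1_000
p = Ptr{Int}(Libc.calloc(n, sizeof(Int)))   # calloc returns Ptr{Cvoid}; C_NULL on failure
p == C_NULL && error("calloc failed")
v = unsafe_wrap(Array, p, n; own = false)   # Vector{Int} viewing the calloc'd memory

v[1] = 42                 # ordinary array operations work
total  = sum(v)           # 42, since every other element is zero
nzeros = count(iszero, v)

Libc.free(p)              # we own the memory, so we release it; v is invalid from here on
```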

What zero-byte means depends on the type, but zero-byte init and uninitialized are both qualitatively different from fill (if calloc is used).
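
To make "depends on the type" concrete, the sketch below reinterprets all-zero bytes as a few element types and compares the result with `zero(T)`, which is what `zeros(T, ...)` fills with. For `Float64` and `Char` the two coincide, but for `Rational{Int}` (stored as a numerator and a denominator) all-zero bytes give a denominator of 0, which is not a valid zero.

```julia
# All-zero bit patterns reinterpreted as different element types.
zero_float = reinterpret(Float64, UInt64(0))   # 0.0
zero_char  = reinterpret(Char, UInt32(0))      # '\0'

# Rational{Int} is two Ints (num, den); zero bytes give den == 0,
# whereas zero(Rational{Int}) is 0//1.
zero_rat = reinterpret(Rational{Int}, [0, 0])[1]
```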

edit: I did not actually try this with 50TB, but I did play around with this branch-free push!, where the resizing is punted to the kernel/hardware. Not sure whether this works on non-Linux systems.

To me, Array is special because it seems to be the most common of all AbstractArrays. I agree that one needs a general way of creating all kinds of uninitialized AbstractArrays; this will probably be something like Array{T}(uninit, dims...). However, what’s wrong with additionally having special convenience functions such as uninit(T, dims...) and ones(T, dims...) that always create Arrays?

I don’t see much point in, e.g., uninit(Array{Int}, 2, 2), i.e. taking T not as the eltype of the Array but as the full type, as this is no shorter than Array{Int}(uninit, 2, 2) and in my eyes wouldn’t be a convenience function at all.
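
A sketch of the eltype-based convenience function described above. The name `uninit` is hypothetical (it is not part of Base); it simply forwards to the Array constructor, using `undef`, the name the sentinel got in Julia 1.0.

```julia
# Hypothetical convenience constructor: T is the element type, and the
# result is always a plain Array, never another AbstractArray type.
uninit(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)
uninit(::Type{T}, dims::Tuple{Vararg{Integer}}) where {T} = Array{T}(undef, dims)

A = uninit(Float64, 2, 3)   # 2×3 Matrix{Float64} with arbitrary contents
B = uninit(Int, (4,))       # 4-element Vector{Int}
```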

What about A = Array{T}(new, dims...)? To me, new means “request uninitialized memory for an array of dimensions dims...”. It is analogous to uninitialized, but much shorter and easier to type.
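
Because the sentinel is just an ordinary value (in Julia 1.0 it shipped as `undef`, the singleton instance of `UndefInitializer`), a shorter spelling can be tried out locally with a plain constant. The name `noinit` below is hypothetical; `new` itself is avoided because it already has a special meaning inside inner constructors.

```julia
# An alias is enough for the existing Array constructors to accept it.
const noinit = undef

A = Array{Float64}(noinit, 3, 2)   # same as Array{Float64}(undef, 3, 2)
v = Vector{Int}(noinit, 5)
```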

I really dislike the uninitialized syntax, but I don’t completely understand the rationale aside from making the intent to create an uninitialized object explicit. From my perspective, adding an extra argument for the sake of being explicit appears to be solving a problem that doesn’t really exist. It should be clear that x = Array{Float64}(n) does not contain sensible values because no values were given or implied by the statement. Adding the extra argument makes the syntax more verbose, annoying to type, and less readable, like Java.

I have been tripped up by the reference behavior of Julia a few times (see). This behavior could be made more explicit with x[i] = ReferenceTo(myObject). Unlike the case of the constructor, being explicit for references seems more justifiable. However, I would rather learn the default behavior of the language and have more readable code.
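
The reference behavior mentioned above is easy to reproduce: storing a mutable object in an array stores a reference to it, not a copy. A small sketch with illustrative names:

```julia
inner = [1, 2, 3]
outer = [inner, inner]        # both slots refer to the same Vector

inner[1] = 100                # mutate through the original binding
first_seen = outer[1][1]      # 100, because outer[1] is inner itself
same_obj   = outer[1] === outer[2]

# To store independent data, copy explicitly:
outer2 = [copy(inner), copy(inner)]
outer2[1][2] = -1             # only affects the first copy
```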

Did anyone read the explanation that was given by @mbauman and linked to multiple times?

Despite what people keep insisting here, the motivation was not to punish people who want to allocate uninitialized arrays by forcing them to type a long word. The motivation was this:

  • Collections are generally constructed in Julia by passing an iterable argument which is used to populate the collection as described here.

  • Array constructors predate this convention and are now an awkward exception to this general pattern. By this convention, Array((2,3)) should construct the vector [2,3] instead of an uninitialized 2×3 matrix. Similarly, Array(3) should construct [3] instead of an uninitialized 3-element vector.

  • Array(dims...) is a dangerously simple syntax for a fundamentally dangerous operation. Allocating uninitialized memory can introduce non-determinism and bugs into programs if not used correctly. Array(2,3) looks like a very innocuous operation. So not only is this syntax now at odds with how collections generally work, but it’s a dangerous operation.
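
For reference, a sketch of how this looks in Julia 1.0 and later, where the sentinel was named `undef`: dimensions are passed together with the explicit sentinel, while turning the tuple (2, 3) into the values [2, 3] goes through `collect`.

```julia
A = Array{Int}(undef, 2, 3)      # uninitialized 2×3 Matrix{Int}
B = Array{Int}(undef, (2, 3))    # same, with a tuple of dims
v = Vector{Float64}(undef, 3)    # uninitialized 3-element vector

c = collect((2, 3))              # the values 2 and 3: [2, 3]
```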

The code snippet you posted looks horribly unoptimized to me: it allocates zillions of tiny (length 4!) temporary arrays.

This is not how to write fast Julia code, so you can’t draw meaningful performance conclusions from it.

(You should either use StaticArrays or lose the Matlab habit of using vectors for every inner loop.)
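
A sketch of the difference, with hypothetical function names. The post recommends StaticArrays; to keep this dependency-free, a plain 4-tuple is used as a stand-in, since it is an isbits value that typically needs no heap allocation, whereas the slice `xs[i:i+3]` copies into a new Vector on every iteration.

```julia
# Matlab-style: each slice allocates a fresh 4-element Vector.
function sumsq_slices(xs::Vector{Float64})
    s = 0.0
    for i in 1:4:length(xs)-3
        v = xs[i:i+3]
        s += sum(abs2, v)
    end
    return s
end

# Same computation with a 4-tuple instead of a temporary array.
function sumsq_tuples(xs::Vector{Float64})
    s = 0.0
    for i in 1:4:length(xs)-3
        v = (xs[i], xs[i+1], xs[i+2], xs[i+3])
        s += sum(abs2, v)
    end
    return s
end

xs = collect(1.0:8.0)
r_slices = sumsq_slices(xs)   # 204.0
r_tuples = sumsq_tuples(xs)   # 204.0
```

Allocation counts can then be compared with `@allocated` or with BenchmarkTools' `@btime`.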

I figured afterwards that it could probably be implemented with a call to map and a BitVector/BitArray on the side holding the uninitialized-data state, possibly like map((x, y) -> x ? defaultval : y, initvec, datavec).
I thought it might be slightly faster when resetting an array, especially if that happens often… simple testing seems to disagree, though:

operation                                  | time
-------------------------------------------|---------
setting 100_000_000 BitArray with x[:] =   | 0.002 s
setting 100_000_000 Int64s with y[:] =     | 0.15 s
indexing Int64 with BitArray, y[x]         | 1.17 s
default-value map (as above)               | 1.92 s
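
A small sketch of the mask idea, with illustrative data. A `true` bit marks a slot as "not yet set"; reads go through the mask, and "resetting" only touches the BitVector. Base's `ifelse` with broadcasting is shown as a branch-free alternative to the `map` call.

```julia
datavec    = [10, 20, 30, 40]
initvec    = BitVector([true, false, true, false])  # true = "slot not yet set"
defaultval = 0

# The map-based read described above:
r1 = map((x, y) -> x ? defaultval : y, initvec, datavec)

# Branch-free alternative using ifelse with broadcasting:
r2 = ifelse.(initvec, defaultval, datavec)

# "Resetting" the array only rewrites the bit mask, not the data:
fill!(initvec, true)
r3 = map((x, y) -> x ? defaultval : y, initvec, datavec)
```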

I think uninitialized would be an easier sell to the general user if there was an example or two that could demonstrate new cool stuff that can be done with the syntax. For me, I still have a hard time understanding how to take advantage of it. The first thing I tried when I saw it was

a = Float32[x for x in uninitialized]  
a = Float32[uninitialized for x in 1:5]

thinking that it was some type of iterable that would just initialize what ever the inferred type of the container was. The next thing I tried was

fill(Array{Complex{Float32}}(uninitialized, 2,2), 5, 5)

hoping it wouldn’t fill a 5x5 matrix of pointers to the same 2x2 array.
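
For the record, `fill` evaluates its first argument once, so every slot holds the same 2x2 array; a comprehension evaluates the constructor once per slot. A sketch using the Julia 1.0 spelling `undef`:

```julia
# One 2×2 array, referenced 25 times:
shared = fill(Array{ComplexF32}(undef, 2, 2), 5, 5)

# 25 independent 2×2 arrays:
separate = [Array{ComplexF32}(undef, 2, 2) for i in 1:5, j in 1:5]
```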

So, even though I’ve read the explanations of why, to me it basically just behaves like special function syntax for telling the computer to allocate an array of a certain size/shape, rather than a general way to think about container constructors that fits in with the whole language design. I also think this is why the consistency argument doesn’t resonate with me yet … i.e. I can’t seem to use uninitialized as I would other iterators.

Hope I don’t sound like I’m complaining here. Just thought this perspective could help tune how the new syntax is sold to others. Indeed, if I could overload it somehow to help me initialize custom containers with uninitialized objects of specific memory layout, I would love it.

I’d really like a naive empty(T, dims…) from a user’s perspective. But I guess I can also live with uninit(T, dims…), given that there is already an empty function with a different meaning.

Right now, the relevant definition of empty is empty(v::AbstractVector, [eltype]), which can only create empty vectors. How about

  1. Adding a length argument to the current definition, to allow creation of undefined arrays
  2. Creating a variant for N>1 dimensional arrays (with dims)
  3. Creating a variant with just eltype and dims.

This might look like

empty(v::AbstractVector, [eltype], [length])
empty(v::AbstractArray, dims::NTuple{N,T})
empty(v::AbstractArray, eltype, dims::NTuple{N,T})
# alternative or additional convenience def:
# empty(v::AbstractArray; eltype::DataType, dims::NTuple{N,T})
empty(eltype, dims::NTuple{N,T})

This would be familiar to people coming from numpy, and would sidestep uninitialized entirely.
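
For comparison, Base's `similar` already covers the variants that take a prototype array: `similar(a, [eltype], [dims])` returns an uninitialized array of the requested eltype and shape. The prototype-free variant corresponds to `Array{T}(undef, dims)` (Julia 1.0 spelling). A short sketch:

```julia
v = [1, 2, 3]

a = similar(v)                    # uninitialized Vector{Int}, length 3
b = similar(v, Float64)           # same shape, different eltype
c = similar(v, Float64, (2, 2))   # different eltype and dims: 2×2 Matrix{Float64}
d = similar(v, 5)                 # Vector{Int} of length 5
```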

Thoughts? If there’s interest, I can try to get a pull request together.

Cheers,
Kevin

empty sounds really weird for this operation. There’s nothing empty here: neither the returned collection, nor its elements.

I really don’t like the word empty for this. These arrays aren’t empty. They’re just filled with junk. A zero-length vector isempty. As is a 0x0 matrix.

A Vector{Int}(uninitialized, 20) is not empty.

We can’t give examples that work now because the old behavior still works but is deprecated on 0.7-DEV (master) – any new behavior cannot be implemented until 1.0 or later. But it could look something like this:

julia> Vector(k^2 for k = 1:5)
5-element Array{Int64,1}:
  1
  4
  9
 16
 25

Of course, this is equivalent to [k^2 for k = 1:5] but the same thing works for other vector types, e.g. when using StaticArrays one could write SVector(k^2 for k = 1:5) instead of what you currently need to write which is @SVector [k^2 for k = 1:5].
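
The `Vector(k^2 for k = 1:5)` form above is a proposal. Spellings that build the same vector from the same generator today:

```julia
squares1 = [k^2 for k = 1:5]         # comprehension
squares2 = collect(k^2 for k = 1:5)  # materialize a generator
squares3 = map(k -> k^2, 1:5)        # map over a range
```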

Note that the values passed in don’t necessarily need to be all of an abstract array’s values. Instead, for some types we’d like to provide an iterable that gives a certain subset of values. For example, constructing a diagonal matrix like this (this also doesn’t work yet):

julia> Diagonal(k^2 for k = 1:5)
5×5 LinearAlgebra.Diagonal{Int64,Array{Int64,1}}:
 1  ⋅  ⋅   ⋅   ⋅
 ⋅  4  ⋅   ⋅   ⋅
 ⋅  ⋅  9   ⋅   ⋅
 ⋅  ⋅  ⋅  16   ⋅
 ⋅  ⋅  ⋅   ⋅  25
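
Until the generator form exists, the same matrix can be obtained by passing a materialized vector of diagonal entries to `Diagonal` from the LinearAlgebra standard library, which stores only those entries:

```julia
using LinearAlgebra

D = Diagonal([k^2 for k = 1:5])   # 5×5, only the diagonal is stored
```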

But in order for all of this to work and be coherent and consistent in the future, we need to pave the way by “taking away” the syntaxes where dimensions and tuples of dimensions are passed to array types in order to get uninitialized instances with those dimension sizes. I have to say that I had not anticipated that this would be such a sticking point for people. (Although I did suspect that using the word uninitialized was not going to be so popular.)

I’m not totally sure what you mean, but it is possible to define methods with ::Uninitialized to make other constructors that work similarly.
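
A sketch of such a method for a custom container. `FixedBuffer` is a hypothetical type for illustration; in Julia 1.0 the sentinel's type is `UndefInitializer` (the post's `Uninitialized`), so the constructor dispatches on that, just like `Vector{T}(undef, n)`.

```julia
# Hypothetical fixed-capacity buffer.
mutable struct FixedBuffer{T}
    data::Vector{T}   # backing storage; unused slots may hold junk
    len::Int          # number of slots written so far
end

# Opt-in uninitialized constructor, dispatching on the type of `undef`.
FixedBuffer{T}(::UndefInitializer, capacity::Integer) where {T} =
    FixedBuffer{T}(Vector{T}(undef, capacity), 0)

function Base.push!(b::FixedBuffer, x)
    b.len < length(b.data) || error("FixedBuffer is full")
    b.len += 1
    b.data[b.len] = x
    return b
end

b = FixedBuffer{Float64}(undef, 4)
push!(b, 1.5)
push!(b, 2.5)
```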

I know some people dislike the fact that this fills an array with references to the same object (which we aren’t going to change), but I don’t see what that has to do with uninitialized. The same applies to any array or object passed to fill.

empty sounds really weird for this operation. There’s nothing empty here: neither the returned collection, nor its elements.

I really don’t like the word empty for this. These arrays aren’t empty. They’re just filled with junk. A zero-length vector isempty. As is a 0x0 matrix.

A Vector{Int}(uninitialized, 20) is not empty.

Fair enough. I was looking for a short solution, and this does have precedent.

But I would be happy enough with undef(T, dims) (and would define this myself, if it isn’t provided). (And no, it doesn’t work for alternate array types, unless T<:AbstractArray, which mostly defeats the purpose.)

Cheers,
Kevin

Perhaps because the possibilities above did not get enough exposure (I am seeing this for the first time, although I admit that a lot of things about the deprecation path become clearer in retrospect), and also because of suggestions that creating uninitialized arrays is something one should not do “unless you know what you are doing”.

For what it’s worth, I actually like the verbose uninitialized constructor. It makes such statements very visible in code, and lends them the weight they deserve. I also like the potential for future extensions of the syntax.

Initially I thought the same. But now, looking at the code in boot.jl and array.jl, uninitialized takes up sooo much space and weight. It would likely annoy me in the future. For example, in

similar(a::Array{T,1}) where {T}                    = Vector{T}(uninitialized, size(a,1))
similar(a::Array{T,2}) where {T}                    = Matrix{T}(uninitialized, size(a,1), size(a,2))
similar(a::Array{T,1}, S::Type) where {T} = Vector{S}(uninitialized, size(a,1))

the focus goes to uninitialized and the type and the dimension fade. For me, the balance is wrong.

In earlier times I didn’t really distinguish between (initialized) fill and zeros and (uninitialized) Array{Int32}(2,3). I thought fill took much more time and that Array would be the standard, and I was not fully aware of the ‘uninitialized dangers’. Now I think a more subdued hint, such as new, my personal favorite, would be enough to point out the difference:

uninitialized arrays | initialized arrays
---------------------|-------------------------------
Array{..}(new, ..)   | Array{..}(nothing|missing, ..)
                     | fill, zeros, ones

Admittedly, with new one would have to read the manual/help once, but afterwards I believe the rule “new means uninitialized” would stick. In addition, Array{..}(new, ..) is ‘strange’ enough that one would look up the meaning of new.
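
The initialized column of the table corresponds to constructors that exist in Julia 1.0: passing `nothing` or `missing` where `undef` would go pre-fills an array whose element type admits that value. A short sketch:

```julia
# Pre-filled with nothing / missing (element type must allow the value):
A = Array{Union{Nothing, Int}}(nothing, 2, 3)
B = Vector{Union{Missing, Float64}}(missing, 4)

# The other initialized constructors from the table:
C = fill(7, 2, 2)
Z = zeros(Int32, 2, 3)
O = ones(2)
```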

I agree. The weight is too great, more than they deserve.

I also suspect that creating uninitialized arrays is a lot more common than was presumed. I create uninitialized arrays by default, except when I actively want there to be a default value, such as zero.
