Not if the T @liso meant was the array type, not the element type, i.e.:
empty(OffsetArray{String}, dims...)
FWIW, empty is completely different from what we're discussing here: it creates a collection with zero elements. So another name would be needed (uninitialized, uninit, etc.).
Anyway, uninitialized(Array{Int}, dims...) is almost identical to Array{Int}(uninitialized, dims...), so the main question is rather whether we can find something shorter than uninitialized.
If you request uninitialized data, initializing the types for the union defeats the point. Also, uninitialized arrays only exist for inlinealloc / bitstype data.
That way, you can request a nice 50TB array of uninitialized memory; the kernel will lazily reserve physical memory for you. If you do this with explicit initialization, you will kill your system.
Something that might make sense, though, is an option for zero-byte initialization (if this were internally implemented via calloc for large arrays, instead of malloc + memset). Regardless, this is probably not good style, since the gc gets very, very unhappy if you give it control over giant chunks of virtual memory exceeding your physical memory. So you need to do the calloc yourself and then unsafe_wrap without handing control to the gc, in order to get a giant vector and a gc that is not panicking about memory usage. Not worth the effort, but a fun way to avoid the ccall on push!.
What zero-byte means depends on the type, but zero-byte init and uninitialized are both qualitatively different from fill (if calloc is used).
edit: I did not actually try this with 50TB, but I did play around with this branch-free push!, where the resizing is punted to kernel/hardware. Not sure whether this works on non-linux systems.
To me, Array is special because it seems to be the most common of all AbstractArrays. I agree that one needs a general way of creating all kinds of uninitialized AbstractArrays. This will probably be something like Array{T}(uninit, dims...). However, what's wrong with additionally having special convenience functions uninit(T, dims...) and ones(T, dims...) and so on that always create Arrays?
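A minimal sketch of such a convenience function, assuming Julia 1.x, where the uninitialized marker ended up being spelled undef. The name uninit is hypothetical here (it is not part of Base); it simply forwards to the constructor form.

```julia
# Hypothetical convenience helper, not part of Base: allocate an Array{T}
# with the given dimensions, leaving its contents uninitialized.
uninit(::Type{T}, dims::Int...) where {T} = Array{T}(undef, dims...)
uninit(::Type{T}, dims::Dims) where {T} = Array{T}(undef, dims)

A = uninit(Float64, 2, 3)         # 2×3 Matrix{Float64}, arbitrary contents
B = Array{Float64}(undef, 2, 3)   # the equivalent constructor spelling
v = uninit(Int, (4,))             # tuple-of-dims form gives a Vector{Int}
```

Taking the element type (rather than the full array type) is what makes this shorter than the constructor; with an array type as the first argument there would be no saving.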
I don't see much of a point in e.g. uninit(Array{Int}, 2, 2), i.e. taking T not as the eltype of Array but as the full type, as this is no shorter than Array{Int}(uninit, 2, 2) and in my eyes wouldn't be a convenience function at all.
What about A = Array{T}(new, dims...)? To me, new means requesting uninitialized memory for an array of dimensions dims..., which is analogous to uninitialized but much easier to type and much shorter.
I really dislike the uninitialized syntax, but I don't completely understand the rationale aside from making the intent to create an uninitialized object explicit. From my perspective, adding an extra argument for the sake of being explicit appears to be solving a problem that doesn't really exist. It should be clear that x = Array{Float64}(n) does not contain sensible values, because no values were given or implied by the statement. Adding the extra argument makes the syntax more verbose, annoying to type, and less readable, like Java.
I have been tripped up by the reference behavior of Julia a few times (see). This behavior could be made more explicit with x[i] = ReferenceTo(myObject)
. Unlike the case of the constructor, being explicit for references seems more justifiable. However, I would rather learn the default behavior of the language and have more readable code.
Did anyone read the explanation that was given by @mbauman and linked to multiple times?
Despite what people keep insisting here, the motivation was not to punish people who want to allocate uninitialized arrays by forcing them to type a long word. The motivation was this:
- Collections are generally constructed in Julia by passing an iterable argument which is used to populate the collection as described here.
- Array constructors predate this convention and are now an awkward exception to this general pattern. By this convention, Array((2,3)) should construct the vector [2,3] instead of an uninitialized 2×3 matrix. Similarly, Array(3) should construct [3] instead of an uninitialized 3-element vector.
- Array(dims...) is a dangerously simple syntax for a fundamentally dangerous operation. Allocating uninitialized memory can introduce non-determinism and bugs into programs if not used correctly. Array(2,3) looks like a very innocuous operation. So not only is this syntax now at odds with how collections generally work, but it's a dangerous operation.
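To make the convention concrete, here is a small sketch assuming Julia 1.x, where the deprecation discussed in this thread has landed and the explicit marker is spelled undef. Collections take an iterable of their contents; uninitialized allocation needs the marker.

```julia
# Most collections are built from an iterable of their contents...
s = Set([2, 3])                   # a set containing 2 and 3
d = Dict(k => k^2 for k in 1:3)   # a dict built from an iterable of pairs
w = Vector{Int}([2, 3])           # a vector whose elements are 2 and 3

# ...while uninitialized allocation is requested with an explicit marker.
v = Vector{Int}(undef, 3)         # 3 Ints with arbitrary contents
m = Matrix{Int}(undef, 2, 3)      # 2×3 matrix with arbitrary contents
```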
The code snippet you posted looks horribly unoptimized to me: it allocates zillions of tiny (length 4!) temporary arrays.
This is not how to write fast Julia code, so you can't draw meaningful performance conclusions from it.
(You should either use StaticArrays or lose the Matlab habit of using vectors for every inner loop.)
I figured afterwards that it could probably be implemented with a call to map and a BitVector / BitArray on the side for the uninitialized-data state, possibly like map((x,y) -> x ? defaultval : y, initvec, datavec).
I thought it might be slightly faster for resetting an array, especially if that happens often… simple testing seems to disagree, though:
- setting a 100_000_000-element BitArray with x[:]= - 0.002 sec
- setting 100_000_000 Int64s with y[:]= - 0.15 sec
- indexing Int64s with a BitArray, y[x] - 1.17 sec
- default-value map (as above) - 1.92 sec
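For readers who want to see the idea spelled out, here is a small sketch of the mask-plus-map approach described above, assuming Julia 1.x (undef, trues, fill!). It mirrors the quoted map call, where a true bit means "use defaultval". Given the timings above, it is an illustration rather than a recommendation.

```julia
# Sketch of the BitVector-mask idea (illustrative, not tuned):
defaultval = 0
datavec = Vector{Int}(undef, 5)   # contents arbitrary until written
initvec = trues(5)                # true = "slot holds no real value yet"

datavec[2] = 42                   # write a real value...
initvec[2] = false                # ...and mark the slot as set

# Read everything, substituting defaultval for unset slots.
vals = map((x, y) -> x ? defaultval : y, initvec, datavec)

# "Resetting" only touches the bit mask, not the Int64 data.
fill!(initvec, true)
reset_vals = map((x, y) -> x ? defaultval : y, initvec, datavec)
```

Here vals is [0, 42, 0, 0, 0] and reset_vals is all zeros. The reset itself is cheap, but every read pays for the extra branch and the second array, which is consistent with the map being the slowest entry in the timings.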
I think uninitialized would be an easier sell to the general user if there were an example or two demonstrating new cool stuff that can be done with the syntax. For me, I still have a hard time understanding how to take advantage of it. The first thing I tried when I saw it was
a = Float32[x for x in uninitialized]
a = Float32[uninitialized for x in 1:5]
thinking that it was some type of iterable that would just initialize whatever the inferred type of the container was. The next thing I tried was
fill(Array{Complex{Float32}}(uninitialized, 2,2), 5, 5)
hoping it wouldn't fill a 5x5 matrix with pointers to the same 2x2 array.
So, even though I've read the explanations of why, to me it basically just behaves like special function syntax for telling the computer to allocate an array of a certain size/shape, rather than a general way to think about container constructors that fits in with the whole language design. I also think this is why the consistency argument doesn't resonate with me yet… i.e. I can't seem to use uninitialized as I would other iterators.
I hope I don't sound like I'm complaining here. I just thought this perspective could help tune how the new syntax is sold to others. Indeed, if I could overload it somehow to help me initialize custom containers with uninitialized objects of a specific memory layout, I would love it.
I'd really like a naive empty(T, dims...) from a user's perspective. But I guess I can also live with uninit(T, dims...), given that there is already an empty function with a different meaning.
Right now, the relevant definition of empty is empty(v::AbstractVector, [eltype]), which can only create empty vectors. How about
- adding a length argument to the current definition, to allow creation of undefined arrays
- creating a variant for N>1 dimensional arrays (with dims)
- creating a variant with just eltype and dims.
This might look like
empty(v::AbstractVector, [eltype], [length])
empty(v::AbstractArray, dims::NTuple{N,T})
empty(v::AbstractArray, eltype, dims::NTuple{N,T})
# alternative or additional convenience def:
# empty(v::AbstractArray; eltype::DataType, dims::NTuple{N,T})
empty(eltype, dims::NTuple{N,T})
This would be familiar to people coming from numpy, and would sidestep uninitialized entirely.
Thoughts? If there's interest, I can try to get a pull request together.
Cheers,
Kevin
empty sounds really weird for this operation. There's nothing empty here: neither the returned collection, nor its elements.
I really don't like the word empty for this. These arrays aren't empty. They're just filled with junk. A zero-length vector is empty (isempty returns true for it). As is a 0x0 matrix.
A Vector{Int}(uninitialized, 20) is not empty.
We can't give examples that work now because the old behavior still works but is deprecated on 0.7-DEV (master); any new behavior cannot be implemented until 1.0 or later. But it could look something like this:
julia> Vector(k^2 for k = 1:5)
5-element Array{Int64,1}:
1
4
9
16
25
Of course, this is equivalent to [k^2 for k = 1:5], but the same thing works for other vector types, e.g. when using StaticArrays one could write SVector(k^2 for k = 1:5) instead of what you currently need to write, which is @SVector [k^2 for k = 1:5].
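For comparison, here is what can be written with calls that I am confident exist in Julia 1.x. I am not certain whether Vector or Diagonal accept a generator directly there, so this sketch materializes the values first with collect or a comprehension.

```julia
using LinearAlgebra

v = collect(k^2 for k = 1:5)      # Vector{Int}: [1, 4, 9, 16, 25]
c = [k^2 for k = 1:5]             # the equivalent comprehension
D = Diagonal([k^2 for k = 1:5])   # Diagonal wrapping a materialized vector

D[4, 4]   # 16
D[1, 2]   # 0: off-diagonal entries are structural zeros
```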
Note that the values passed in don't necessarily need to be all of an abstract array's values. Instead, for some types we'd like to provide an iterable that gives a certain subset of values. For example, constructing a diagonal matrix like this (this also doesn't work yet):
julia> Diagonal(k^2 for k = 1:5)
5×5 LinearAlgebra.Diagonal{Int64,Array{Int64,1}}:
 1  ⋅  ⋅   ⋅   ⋅
 ⋅  4  ⋅   ⋅   ⋅
 ⋅  ⋅  9   ⋅   ⋅
 ⋅  ⋅  ⋅  16   ⋅
 ⋅  ⋅  ⋅   ⋅  25
But in order for all of this to work and be coherent and consistent in the future, we need to pave the way by "taking away" the syntaxes where dimensions and tuples of dimensions are passed to array types in order to get uninitialized instances with those dimension sizes. I have to say that I had not anticipated that this would be such a sticking point for people. (Although I did suspect that using the word uninitialized was not going to be so popular.)
I'm not totally sure what you mean, but it is possible to define methods with ::Uninitialized to make other constructors that work similarly.
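A minimal sketch of a custom container opting into the same convention, assuming Julia 1.x, where the marker type is called UndefInitializer and its instance is undef (the 0.7-DEV name was Uninitialized). The Buffer type is hypothetical and exists only for illustration.

```julia
# Hypothetical wrapper type, for illustration only.
struct Buffer{T}
    data::Vector{T}
end

# Constructor following the Array{T}(undef, n) convention.
Buffer{T}(::UndefInitializer, n::Int) where {T} = Buffer{T}(Vector{T}(undef, n))

Base.length(b::Buffer) = length(b.data)
Base.getindex(b::Buffer, i::Int) = b.data[i]
Base.setindex!(b::Buffer, x, i::Int) = (b.data[i] = x; b)

b = Buffer{Float64}(undef, 4)   # 4 slots, contents arbitrary
b[1] = 1.5
```

Dispatching on the marker type keeps the one-argument form Buffer{T}(data) free for "build from existing contents", which is the separation the deprecation is aiming for.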
I know some people dislike the fact that this fills an array with references to the same object (which we aren't going to change), but I don't see what that has to do with uninitialized. The same applies to any array or object passed to fill.
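A short sketch of the aliasing behavior and the usual workaround, assuming Julia 1.x (undef is the later spelling of uninitialized). A comprehension evaluates its body once per element, so each element is a distinct array.

```julia
A = fill(zeros(2, 2), 3)         # 3 references to ONE 2×2 matrix
A[1][1, 1] = 7.0
A[2][1, 1]                       # 7.0: every element is the same object

B = [zeros(2, 2) for _ in 1:3]   # comprehension: 3 independent matrices
B[1][1, 1] = 7.0
B[2][1, 1]                       # 0.0

# The 5x5 case from the question: 25 distinct, uninitialized 2×2 arrays.
C = [Matrix{ComplexF32}(undef, 2, 2) for i in 1:5, j in 1:5]
```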
Fair enough. I was looking for a short solution, and this does have precedent.
But I would be happy enough with undef(T, dims) (and would define this myself, if it isn't provided). (And no, it doesn't work for alternate array types, unless T<:AbstractArray, which mostly defeats the purpose.)
Cheers,
Kevin
Perhaps because the possibilities above did not get enough exposure (I am seeing this for the first time, although I admit that a lot of things about the deprecation path become clearer in retrospect), and also because of suggestions that creating uninitialized arrays is something one should not do "unless you know what you are doing".
For what it's worth, I actually like the verbose uninitialized constructor. It makes such statements very visible in code, and lends them the weight they deserve. I also like the potential for future extensions of the syntax.
Initially I thought the same. But now, looking at the code in boot.jl and array.jl, uninitialized is occupying sooo much space and weight. It likely would annoy me in future. In e.g.
similar(a::Array{T,1}) where {T} = Vector{T}(uninitialized, size(a,1))
similar(a::Array{T,2}) where {T} = Matrix{T}(uninitialized, size(a,1), size(a,2))
similar(a::Array{T,1}, S::Type) where {T} = Vector{S}(uninitialized, size(a,1))
the focus goes to uninitialized and the type and the dimension fade. For me, the balance is wrong.
In earlier times I didn't really distinguish between (initialized) fill, zeros and (uninitialized) Array{Int32}(2,3). I thought fill took much more time and Array would be the standard, and I was not fully aware of the "uninitialized dangers". Now I think a more subdued hint, even new, my personal favorite, would be enough to point out the difference:
uninitialized arrays | initialized arrays
---------------------|---------------------------------
Array{..}(new, ..)   | Array{..}(nothing / missing, ..)
                     | fill, zeros, ones
Admittedly, with new one would have to read the manual/help once, but afterwards, I believe, the rule "new means uninitialized" sticks. In addition, Array{..}(new, ..) is "strange" enough that one would look up the meaning of new.
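For reference, here is how the two columns of that table can be written in Julia 1.x, as a sketch. The new spelling was not what got adopted; the uninitialized marker in 1.x is undef, while the nothing / missing initializers and fill, zeros and ones are as in the right-hand column.

```julia
# uninitialized
U = Array{Int}(undef, 2, 3)                   # arbitrary contents

# initialized
N = Array{Union{Int,Nothing}}(nothing, 2, 3)  # every element is nothing
M = Array{Union{Int,Missing}}(missing, 2, 3)  # every element is missing
F = fill(7, 2, 3)                             # every element is 7
Z = zeros(Int, 2, 3)
O = ones(Int, 2, 3)
```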
I agree. The weight is too great, more than they deserve.
I also suspect that creating uninitialized arrays is a lot more common than was presumed. I create uninitialized arrays by default, except when I actively want there to be a default value, such as zero.