1.0 annoyances and Matlab comparison

Since when was programming dignified???

3 Likes

The goal was not to protect against uninitialized data via the length of the name, but just by forcing you to be explicit. I do find uninitialized very long though; I would be fine with simply renaming it to uninit.

11 Likes

What kind of initialization should we expect after the deprecation phase?

Edit:

And BTW, could you explain what exactly you like about it? What will be simpler or clearer?

Why not keep the default Array constructor using uninitialized memory?

2 Likes

:+1:

I guess it won’t be optional, so could there be other values to pass here, like a valid default value and/or the original uninit / junk / inseclogindeets (lol)? A default would probably be handy for arrays of Union types, and could even be performant if done lazily, i.e. set or return the default value on read if the element is still undefined / unwritten, like a hidden(?) union of type x plus an initialized boolean that falls back to plain type x once the boolean is true (initialized).

I’d also really like to see a convenience constructor for constructing an arbitrarily initialized array, complementing ones, zeros, etc.

I’d really like a naive empty(T, dims...) from a user’s perspective. But I guess I can also live with uninit(T, dims...), given that there is already an empty function with a different meaning.

3 Likes

See Array{T}(initialization,...) syntax - #2 by mbauman.

1 Like

You would need to use e.g. OffsetArray{T}(uninitialized, dims...) for any AbstractArray other than Array. Creating a bunch of Array-specific functions when there are so many other types of AbstractArray seems less good than having functions that work for all of them.

2 Likes

Not if the T @liso meant was the array type, not the element type, i.e.:
empty(OffsetArray{String}, dims...)

FWIW, empty is completely different from what we’re discussing here: it creates a collection with zero elements. So another name would be needed (uninitialized, uninit, etc.).

Anyway uninitialized(Array{Int}, dims...) is almost identical to Array{Int}(uninitialized, dims...), so the main question is rather whether we can find something shorter than uninitialized.
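For what it's worth, here is a minimal sketch of why the constructor form carries over to other array types. It assumes Julia 1.0 or later, where the marker debated in this thread ended up being spelled undef (on the pre-release discussed here it would be uninitialized). make_uninit is a hypothetical helper name used only for illustration, not something in Base.

```julia
# Hypothetical helper: forwards to whatever constructor the array type provides.
make_uninit(::Type{A}, dims::Integer...) where {A<:AbstractArray} = A(undef, dims...)

a = make_uninit(Array{Int}, 2, 3)      # same as Array{Int}(undef, 2, 3)
b = make_uninit(BitMatrix, 2, 3)       # same as BitMatrix(undef, 2, 3)
c = make_uninit(Vector{Float64}, 4)    # same as Vector{Float64}(undef, 4)
```

The helper saves no typing over calling the constructor directly, which is the point made above: the only real question is the length of the marker's name.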

1 Like

When requesting uninitialized data, initializing the types for the union defeats the point. Also, uninitialized arrays only exist for inlinealloc / bitstype data.

That way, you can request a nice 50TB array of uninitialized memory; the kernel will lazily reserve physical memory for you. If you do this with explicit initialization, then you will kill your system.

Something that might make sense, though, is an option for zero-byte initialization, if this were internally implemented via calloc for large arrays (instead of malloc + memset). Regardless, this is probably not good style, since the gc gets very, very unhappy if you give it control over giant chunks of virtual memory exceeding your physical memory (so you need to do the calloc yourself and then unsafe_wrap without handing control to the gc, in order to get a giant vector and a gc that is not panicking about memory usage; not worth the effort, but a fun way to avoid the ccall on push!).

What zero-byte means depends on the type, but zero-byte init and uninitialized are both qualitatively different from fill (if calloc is used).

edit: I did not actually try this with 50TB, but I did play around with this branch-free push!, where the resizing is punted to kernel/hardware. Not sure whether this works on non-linux systems.
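The calloc-then-unsafe_wrap route described above could look roughly like the sketch below. It assumes Julia 1.0 or later; calloc_vector is a hypothetical helper name, and the small size is only so that it runs anywhere. Whether the kernel maps the zero pages lazily depends on the platform and the allocation size, so treat that part as an assumption rather than a guarantee.

```julia
# Hypothetical helper: zero-byte-initialized vector backed by calloc.
# The memory is NOT handed to the GC; the caller must free it.
function calloc_vector(::Type{T}, n::Integer) where {T}
    isbitstype(T) || throw(ArgumentError("all-zero bytes only make sense for bits types"))
    ptr = convert(Ptr{T}, Libc.calloc(n, sizeof(T)))
    ptr == C_NULL && throw(OutOfMemoryError())
    # own = false: Julia never frees this memory, so call Libc.free(ptr) yourself.
    return unsafe_wrap(Array, ptr, n; own = false), ptr
end

v, ptr = calloc_vector(Float64, 1_000)
v[1] = 1.0                 # behaves like an ordinary Vector{Float64}
total = sum(v)             # every other element is still zero
Libc.free(ptr)             # v must not be touched after this line
```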

To me, Array is special because it seems to be the most common of all AbstractArrays. I agree that one needs a general way of creating all kinds of uninitialized AbstractArrays. This will probably be something like Array{T}(uninit, dims...). However, what’s wrong with additionally having special convenience functions such as uninit(T, dims...), ones(T, dims...) and so on that always create Arrays?

I don’t see much of a point in e.g. uninit(Array{Int}, 2, 2), i.e. taking T not as the eltype of the Array but as the full type, as this is no shorter than Array{Int}(uninit, 2, 2) and in my eyes wouldn’t be a convenience function at all.
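To make the proposal concrete, a convenience function of this kind might be sketched as follows. This assumes Julia 1.0 or later, where the marker is spelled undef; uninit is hypothetical and not part of Base, and the Float64 default is only an assumption chosen to mirror zeros(dims...).

```julia
# Hypothetical convenience function in the spirit of ones/zeros; always returns an Array.
uninit(::Type{T}, dims::Integer...) where {T} = Array{T}(undef, dims...)
uninit(dims::Integer...) = uninit(Float64, dims...)   # default eltype mirrors zeros(dims...)

A = uninit(Int, 2, 3)      # contents are arbitrary until written
fill!(A, 0)                # initialize explicitly before reading
B = uninit(4)              # Vector{Float64} of length 4
```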

3 Likes

What about A = Array{T}(new, dims…)? To me, new means requesting uninitialized memory for an array of dimensions dims…; this is somewhat analogous to uninitialized, but easier to type and much shorter.

2 Likes

I really dislike the uninitialized syntax, and I don’t completely understand the rationale aside from making the intent to create an uninitialized object explicit. From my perspective, adding an extra argument for the sake of being explicit appears to be solving a problem that doesn’t really exist. It should be clear that x = Array{Float64}(n) does not contain sensible values, because no values were given or implied by the statement. Adding the extra argument makes the syntax more verbose, annoying to type, and less readable, like Java.

I have been tripped up by the reference behavior of Julia a few times (see). This behavior could be made more explicit with x[i] = ReferenceTo(myObject). Unlike the case of the constructor, being explicit for references seems more justifiable. However, I would rather learn the default behavior of the language and have more readable code.

3 Likes

Did anyone read the explanation that was given by @mbauman and linked to multiple times?

Despite what people keep insisting here, the motivation was not to punish people who want to allocate uninitialized arrays by forcing them to type a long word. The motivation was this:

  • Collections are generally constructed in Julia by passing an iterable argument which is used to populate the collection as described here.

  • Array constructors predate this convention and are now an awkward exception to this general pattern. By this convention, Array((2,3)) should construct the vector [2,3] instead of an uninitialized 2×3 matrix. Similarly, Array(3) should construct [3] instead of an uninitialized 3-element vector.

  • Array(dims...) is a dangerously simple syntax for a fundamentally dangerous operation. Allocating uninitialized memory can introduce non-determinism and bugs into programs if not used correctly. Array(2,3) looks like a very innocuous operation. So not only is this syntax now at odds with how collections generally work, but it’s a dangerous operation.
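The convention in the bullets above can be illustrated with a short sketch. It assumes Julia 1.0 or later, where the explicit marker is spelled undef (uninitialized on the pre-release discussed in this thread).

```julia
# Collections are populated from an iterable argument:
s = Set([2, 3])                 # a Set containing 2 and 3
d = Dict([1 => "a", 2 => "b"])  # a Dict built from an iterable of pairs
v = Array(1:3)                  # an Array containing 1, 2, 3
t = collect((2, 3))             # the vector [2, 3]

# Requesting uninitialized storage is spelled out explicitly instead:
m = Matrix{Float64}(undef, 2, 3)   # 2x3, contents arbitrary until assigned
```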

20 Likes

The code snippet you posted looks horribly unoptimized to me: it allocates zillions of tiny (length 4!) temporary arrays.

This is not how to write fast Julia code, so you can’t draw meaningful performance conclusions from it.

(You should either use StaticArrays or lose the Matlab habit of using vectors for every inner loop.)

3 Likes

I figured afterwards that it could probably be implemented with a call to map and a bitvector / bitarray on the side for the uninitialized data state, possibly like map((x, y) -> x ? defaultval : y, initvec, datavec).
I thought it might be slightly faster in the case of resetting an array, especially if it happens often… simple testing seems to disagree though:

setting 100_000_000 bitarray with x[:]= - 0.002 sec
setting 100_000_000 Int64s with y[:]= - 0.15 sec
indexing Int64 with bitarray y[x] - 1.17 sec
default val map (as above) - 1.92 sec
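For reference, a small self-contained version of the mask idea might look like this. It assumes Julia 1.0 or later (undef instead of uninitialized); the names unset and readout are made up for illustration, and it is not the code that produced the timings above.

```julia
n = 5
defaultval = 0
datavec = Vector{Int}(undef, n)   # arbitrary contents
unset = trues(n)                  # BitVector: true means "never written"

datavec[2] = 42                   # a write stores the value ...
unset[2] = false                  # ... and clears the flag

# Reads go through the mask, so the junk in datavec is never observed.
readout = map((u, x) -> u ? defaultval : x, unset, datavec)
```

Resetting the array then only requires refilling the BitVector, which is the cheap operation in the timings above; the cost moves to every read.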

I think uninitialized would be an easier sell to the general user if there was an example or two that could demonstrate new cool stuff that can be done with the syntax. For me, I still have a hard time understanding how to take advantage of it. The first thing I tried when I saw it was

a = Float32[x for x in uninitialized]  
a = Float32[uninitialized for x in 1:5]

thinking that it was some type of iterable that would just initialize whatever the inferred type of the container was. The next thing I tried was

fill(Array{Complex{Float32}}(uninitialized, 2,2), 5, 5)

hoping it wouldn’t fill a 5x5 matrix of pointers to the same 2x2 array.
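A quick check of the aliasing, as a sketch assuming Julia 1.0 or later (undef instead of uninitialized): fill evaluates its first argument once and stores that same object in every slot, while a comprehension re-evaluates the expression for each element.

```julia
shared = fill(Matrix{ComplexF32}(undef, 2, 2), 5, 5)
same = shared[1, 1] === shared[5, 5]          # true: every entry is the same 2x2 array

separate = [Matrix{ComplexF32}(undef, 2, 2) for i in 1:5, j in 1:5]
distinct = separate[1, 1] !== separate[5, 5]  # true: 25 independently allocated arrays
```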

So, even though I’ve read the explanations of why, to me it basically just behaves like special function syntax for telling the computer to allocate an array of a certain size/shape, rather than a general way to think about container constructors that fits in with the whole language design. I also think this is why the consistency argument doesn’t resonate with me yet … i.e. I can’t seem to use uninitialized as I would other iterators.

Hope I don’t sound like I’m complaining here. Just thought this perspective could help tune how the new syntax is sold to others. Indeed, if I could overload it somehow to help me initialize custom containers with uninitialized objects of specific memory layout, I would love it.
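On the overloading wish: as a sketch, assuming Julia 1.0 or later, where the marker is the value undef of type UndefInitializer, a custom container can accept it in its own constructor. RingBuffer here is a made-up type for illustration only.

```julia
# Hypothetical container that opts into the same "uninitialized" constructor pattern.
struct RingBuffer{T}
    data::Vector{T}
    # Dispatch on the type of undef, then allocate the backing storage without filling it.
    RingBuffer{T}(::UndefInitializer, n::Integer) where {T} = new{T}(Vector{T}(undef, n))
end

rb = RingBuffer{Float32}(undef, 8)   # same call shape as Vector{Float32}(undef, 8)
```

So the marker is less an iterator than a dispatch tag: it selects the "allocate but do not fill" method of whatever constructor chooses to support it.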

2 Likes

I’d really like a naive empty(T, dims…) from a user’s perspective. But I guess I can also live with uninit(T, dims…), given that there is already an empty function with a different meaning.

Right now, the relevant definition of empty is empty(v::AbstractVector, [eltype]), which can only create empty vectors. How about

  1. Adding a length argument to the current definition, to allow creation of undefined arrays
  2. Creating a variant for N>1 dimensional arrays (with dims)
  3. Creating a variant with just eltype and dims.

This might look like

empty(v::AbstractVector, [eltype], [length])
empty(v::AbstractArray, dims::NTuple{N,T})
empty(v::AbstractArray, eltype, dims::NTuple{N,T})
# alternative or additional convenience def:
# empty(v::AbstractArray; eltype::DataType, dims::NTuple{N,T})
empty(eltype, dims::NTuple{N,T})

This would be familiar to people coming from numpy, and would sidestep uninitialized entirely.

Thoughts? If there’s interest, I can try to get a pull request together.

Cheers,
Kevin

1 Like

empty sounds really weird for this operation. There’s nothing empty here: neither the returned collection, nor its elements.

3 Likes

I really don’t like the word empty for this. These arrays aren’t empty. They’re just filled with junk. A zero-length vector isempty. As is a 0x0 matrix.

A Vector{Int}(uninitialized, 20) is not empty.
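A short sketch of the distinction, assuming Julia 1.0 or later (undef instead of uninitialized):

```julia
e  = empty([1, 2, 3])            # zero-element vector with the same eltype (Int)
ef = empty([1, 2, 3], Float64)   # zero-element vector with eltype Float64
z  = zeros(0, 0)                 # a 0x0 matrix
u  = Vector{Int}(undef, 20)      # 20 elements of junk

results = (isempty(e), isempty(ef), isempty(z), isempty(u))
```

Only the last entry of results is false: the uninitialized vector has 20 elements, they just hold arbitrary values.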

5 Likes