Why does Array{Int}() produce Array{Int, 0} and not Array{Int, 1}?

Tamas_Papp · June 3, 2018, 10:10am

I would expect a Union{Nothing, Array{T, N}} to be optimized out in v0.7, especially if you branch anyway. In cases where it isn’t, a function barrier should take care of it, making the code also cleaner (although that is subjective; I happen to like small functions that do one thing only).
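A minimal sketch of the function-barrier pattern, assuming a hypothetical `Deferred` type whose field is `nothing` until it is filled in. The outer function branches on the sentinel once, and the inner kernel `sumsq` (also hypothetical) only ever sees a concrete `Matrix{Float64}`:

```julia
# Hypothetical container: `data` stays `nothing` until it has been computed.
mutable struct Deferred
    data::Union{Nothing, Matrix{Float64}}
end

# The "barrier": this method is compiled for a concrete Matrix{Float64},
# so its body is type-stable regardless of the Union in the struct.
sumsq(A::Matrix{Float64}) = sum(abs2, A)

function total(d::Deferred)
    x = d.data
    x === nothing && return 0.0   # branch on the sentinel once
    return sumsq(x)               # inference narrows x to Matrix{Float64} here
end

total(Deferred(nothing))              # 0.0
total(Deferred([1.0 2.0; 3.0 4.0]))   # 30.0
```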

Iagoba_Apellaniz · June 3, 2018, 11:09am

It is not surprising, since any element of an N-dimensional array A can be accessed as A[a_1, a_2, ..., a_N, 1, 1, ..., 1]; note the arbitrary number of trailing ones.

For the specific case of a zero-dimensional array, then, there is exactly one element, A[1, 1, ..., 1], which can be seen as the scalar version of the multidimensional arrays.

In short, a 2-dimensional array is a matrix, a 1-dimensional array is a vector, and a 0-dimensional array is a scalar.
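A small illustration of the trailing-ones rule, using `fill(5)` to build a 0-dimensional array. The `Array{Int}(undef)` line assumes a current Julia version (1.x), where the constructor takes an explicit `undef` rather than no arguments:

```julia
A = fill(5)                    # 0-dimensional Array{Int,0} holding the value 5
ndims(A), size(A), length(A)   # (0, (), 1)
A[]                            # 5: the usual way to read its only element
A[1], A[1, 1, 1]               # (5, 5): trailing 1s are accepted
B = [1 2; 3 4]
B[2, 1, 1] == B[2, 1]          # true: the same rule applies to a matrix
Z = Array{Int}(undef)          # 0-dimensional, contents uninitialized
ndims(Z)                       # 0
```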

GunnarFarneback · June 3, 2018, 11:58am

The union is optimized out in 0.7, but I'm not sure I understand the objection. Even if there are other options, it is still a valid use case.

Tamas_Papp · June 3, 2018, 12:10pm

My objection is to introducing special constructors for this. Here the suggested use case is a sentinel value that fits in the type; once type stability is not a strict constraint, other sentinel values are fine too, and maybe more conventional.
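A rough side-by-side sketch of the two sentinel styles being contrasted, with hypothetical type and function names (`WithEmpty`, `WithNothing`, `is_initialized`): an empty matrix that "fits in the type" versus an explicit `nothing` in a small Union:

```julia
# Style 1: the sentinel fits in the type; an empty matrix means "not set yet".
mutable struct WithEmpty
    data::Matrix{Float64}
end
WithEmpty() = WithEmpty(Matrix{Float64}(undef, 0, 0))
is_initialized(w::WithEmpty) = !isempty(w.data)

# Style 2: an explicit `nothing` sentinel; the field type is a Union.
mutable struct WithNothing
    data::Union{Nothing, Matrix{Float64}}
end
WithNothing() = WithNothing(nothing)
is_initialized(w::WithNothing) = w.data !== nothing
```

One trade-off worth noting: with the empty-array sentinel, a genuinely empty result cannot be told apart from "not set yet", whereas `nothing` keeps the two cases distinct.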

dlfivefifty · June 4, 2018, 3:02pm

Union{Missing, T} is not 100% as fast as just using T, causes more allocations, and is not compatible with C code:

julia> struct Foo
           x::Int
       end

julia> struct FooM
           x::Union{Missing,Int}
       end

julia> function fillfoo(T, n)
           ret = Array{T}(undef, n)
           for k = 1:n
               ret[k] = T(1)
           end
           ret   # return the filled array (after the loop, not inside it)
       end

julia> @time fillfoo(Foo, 100000);
  0.046550 seconds (199.50 k allocations: 3.808 MiB)

julia> @time fillfoo(FooM, 100000);
  0.049624 seconds (299.50 k allocations: 8.385 MiB)

Tamas_Papp · June 5, 2018, 7:24am

What you are measuring is sample noise. Both produce identical code in v0.7.

dlfivefifty · June 5, 2018, 8:16am

Look at the allocations. They are not identical.

Tamas_Papp · June 5, 2018, 8:28am

Naturally, since

julia> sizeof(Foo)
8

julia> sizeof(FooM)
16

as the extra information (the concrete type in the Union) needs to be represented.

dlfivefifty · June 5, 2018, 8:31am

So in high-performance settings, an empty array is twice as good as using Missing.

Tamas_Papp · June 5, 2018, 9:03am

I don’t quite understand what you mean by “twice as good” — while memory allocation does affect performance, it does not translate linearly.

Also, I don't understand how this came up in this topic. In this example, one would not allocate a container at all, and in situations when one is needed, the extra memory cost would be a small fraction of the array's own size (if it has elements).

Finally, in v0.7,

julia> struct A{T,N}
           inner::Array{T,N}
       end

julia> struct AM{T,N}
           inner::Union{Array{T,N}, Missing}
       end

julia> a = randn(10, 10, 10);

julia> Base.summarysize(A(a))
8056

julia> Base.summarysize(AM(a))
8056

so the compiler/memory management is now getting super-clever.

dlfivefifty · June 5, 2018, 9:12am

> I don't quite understand what you mean by "twice as good" — while memory allocation does affect performance, it does not translate linearly.

Performance isn’t everything: memory usage matters.

> the memory cost would be a small fraction for the array

Depends on how many arrays there are.

Also, you ignored the other issues I raised (compatibility with C code). In any case, I think style-wise it’s a bad idea to use Union{Missing,T} when it’s only missing temporarily.
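One quick way to see the C-compatibility point is `isbitstype`, used here as a rough proxy for "has a plain C-struct memory layout" (an assumption of this sketch, not a full account of what `ccall` accepts). The struct definitions repeat the `Foo`/`FooM` types from earlier in the thread:

```julia
struct Foo
    x::Int
end

struct FooM
    x::Union{Missing,Int}
end

isbitstype(Foo)    # true: a single Int field, laid out like a C struct
isbitstype(FooM)   # false: the Union field needs a hidden type selector
```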

Tamas_Papp · June 5, 2018, 9:19am

Probably this is where we disagree. I think that using nothing or missing instead of a sentinel value makes the code more readable (although I fully recognize this is subjective). The only exception is when the sentinel values are extremely well-established, such as NaN.

dlfivefifty · June 5, 2018, 9:28am

The issue is that the type is user-facing: it's misleading to tell the user it may be of type Missing if it's only temporarily missing during setup.

Another use case that comes up a lot is when your array is dynamically resized (I have a CachedArray type floating around that does this with lazy arrays). The natural initial size is 0 × 0: this does not indicate that it's missing, just that it's currently empty.
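A minimal sketch of that pattern, assuming a hypothetical `GrowingCache` type with an entry function `f` (this is not the actual CachedArray implementation). The cache starts out as a 0 × 0 `Matrix{Float64}` and grows on demand, so the field type never needs a Union:

```julia
# Hypothetical cache: entries (i, j) are computed lazily by `f`.
mutable struct GrowingCache{F}
    f::F
    data::Matrix{Float64}
end
GrowingCache(f) = GrowingCache(f, Matrix{Float64}(undef, 0, 0))  # start empty

function ensure_size!(c::GrowingCache, m::Int, n::Int)
    M, N = size(c.data)
    (m <= M && n <= N) && return c            # already big enough
    newM, newN = max(m, M), max(n, N)
    grown = Matrix{Float64}(undef, newM, newN)
    grown[1:M, 1:N] .= c.data                 # keep already-computed entries
    for j in 1:newN, i in 1:newM
        if i > M || j > N
            grown[i, j] = c.f(i, j)           # compute only the new entries
        end
    end
    c.data = grown
    return c
end

getentry!(c::GrowingCache, i::Int, j::Int) = (ensure_size!(c, i, j); c.data[i, j])

c = GrowingCache((i, j) -> float(i + j))
size(c.data)          # (0, 0)
getentry!(c, 2, 3)    # 5.0, cache grows to 2 × 3
```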

Also, while Union{T,Missing} might be efficient in many settings, this is a compiler optimization trick whose full implications we, as users, do not and cannot appreciate.
