I would expect a Union{Nothing, Array{T, N}} to be optimized out in v0.7, especially if you branch anyway. In cases where it isn’t, a function barrier should take care of it, while also making the code cleaner (although that is subjective; I happen to like small functions that do one thing only).
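A minimal sketch of the function-barrier idea mentioned above, assuming a hypothetical field or argument that is either nothing or a Vector{Float64}. The names process and sum_kernel are made up for illustration and do not come from the thread:

```julia
# Hypothetical kernel: compiled separately for each concrete array type it receives.
sum_kernel(x::AbstractArray) = sum(abs2, x)

# Hypothetical entry point: `data` is either `nothing` or a concrete array.
function process(data::Union{Nothing, Vector{Float64}})
    # Branching on `nothing` lets the compiler narrow the type in each branch.
    data === nothing && return 0.0
    # Function barrier: the call dispatches on the concrete array type.
    return sum_kernel(data)
end

r_nothing = process(nothing)
r_array = process([1.0, 2.0, 3.0])
```

Whether the barrier is actually needed depends on the Julia version and on how the union arises; the small kernel function is arguably worth having for readability either way.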
It is not surprising, since any element of an N-dimensional array A can be accessed by A[a_1, a_2, ..., a_N, 1, 1, ..., 1]; note the arbitrary number of ones at the end.
For the specific case of a zero-dimensional array, then, it has at least the A[1, 1, ..., 1] element, which can be seen as the scalar version of the multidimensional arrays.
In short, a 2-dimensional array is a matrix, a 1-dimensional array is a vector and a 0-dimensional array is a scalar.
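The trailing-ones rule described above can be checked directly. This is only an illustration; the variable names are arbitrary:

```julia
B = reshape(1:6, 2, 3)     # a 2-dimensional array (column-major order)
b_plain = B[2, 3]          # ordinary indexing
b_padded = B[2, 3, 1, 1]   # same element, with trailing ones appended

s = fill(42)               # a 0-dimensional array holding a single element
s_dims = ndims(s)
s_empty_index = s[]        # index with zero indices
s_ones = s[1, 1, 1]        # index with only trailing ones
```

Here b_plain and b_padded refer to the same element, and both ways of indexing s return its single stored value.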
The union is optimized out in 0.7 but I’m not sure I understand the objection. Even if there are other options, it is still a valid use case.
My objection is to introducing special constructors for this. Here the suggested use case is a sentinel value that fits in the type; once type stability is not a strict constraint, other sentinel values are fine too, and maybe more conventional.
Union{Missing, T} is not 100% as fast as just using T, causes more allocations, and is not compatible with C code:
julia> struct Foo
           x::Int
       end
julia> struct FooM
           x::Union{Missing,Int}
       end
julia> function fillfoo(T, n)
           ret = Array{T}(undef, n)
           for k = 1:n
               ret[k] = T(1)
           end
           ret  # return the filled array
       end
julia> @time fillfoo(Foo, 100000);
0.046550 seconds (199.50 k allocations: 3.808 MiB)
julia> @time fillfoo(FooM, 100000);
0.049624 seconds (299.50 k allocations: 8.385 MiB)
What you are measuring is sample noise. Both produce identical code in v0.7.
Look at the allocations. They are not identical.
Naturally, since
julia> sizeof(Foo)
8
julia> sizeof(FooM)
16
as the extra information (the concrete type in the Union) needs to be represented.
So in high-performance settings an empty array is twice as good as using Missing.
I don’t quite understand what you mean by “twice as good” — while memory allocation does affect performance, it does not translate linearly.
Also, I don’t understand how this came up in this topic. In this example, one would not allocate a container at all, and in situations when one is needed, the memory cost would be a small fraction for the array (if it has elements).
Finally, in v0.7,
julia> struct A{T,N}
           inner::Array{T,N}
       end
julia> struct AM{T,N}
           inner::Union{Array{T,N}, Missing}
       end
julia> a = randn(10, 10, 10);
julia> Base.summarysize(A(a))
8056
julia> Base.summarysize(AM(a))
8056
so the compiler/memory management is now getting super-clever.
I don’t quite understand what you mean by “twice as good” — while memory allocation does affect performance, it does not translate linearly.
Performance isn’t everything: memory usage matters.
the memory cost would be a small fraction for the array
Depends on how many arrays there are.
Also, you ignored the other issues I raised (compatibility with C code). In any case, I think style-wise it’s a bad idea to use Union{Missing,T} when it’s only missing temporarily.
Probably this is where we disagree. I think that using nothing or missing instead of a sentinel value makes the code more readable (although I fully recognize this is subjective). The only exception is when the sentinel values are extremely well-established, such as NaN.
The issue is that the type is user-facing: it’s misleading to tell the user it may be of type Missing if it’s only temporarily missing during setup.
Another use case that comes up a lot is when your array is dynamically resized (I have a CachedArray type floating around that does this with lazy arrays). The natural initial size is 0 x 0: this is not to indicate it’s missing, just to indicate it’s currently empty.
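A rough sketch of the "starts empty, grows later" pattern, assuming a hypothetical GrowableMatrix type and grow! helper. This is not the CachedArray type mentioned above, only an illustration of using a 0 x 0 matrix as the initial state, so the field keeps a single concrete type:

```julia
# Hypothetical container: the field always holds a Matrix{T}, never `missing`.
mutable struct GrowableMatrix{T}
    data::Matrix{T}
end

# Start empty: a 0 x 0 matrix has the same concrete type as any later, larger one.
GrowableMatrix{T}() where {T} = GrowableMatrix{T}(Matrix{T}(undef, 0, 0))

# Hypothetical helper: enlarge to m x n (assumed no smaller than the current
# size), keeping existing entries and zero-filling the rest.
function grow!(g::GrowableMatrix{T}, m::Integer, n::Integer) where {T}
    bigger = zeros(T, m, n)
    oldm, oldn = size(g.data)
    bigger[1:oldm, 1:oldn] = g.data   # nothing is copied when the old matrix is empty
    g.data = bigger
    return g
end

g = GrowableMatrix{Float64}()
was_empty = isempty(g.data)
grow!(g, 2, 3)
new_size = size(g.data)
```

With this layout, "not yet filled" is expressed by isempty(g.data) rather than by a second type in a Union.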
Also, while Union{T,Missing} might be efficient in many settings, this is a compiler optimization trick whose full implications we, as users, do not and cannot appreciate.