`length.(AbstractArray[])` and `size.(AbstractArray[])` return empty arrays of `Any`

Code:

julia> length.(AbstractArray[])
Any[]

Wat?.. I should be expecting an array of lengths, right? AbstractArray[] is empty, so it should be an empty array of lengths. A length is a number. The length of a finite collection is an Integer (I’d prefer Unsigned, but Integer is all right too), so length.(AbstractArray[]) should be Integer[], in my opinion.


Code:

julia> size.(AbstractArray[])
Any[]

Wat? So, in a way, size(AbstractArray) sort of returned Any here? But ?size says that it returns “a tuple containing the dimensions of A”. So I’m expecting an empty array of tuples, some kind of AbstractTuple[], but not Any[].


One of my computations involves sum(length.(AbstractArray[])), but it’s not possible to compute it since sum(Any[]) attempts to call zero(::Type{Any}) which doesn’t exist.
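For the sum itself, one possible workaround is sketched below. It assumes Julia 1.6 or later (for the `init` keyword of `sum`), and `params` is a made-up name for the possibly empty vector of parameter arrays. Passing `length` as the mapping function and giving an explicit starting value means `zero(::Type{Any})` is never needed:

```julia
# `params` is a hypothetical stand-in for the possibly empty parameter vector.
params = AbstractArray[]

# `init` supplies the result for the empty case.
total_empty = sum(length, params; init=0)

push!(params, randn(2, 4), rand(3, 2))
total_filled = sum(length, params; init=0)

println(total_empty)   # 0
println(total_filled)  # 14
```

This sidesteps the element type of the intermediate array entirely, since no intermediate array is built.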

Am I missing something here? Is this expected behaviour?

You’re broadcasting a function over a collection that has no elements and no concrete element type. With no element to call length on, the element type of the result has to come from type inference, and inference can’t know what length returns for every possible subtype of AbstractArray. It only knows that the result should be a collection, since you broadcasted - hence, Any[].

Same goes for your second example - broadcasting size over no elements of an abstractly typed container has no applicable methods, only allowing inference to say “whatever is returned will be a collection holding possibly anything”.
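What inference can and cannot conclude may be inspected directly. This is only a sketch using `Base.return_types`, an unexported introspection helper; the output for the abstract signature is deliberately not shown, since it can differ between Julia versions:

```julia
# Concrete element type: inference can answer without running anything.
concrete_types = Base.return_types(length, (Vector{Float64},))
println(concrete_types)

# Abstract element type: the answer has to cover every subtype of
# AbstractArray, including ones defined later, so it can be much wider.
abstract_types = Base.return_types(length, (AbstractArray,))
println(abstract_types)

# The broadcast over an empty, concretely typed vector follows the same pattern.
concrete_lengths = length.(Vector{Float64}[])
println(typeof(concrete_lengths))  # Vector{Int64} on a 64-bit machine
```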

I do wonder why you have/need a vector of AbstractArray though?

I have a function that returns a vector of model parameters, which are Arrays. However, if a model doesn’t have any parameters, I want to return an empty array. [] is an array of Any, so I attempted to specify a more concrete type like AbstractVecOrMat[] or AbstractArray[]: https://github.com/ForceBru/Matches.jl/blob/8a5eb7cff10df2722d67b6bb18b9a0b6f27e4ae1/src/ML.jl#L33.

Why not implement length like this?

julia> my_len(arr::AbstractArray)::Integer = length(arr)
my_len (generic function with 1 method)

julia> my_len()::Integer = 0
my_len (generic function with 2 methods)

julia> my_len.(AbstractArray[])
Integer[] # A vector of integers, as expected

julia> my_len.(AbstractArray[randn(2, 4)])
1-element Vector{Int64}:
 8

julia> my_len.(AbstractArray[randn(2, 4), rand(3, 2)])
2-element Vector{Int64}:
 8
 6

Surely length() (called with no arguments) must return zero? But currently it’s an error:

julia> length()
ERROR: MethodError: no method matching length()

Surely length() must remain undefined? Why should it be defined?

Your change to use your own function is probably wise; adding a method to Base.length is type piracy unless the method is for a type you own.

Integer is an abstract type, so you are not helping much by replacing Any with it. Lengths are usually Int, which shows up as Int64 or Int32 depending on your machine.

Because that only works due to you falling back on length in your AbstractArray catchall case. You can’t write that method without specifying what should happen when it’s actually dispatched to (which is what you’d need to do were it not for that fallback).

Also, length() is not the same as (effectively) mapping length over an empty collection (i.e. broadcasting) - in the former case, length is still called, while in the latter case length is never called since there is no element to pass to it to call it with. This is purely a consequence of type inference having to figure out some common type using the most applicable method (which doesn’t/can’t exist), not of it being called or not.
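That the function is never called for an empty input can be checked with a small counter. `counted_length` and `calls` are hypothetical names introduced only for this sketch:

```julia
calls = Ref(0)  # counts how many times the wrapper actually runs

# Hypothetical wrapper around length, used only to observe calls.
counted_length(a) = (calls[] += 1; length(a))

counted_length.(AbstractArray[])             # empty input: nothing to call it on
calls_after_empty = calls[]

counted_length.(AbstractArray[randn(2, 4)])  # one element: one call
calls_after_one = calls[]

println((calls_after_empty, calls_after_one))  # (0, 1)
```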

Indeed, it was never called lol. But a definition like len(x)::Integer = length(x) seems to correctly return integers every time:

julia> len(x)::Integer = length(x)
len (generic function with 1 method)

julia> len.(AbstractArray[])
Integer[]

julia> len.(AbstractArray[]) |> sum
0

Why not help type inference by indicating that length always returns an Integer or maybe a Number? Currently it looks like length can return Anything at all.

What I’m getting at is that you can’t write that method without having that = length(x) part there (which you’d have to do if you were to implement that for length itself).

The problem is that an AbstractArray could be an arbitrary user-defined type. You could define a subtype of AbstractArray where the length(a) function sends an email and returns a screenshot of the reply. So Julia has no way of knowing what type to use for an empty array of AbstractArray.
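A tamer version of the same point, as a sketch: `Oddball` is a made-up type that exists only for this illustration, and its `length` method returns a string.

```julia
# Hypothetical array type, defined only for illustration.
struct Oddball <: AbstractArray{Int,1} end

Base.size(::Oddball) = (0,)
# Legal, if unwise: nothing forces length to return an integer.
Base.length(::Oddball) = "not a number"

odd_result = length.(AbstractArray[Oddball()])
println(typeof(odd_result))  # Vector{String}
```

Since `Oddball` is a perfectly valid element of an `AbstractArray[]`, an empty `AbstractArray[]` gives no grounds for promising integers.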

In contrast, if you have an empty array but the type is more concrete, then Julia can figure out that length returns an Int:

julia> length.(Vector[])
Int64[]

julia> length.(Array[])
Int64[]

Note that, for performance reasons, you should generally avoid containers of abstractly typed elements, like AbstractArray[] or even Array[] or Vector[], as opposed to e.g. Vector{Float64}[] or Vector{typeof(somevariable)}[] or Vector{eltype(somearray)}[].
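Applied to the use case in this thread, that could look like the sketch below. `no_params` and `some_params` are hypothetical functions, and `Matrix{Float64}` is an assumed parameter type, not something taken from the linked package:

```julia
# Hypothetical model without parameters: the empty vector still carries
# a concrete element type.
no_params() = Matrix{Float64}[]

# Hypothetical model with two weight matrices of the same concrete type.
some_params() = [randn(2, 4), rand(3, 2)]

println(typeof(length.(no_params())))  # Vector{Int64} on a 64-bit machine
println(sum(length.(no_params())))     # 0
println(sum(length.(some_params())))   # 14
```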

I wish Julia used traits here:

trait Lengthy{T}
    length(x::T)::Number
end

Then all definitions of length for any type would have to return a Number, with the full interface of a Number, be it a screenshot, 1 or pi. But that’s not Julia anymore (although I’m aware of Holy traits), so…

This makes total sense, however. An element of AbstractArray could indeed be anything, and length defined for the element’s type could return anything too. Looks like I should be using a heterogeneous collection here instead of arrays - maybe a tuple.
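A rough sketch of the tuple idea, with made-up names `empty_set` and `mixed_set`; the `init` keyword again assumes Julia 1.6 or later:

```julia
# Hypothetical parameter sets stored as tuples instead of vectors.
empty_set = ()
mixed_set = (randn(2, 4), rand(3))

# Broadcasting over a tuple gives a tuple, so each element keeps its own type.
println(length.(empty_set))  # ()
println(length.(mixed_set))  # (8, 3)

# `init` covers the empty tuple.
println(sum(length.(empty_set); init=0))  # 0
println(sum(length.(mixed_set); init=0))  # 11
```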

Traits are not necessary there. If the Base developers wanted to work with this assumption, they could just say in the documentation that nobody should extend Base.length to return anything but an Int, and work based on that. There are already traits in Base to tell whether an iterator has a length or not (is infinite or of unknown size, for example): see Base.IteratorSize under “Collections and Data Structures” in the Julia manual.
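For reference, the existing trait can be queried as below. Note that it only describes whether a length exists, not what type `length` returns:

```julia
# Each query takes a type and returns a trait object.
vector_trait = Base.IteratorSize(Vector{Int})
repeat_trait = Base.IteratorSize(typeof(Iterators.repeated(1)))
filter_trait = Base.IteratorSize(typeof(Iterators.filter(isodd, 1:3)))

println(vector_trait)  # Base.HasShape{1}()
println(repeat_trait)  # Base.IsInfinite()
println(filter_trait)  # Base.SizeUnknown()
```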

To be fair, Julia does treat empty collections as a weird special case in broadcasting.

For nonempty collections, the eltype of a collection returned from broadcasting is typically as small as possible. So logically the eltype of a returned empty collection should be Union{} not Any.
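The narrowing for nonempty inputs can be seen in a short sketch, where the declared element type of the input is wider than that of the result:

```julia
# The input is a Vector{Any}, yet the result is narrowed to the
# types actually produced.
narrowed = identity.(Any[1, 2])
println(typeof(narrowed))  # Vector{Int64} on a 64-bit machine

lengths = length.(AbstractArray[randn(2, 4), rand(3, 2)])
println(typeof(lengths))   # Vector{Int64} on a 64-bit machine
```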

I guess the Julia devs decided at some point that returning Union{}[] was not useful? I’d disagree.

If [] was equal to Union{}[] you wouldn’t be able to push to an empty collection that didn’t have a type specified.
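A small sketch of that difference; the exact exception type is not shown because it may vary between Julia versions:

```julia
untyped = []        # Vector{Any}
push!(untyped, 1)   # fine: anything fits into Any

bottom = Union{}[]  # Vector{Union{}}: no value has type Union{}
push_failed = try
    push!(bottom, 1)
    false
catch
    true
end

println(push_failed)  # true
```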

I was talking specifically about broadcasting but didn’t say so, will edit my post.

Unless you want to special-case the length function in Julia’s compiler, you would still need a mechanism in Julia to prove to the compiler that length(::AbstractArray) will return an Int for all possible subtypes of AbstractArray.

I think that’s good. I might want to define a type of object such that Base.length(x) returns its length with specific units.

To work around sum(length.(array)) not working, if eltype(array) is not well defined, you may do: sum(Integer.(length.(array))).

This seems to work:

julia> Integer.(length.(AbstractArray[]))
Integer[]

However:

julia> length.(AbstractArray[]) .|> Integer
Any[]

Aren’t func.(arg) and arg .|> func the same thing?


Then the length with units should be a subtype of Number, so Base.length(x)::Number holds.

That’s what traits are for, though. Who is going to follow the documentation? What if someone defines length(thing::MyType) = "I'm a string"? Code will break anyway. Still, what I’m imagining with traits isn’t Julia, so it is what it is.

They are, but |> receives Integer as an ordinary argument, and by default Julia doesn’t specialize on a type that is passed as an argument. We’d need to add special ::Type{T} handling to |>… which is probably worthwhile.

Yeah, it should be more consistent. Currently this looks like a bug: two equivalent operations return different things given the same input. Should I maybe open an issue about this, or is it already known?

Yes, I was thinking about a special case. For now I believe you can personally just type-assert the call location, which is more flexible but not very scalable.
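One way to read "type-assert the call location" is sketched below. The assertion is `Int` by assumption, and it throws at run time if some array type returns anything else from `length`:

```julia
arrays = AbstractArray[]

# Assert the result type right where length is called.
asserted = (a -> length(a)::Int).(arrays)
println(typeof(asserted))  # Vector{Int64} on a 64-bit machine
println(sum(asserted))     # 0

# A typed comprehension fixes the element type up front instead.
typed = Int[length(a) for a in arrays]
println(typeof(typed))     # Vector{Int64} on a 64-bit machine
```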