`length.(AbstractArray[])` and `size.(AbstractArray[])` return empty arrays of `Any`

Code:

``````
julia> length.(AbstractArray[])
Any[]
``````

Wat?.. I should be expecting an array of lengths, right? `AbstractArray[]` is empty, so it should be an empty array of lengths. A length is a number. The length of a finite collection is an `Integer` (I’d prefer `Unsigned`, but `Integer` is all right too), so `length.(AbstractArray[])` should be `Integer[]`, in my opinion.

Code:

``````
julia> size.(AbstractArray[])
Any[]
``````

Wat? So, in a way, `size(AbstractArray)` sort of returned `Any` here? But `?size` says that it returns "a tuple containing the dimensions of `A`". So I’m expecting an empty array of tuples, some kind of `AbstractTuple[]`, but not `Any[]`.

One of my computations involves `sum(length.(AbstractArray[]))`, but it’s not possible to compute it since `sum(Any[])` attempts to call `zero(::Type{Any})` which doesn’t exist.
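For the `sum` case specifically, one possible workaround is to supply the value of the empty sum explicitly. This is a minimal sketch, assuming Julia 1.6 or later (where `sum` accepts an `init` keyword); the name `params` is a hypothetical stand-in for the empty parameter vector:

```julia
# Hypothetical stand-in for a model with no parameters.
params = AbstractArray[]

# `init` supplies the value of an empty sum, so `zero(Any)` is never needed.
total_broadcast = sum(length.(params); init=0)

# Passing `length` to `sum` directly also avoids the intermediate array.
total_mapped = sum(length, params; init=0)

println(total_broadcast, " ", total_mapped)
```

This sidesteps the element type of the intermediate array rather than fixing it.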

Am I missing something here? Is this expected behaviour?

You’re broadcasting a function over no elements of no concrete type - as such, there is no possible method of `length` to apply to a nonexistent element and so inference can’t possibly know what the return type would be. It only knows that it should be a collection, since you broadcasted - hence, `Any[]`.

Same goes for your second example - broadcasting `size` over no elements of an abstractly typed container has no applicable methods, only allowing inference to say “whatever is returned will be a collection holding possibly anything”.

I do wonder why you have/need a vector of `AbstractArray` though?


I have a function that returns a vector of model parameters, which are `Array`s. However, if a model doesn’t have any parameters, I want to return an empty array. `[]` is an array of `Any`, so I attempted to specify a more concrete type like `AbstractVecOrMat[]` or `AbstractArray[]`: Matches.jl/ML.jl at 8a5eb7cff10df2722d67b6bb18b9a0b6f27e4ae1 · ForceBru/Matches.jl · GitHub.

Why not implement `length` like this?

``````
julia> my_len(arr::AbstractArray)::Integer = length(arr)
my_len (generic function with 1 method)

julia> my_len()::Integer = 0
my_len (generic function with 2 methods)

julia> my_len.(AbstractArray[])
Integer[] # A vector of integers, as expected

julia> my_len.(AbstractArray[randn(2, 4)])
1-element Vector{Int64}:
 8

julia> my_len.(AbstractArray[randn(2, 4), rand(3, 2)])
2-element Vector{Int64}:
 8
 6
``````

Surely `length()` (called with no arguments) must return zero? But currently it’s an error:

``````
julia> length()
ERROR: MethodError: no method matching length()
``````


Surely `length()` must remain undefined? Why should it be defined?

Your change to use your own function is probably wise: adding a method to `Base.length` is type piracy unless the method is for a type you own.

`Integer` is an abstract type, so you are not helping much by replacing `Any` with it. Lengths are usually `Int`, which shows up as `Int64` or `Int32` depending on your machine.


Because that only works due to you falling back on `length` in your `AbstractArray` catchall case. You can’t write that method without specifying what should happen when it’s actually dispatched to (which is what you’d need to do were it not for that fallback).

Also, `length()` is not the same as (effectively) mapping `length` over an empty collection (i.e. broadcasting) - in the former case, `length` is still called, while in the latter case `length` is never called since there is no element to pass to it to call it with. This is purely a consequence of type inference having to figure out some common type using the most applicable method (which doesn’t/can’t exist), not of it being called or not.
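The "never called" part can be checked with a small sketch. The counter `calls` and the helper `counted_length` are hypothetical names introduced only for this illustration:

```julia
# Counts how often the wrapped function actually runs.
calls = Ref(0)

# Hypothetical helper: same as `length`, but records each call.
function counted_length(x)
    calls[] += 1
    return length(x)
end

counted_length.(AbstractArray[])          # nothing to iterate over
calls_after_empty = calls[]               # still 0

counted_length.(AbstractArray[zeros(2)])  # one element
calls_after_one = calls[]                 # now 1
```

Broadcasting over the empty vector leaves the counter untouched, so the element type of the result can only have come from inference.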

Indeed, it was never called lol. But a definition like `len(x)::Integer = length(x)` seems to correctly return integers every time:

``````
julia> len(x)::Integer = length(x)
len (generic function with 1 method)

julia> len.(AbstractArray[])
Integer[]

julia> len.(AbstractArray[]) |> sum
0
``````

Why not help type inference by indicating that `length` always returns an `Integer` or maybe a `Number`? Currently it looks like `length` can return `Any`thing at all.

What I’m getting at is that you can’t write that method without having that `= length(x)` part there (which you’d have to do if you were to implement that for `length` itself).

The problem is that an `AbstractArray` could be an arbitrary user-defined type. You could define a subtype of `AbstractArray` where the `length(a)` function sends an email and returns a screenshot of the reply. So Julia has no way of knowing what type to use for an empty array of `AbstractArray`.
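As a tamer version of the same point, here is a sketch with a hypothetical type `OddVector` (not from any package) whose `length` returns a string. Nothing in the language rejects the definition:

```julia
# Hypothetical array type whose `length` does not return a number.
struct OddVector <: AbstractVector{Float64} end

Base.size(::OddVector) = (0,)
Base.length(::OddVector) = "not a number"   # legal, though it breaks the usual contract

odd = length(OddVector())
```

Since `OddVector <: AbstractArray`, any promise about `length(::AbstractArray)` would have to account for methods like this one.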

In contrast, if you have an empty array but the type is more concrete, then Julia can figure out that `length` returns an `Int`:

``````
julia> length.(Vector[])
Int64[]

julia> length.(Array[])
Int64[]
``````

Note that, for performance reasons, you should generally avoid containers of abstractly typed elements, like `AbstractArray[]` or even `Array[]` or `Vector[]`, as opposed to e.g. `Vector{Float64}[]` or `Vector{typeof(somevariable)}[]` or `Vector{eltype(somearray)}[]`.
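For instance, with a concretely typed element type the original `sum` works even when the container is empty. A small sketch, assuming the parameters are all `Vector{Float64}` (an assumption made for this illustration):

```julia
# Concretely typed elements: inference knows `length` returns an `Int`.
concrete = Vector{Float64}[]
lengths = length.(concrete)

lengths_eltype = eltype(lengths)   # Int (Int64 on a 64-bit machine)
total = sum(lengths)               # 0, because zero(Int) exists
```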


I wish Julia used traits here:

``````
trait Lengthy{T}
    length(x::T)::Number
end
``````

Then all definitions of `length` for any type would have to return a `Number`, with the full interface of a `Number`, be it a screenshot, `1` or `pi`. But that’s not Julia anymore (although I’m aware of Holy traits), so…

This makes total sense, however. An element of `AbstractArray` could indeed be anything, and `length` defined for the element’s type could return anything too. Looks like I should be using a heterogeneous collection here instead of arrays - maybe a tuple.

Traits are not necessary there. If the base developers wanted to work with this assumption they could just say in the documentation that nobody should extend `Base.length` to return anything but an `Int` and work based on that. There are already traits in `Base` to know if an iterator has length or not (is infinite or unknown, for example): Collections and Data Structures · The Julia Language.
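The trait in question is presumably `Base.IteratorSize`. A short sketch of what it reports for a few standard iterators (the variable names are just for illustration):

```julia
# `Base.IteratorSize` reports, per type, whether `length` is meaningful.
vec_trait    = Base.IteratorSize(Vector{Int})                             # has a shape, so `length` works
filter_trait = Base.IteratorSize(typeof(Iterators.filter(iseven, 1:10)))  # size not known in advance
cycle_trait  = Base.IteratorSize(typeof(Iterators.cycle(1:3)))            # infinite

println(vec_trait, " ", filter_trait, " ", cycle_trait)
```

Note that this trait only says whether a length exists, not what type `length` returns.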

To be fair, Julia does treat empty collections as a weird special case in broadcasting.

For nonempty collections, the eltype of a collection returned from broadcasting is typically as small as possible. So logically the eltype of a returned empty collection should be `Union{}` not `Any`.

I guess the Julia devs decided at some point that returning `Union{}[]` was not useful? I’d disagree.


If `[]` were equal to `Union{}[]`, you wouldn’t be able to push to an empty collection that didn’t have a type specified.


I was talking specifically about broadcasting but didn’t say so, will edit my post.


Unless you want to special-case the `length` function in Julia’s compiler, you would still need a mechanism in Julia to prove to the compiler that `length(::AbstractArray)` will return an `Int` for all possible subtypes of `AbstractArray`.


I think that’s good. I might want to define a type of object such that `Base.length(x)` returns its length with specific units.

To work around `sum(length.(array))` failing when `eltype(array)` is abstract, you can write `sum(Integer.(length.(array)))`.

This seems to work:

``````
julia> Integer.(length.(AbstractArray[]))
Integer[]
``````

However:

``````
julia> length.(AbstractArray[]) .|> Integer
Any[]
``````

Aren’t `func.(arg)` and `arg .|> func` the same thing?

Then the length with units should be a subtype of `Number`, so `Base.length(x)::Number` holds.

That’s what traits are for, though. Who is going to follow the documentation? What if someone defines `length(thing::MyType) = "I'm a string"`? Code will break anyway. Still, what I’m imagining with traits isn’t Julia, so it is what it is.

They are, but `|>` receives `Integer` as an argument, and Julia doesn’t specialize on a type passed as an argument by default. We’d need to add special `::Type{T}` handling to `|>`… which is probably worthwhile.

Yeah, it should be more consistent. Currently this looks like a bug: two equivalent operations return different things given the same input. Should I maybe open an issue about this, or is it already known?

Yes, I was thinking about a special case. For now I believe you can just add a type assertion at the call site yourself, which is more flexible but doesn’t scale well.
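One reading of that suggestion, as a minimal sketch: put the assertion inside the function being broadcast. The names `arrs`, `lens` and `total` are hypothetical, and the result depends on type inference, so it may differ between Julia versions:

```julia
arrs = AbstractArray[]

# The assertion tells inference that every result is an `Int`,
# so the empty broadcast gets a usable element type.
lens = (x -> length(x)::Int).(arrs)
total = sum(lens)

# The same call still works on a nonempty vector.
nonempty = (x -> length(x)::Int).(AbstractArray[zeros(2, 4)])
```

The trade-off is that the assertion throws a `TypeError` for any array type whose `length` returns something other than an `Int`.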