Query.jl: collect's return type is not as expected

This is an unfortunate outcome of a reliance on type inference in the EnumerableMap type. It predetermines the output eltype by calling:

T = Base._return_type(f, Tuple{TS,})

where f in this case is essentially your myobs function and Tuple{TS,} is the expected type of the previous operations in the chain (specifically, Tuple{Grouping{Int64,NamedTuple{(:id, :y, :x1, :x2, :x3, :x4, :x5),Tuple{Int64,Float64,Float64,Float64,Float64,Float64,Float64}}}}).

My guess is that the call to myobs(_, feformula, reformula) is just complex enough that the compiler can’t guarantee the return type will be MyObs{Float64}, so the best it can do is MyObs. This might be affected by a number of things, such as the inferability of the transpose code, StatsModels.modelcols or StatsModels.modelmatrix, FormulaTerm, or just the plain nesting complexity of everything here.
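To make the "best it can do is MyObs" point concrete, here is a minimal sketch. It does not use your actual myobs function: MyObs below is a hypothetical stand-in with a single field, build is an invented helper, and I'm using the reflection helper Base.return_types rather than the internal Base._return_type that Query.jl calls. The Vector{Any} input is just a simple way to hide information from the compiler.

```julia
# Hypothetical stand-in for the MyObs type from the question.
struct MyObs{T}
    value::T
end

# With a Vector{Any} input the compiler cannot know the type of v[1],
# so it cannot pin down the parameter T of the result.
build(v::Vector{Any}) = MyObs(v[1])

# Ask inference what it can prove without running the function.
# The result has one entry per matching method (here: one).
inferred = Base.return_types(build, (Vector{Any},))
println(inferred)

# At run time the value is fully concrete.
println(typeof(build(Any[1.0])))
```

Whatever inference reports here has to be wide enough to cover MyObs{Float64}, but it has no way to narrow the result down to that type ahead of time. That gap is what ends up baked into the eltype.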

In any case, by calling Base._return_type, EnumerableMap commits the output eltype to whatever the compiler can figure out before execution. When the result is materialized (via collect), collect asks whether Base.IteratorEltype is known (in this case yes) and uses that eltype to allocate the output array.

If instead, EnumerableMap defined:

Base.IteratorEltype(::Type{<:EnumerableMap}) = Base.EltypeUnknown()

then a different collect algorithm is used, where the output array type is “promoted” as elements are iterated. This introduces one step of type instability (i.e. at least the initial call to iterate + array allocation), but can lead to a more accurate output type. I tested this locally and it indeed returns a 5-element Array{MyObs{Float64},1}.
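Here is a small self-contained sketch of the two collect paths. None of these types come from Query.jl: AbstractLazyMap, FixedMap and WideningMap are made-up names, MyObs is again a one-field stand-in, and the hard-coded eltype of MyObs simply imitates what inference produced in your case.

```julia
# Hypothetical stand-in for the MyObs type from the question.
struct MyObs{T}
    value::T
end

# Two toy lazy-map iterators that differ only in how they report their eltype.
abstract type AbstractLazyMap end

struct FixedMap{F,S} <: AbstractLazyMap
    f::F
    source::S
end

struct WideningMap{F,S} <: AbstractLazyMap
    f::F
    source::S
end

# Shared iteration: apply f to each element of the wrapped source.
Base.length(m::AbstractLazyMap) = length(m.source)
function Base.iterate(m::AbstractLazyMap, state...)
    next = iterate(m.source, state...)
    next === nothing && return nothing
    item, newstate = next
    return m.f(item), newstate
end

# FixedMap commits to an eltype up front (imitating an imprecise inference result).
# IteratorEltype defaults to HasEltype(), so collect trusts this declaration.
Base.eltype(::Type{<:FixedMap}) = MyObs

# WideningMap says "unknown", so collect sizes the array from the first element
# and widens the element type later only if it has to.
Base.IteratorEltype(::Type{<:WideningMap}) = Base.EltypeUnknown()

fixed    = collect(FixedMap(x -> MyObs(x), [1.0, 2.0, 3.0]))
widening = collect(WideningMap(x -> MyObs(x), [1.0, 2.0, 3.0]))

@show typeof(fixed)      # element type is the abstract MyObs
@show typeof(widening)   # element type is the concrete MyObs{Float64}
```

Both arrays hold the same values; only the container's element type differs, which is exactly what you were seeing.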

There are trade-offs between both approaches and even the Base.collect algorithm tries to use a hybrid approach between inspecting Base._return_type and just “growing” the output container. It’s actually one of the more interesting “dynamic” problems that Julia has vs. other languages, IMO, and it’s really interesting to see different approaches and the resulting side effects.

Hope that helps?
