Inferring loose container type based on elements that are added to it later?

Benjia · July 23, 2020, 2:23pm

I often have functions that look like this, where I’m allocating some output vector and pushing some homogeneously typed objects onto it:

julia> function foo()
           out = []
           push!(out, 1)
           return out
       end
foo (generic function with 1 method)

julia> @code_warntype foo()
Variables
  #self#::Core.Compiler.Const(foo, false)
  out::Array{Any,1}
Body::Array{Any,1}
1 ─     (out = Base.vect())
│       Main.push!(out, 1)
└──     return out

In this form, the return type is Vector{Any}. To avoid this, I could define out = Int[]. The element type is often more complicated than Int though, and is sometimes implicit or dependent on one of the input types, so explicitly naming it is fiddly.

One technique that works is building the array directly with a literal or comprehension:

julia> function bar()
           out = [1]
           return out
       end
bar (generic function with 1 method)

julia> @code_warntype bar()
Variables
  #self#::Core.Compiler.Const(bar, false)
  out::Array{Int64,1}
Body::Array{Int64,1}
1 ─     (out = Base.vect(1))
└──     return out

But this isn’t always nice, especially if the logic in the loop is more complex.

A third approach is constructing the first element, then allocating a vector with typeof(first_element), and continuing. This usually infers fine, but is very awkward.
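A minimal sketch of this third approach, assuming a hypothetical per-element function `transform` and a non-empty input (neither name comes from the post above):

```julia
# Hypothetical per-element computation; stands in for whatever produces the outputs.
transform(x) = x / 2

function collect_transformed(xs)
    isempty(xs) && error("need at least one element to pick the element type")
    first_out = transform(first(xs))
    # Allocate the output with the concrete type of the first result.
    out = Vector{typeof(first_out)}()
    push!(out, first_out)
    for x in Iterators.drop(xs, 1)
        # push! converts to eltype(out), or throws if the conversion is impossible.
        push!(out, transform(x))
    end
    return out
end

collect_transformed([1, 2, 3])  # a Vector{Float64}
```

The awkwardness shows up in the special-casing of the first element and in the need to decide what to do with an empty input.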

I have two questions:

1. Would type inference on this kind of structure ever be possible (a sort of backwards narrowing of the element type based on what actually gets put into it)? Or is this hard or breaking to do because the container escapes the function, and elements of a wider type could be pushed onto that container later, outside the function (so concluding a tighter type than the user declares would break those push!es)?
2. Do people have some better style ideas for how to write these kinds of functions without explicitly naming the element type, keeping the function generic?

jling · July 23, 2020, 2:28pm

I think it depends on what your actual use case is. If your function (with push!() in it) takes arguments, you can build an array with an element type derived from them. Basically, [] can’t assume anything since you can push!() anything into it.

Benjia · July 23, 2020, 2:33pm

Yes, agreed, though in some cases the element type is not directly one of the argument types, but instead is the inferred return type of some function call. Explicitly naming this type is tricky / impossible, especially if writing generic code.

jling · July 23, 2020, 2:33pm

Oh, you don’t need to pass the type as an argument directly. For example, as you know, if it’s a scalar you can use typeof, and if the argument is an Array you can use eltype. Also check out empty() and zero(); they are both type-stable in the way you would use them.
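A rough sketch of what this could look like. The function names `shifted_positives` and `positives` are made up for illustration, and using `promote_type` to combine the two argument types is an assumption that fits numeric inputs rather than a general rule:

```julia
# Hypothetical helper: keep the positive entries of `xs`, shifted by a scalar `offset`.
function shifted_positives(xs::AbstractVector, offset)
    # Derive the output element type from the argument types alone.
    T = promote_type(eltype(xs), typeof(offset))
    out = T[]
    for x in xs
        x > zero(x) && push!(out, x + offset)
    end
    return out
end

# When the outputs share the input's element type, `empty` gives a typed,
# zero-length container without naming the type at all.
function positives(xs::AbstractVector)
    out = empty(xs)
    for x in xs
        x > zero(x) && push!(out, x)
    end
    return out
end

shifted_positives([1, -2, 3], 0.5)  # a Vector{Float64}
positives([1, -2, 3])               # a Vector{Int}
```

This only helps when the output type can be computed from the input types; it does not cover the case where the element type is whatever some other function call happens to return.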

pdeffebach · July 23, 2020, 2:41pm

push!! from BangBang.jl, combined with Empty, will do this:

julia> x = Empty(Vector)
Empty{Array{T,1} where T}()

julia> push!!(x, 1)
1-element Array{Int64,1}:
 1

To be honest I wouldn’t mind something like this being in Base.
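For the imperative loop from the original question, a sketch along these lines might work. It assumes the third-party BangBang.jl package is installed, and `collect_even_squares` is a made-up name. The key detail is that the result of `push!!` has to be rebound, because it may return a new container rather than mutate the old one:

```julia
using BangBang  # third-party package, not part of Base

function collect_even_squares(xs)
    out = Empty(Vector)          # no element type chosen yet
    for x in xs
        if iseven(x)
            # Rebind: push!! may return a new, differently typed container.
            out = push!!(out, x^2)
        end
    end
    return out
end

collect_even_squares(1:4)  # a Vector{Int} holding 4 and 16
```

If nothing is ever pushed, the function returns the `Empty` placeholder rather than a vector, so callers may need to handle that case.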

Though I wonder if, in general, you should be using map to accomplish this task instead of push!.

Benjia · July 23, 2020, 2:45pm

Thanks, that looks interesting, will read into it!

I agree that for some cases map is fine, but often it’s a bit more hairy and I’m doing some combination of mapping, filtering, etc in a performant way, so writing in an imperative style helps.

pdeffebach · July 23, 2020, 2:47pm

Relatedly, I would also like a feature in map where I could choose not to return a value at all. Some sort of sentinel type where, when map sees it returned from the function, it just doesn’t add to the vector.

jling · July 23, 2020, 4:06pm

just return missing and use skipmissing later.

Benjia · July 23, 2020, 4:30pm

That’s terrible for performance though - allocating & copying twice.

pdeffebach · July 23, 2020, 4:34pm

skipmissing doesn’t allocate, so unless you want a vector rather than an iterator, it will be performant.
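A small sketch of the trade-off being discussed, with a hypothetical function `half_if_even` that returns `missing` for the elements it wants to drop:

```julia
# Hypothetical function: `missing` marks elements that should be skipped.
half_if_even(x) = iseven(x) ? div(x, 2) : missing

mapped = map(half_if_even, 1:6)  # one allocation: Vector{Union{Missing, Int}}
lazy = skipmissing(mapped)       # lazy wrapper around `mapped`, no copy
total = sum(lazy)                # consumed directly as an iterator
kept = collect(lazy)             # a second allocation, only if a Vector is needed
```

So the second allocation and copy only happen when the result has to be materialized as a vector.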

jling · July 23, 2020, 5:39pm

As @pdeffebach said, skipmissing doesn’t allocate by default. I also want to comment on ‘wishing map could skip elements’. Notice that when you map, the length of the output array is known: it is the same as that of the array being mapped over. The element type is potentially stable too, which allows for efficient parallelism. What you want (skipping elements, so that the output array has varying length) is basically equivalent to push!(), since you add elements to the array on a case-by-case basis. And because each element’s position depends on which earlier elements were skipped, it’s no longer trivially parallel.
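For simple cases, Base already has a form of "map that can skip": a comprehension with an `if` clause. A minimal sketch, with `even_squares` as a made-up name:

```julia
# The output length is not known up front, so the result is grown element by
# element, but the element type is still taken from the values produced.
even_squares(xs) = [x^2 for x in xs if iseven(x)]

even_squares(1:6)  # a Vector{Int} holding 4, 16 and 36
```

This covers a single map-plus-filter step; loops with more involved logic still need one of the other approaches in this thread.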
