Inferring loose container type based on elements that are added to it later?

I often have functions that look like this, where I’m allocating some output vector and pushing some homogeneously typed objects onto it:

julia> function foo()
           out = []
           push!(out, 1)
           return out
       end
foo (generic function with 1 method)

julia> @code_warntype foo()
Variables
  #self#::Core.Compiler.Const(foo, false)
  out::Array{Any,1}
Body::Array{Any,1}
1 ─     (out = Base.vect())
│       Main.push!(out, 1)
└──     return out

In this form, the return type is Vector{Any}. To avoid this, I could define out = Int[]. The element type is often more complicated than Int though, and is sometimes implicit or dependent on one of the input types, so explicitly naming it is fiddly.

One technique that works is using an array comprehension:

julia> function bar()
           out = [1]
           return out
       end
bar (generic function with 1 method)

julia> @code_warntype bar()
Variables
  #self#::Core.Compiler.Const(bar, false)
  out::Array{Int64,1}
Body::Array{Int64,1}
1 ─     (out = Base.vect(1))
└──     return out

But this isn’t always nice, especially if the logic in the loop is more complex.

A third approach is constructing the first element, then allocating a vector with typeof(first_element), and continuing. This usually infers fine, but is very awkward.

I have two questions:

  • would type inference on this kind of structure ever be possible (a sort of backwards-narrowing of the element type based on what actually gets put into it)? Or is this hard / breaking to do since the container escapes the function, and other elements of a wider type could be put onto that container later outside the function (so concluding a tighter type than the user declares would break those push!es)?
  • do people have better some style ideas for how to write these kind of functions, without explicitly naming the element type / keeping the function generic?

I think it depends on what is your actual use case is. If your function (with push!() in it) takes arguments, your can build an array with that type. Basically, [] can’t assume anything since you can push!() anything into it.

Yes, agreed, though in some cases the element type is not directly one of the arguments types, but instead is the inferred return type of some function call. Explicitly naming this type is tricky / impossible, especially if writing generic code.

oh you don’t need to pass the type as argument directly. For example, as you know, if it’s a scalar you can use typeof, if the argument is an Array you can use eltype. Also checkout empty() and zero(), they are both type-stable in the way you would use them.

push!! from BangBang.jl combined with Empty will do this

julia> x = Empty(Vector)
Empty{Array{T,1} where T}()

julia> push!!(x, 1)
1-element Array{Int64,1}:
 1

To be honest I wouldn’t mind something like this being in Base.

Though I wonder if, in general, you should be using map to accomplish this task instead of push!.

1 Like

Thanks, that looks interesting, will read into it!

I agree that for some cases map is fine, but often it’s a bit more hairy and I’m doing some combination of mapping, filtering, etc in a performant way, so writing in an imperative style helps.

Relatedly, I would also like a feature in map where I could choose not to return a value at all. Some sort of sentinel type where, when map sees it returned from the function, it just doesn’t add to the vector.

just return missing and use skipmissing later.

That’s terrible for performance though - allocating & copying twice.

skipmissing doesnt allocate, so unless you want a vector and not an iterator it will be performant.

1 Like

as @pdeffebach said, skipmissing doesn’t allocate by default, also I want to comment on ‘wishing map can skip elements’. Notice when you map, the length of the output array is known - same as the mapped over array. Potentially the type is also stable, this allows efficient parallelism potential. What you want (skip elements, causing the outcome array to be length-varying), is basically equivalent to push!() since you add elements into the array on a case by case basis (also that because order depends on what is missing what is not, it’s no longer trivially parallel)