Capture `DataType` in closure

Why is it apparently not possible to capture a DataType inside a closure in a type-stable manner? Consider:

function create_closure1(data)
    bufT = eltype(data)
    (i) -> begin
        buf = Vector{bufT}(undef,1)
        buf[1] = data[i]
    end
end
closure = create_closure1(data)
@code_warntype closure(1) # type of buf cannot be inferred

However, if I capture the data itself, calling eltype(data) inside the closure works fine:

function create_closure2(data)
    (i) -> begin
        buf = Vector{eltype(data)}(undef,1)
        buf[1] = data[i]
    end
end
closure = create_closure2(data)
@code_warntype closure(1) # type could be inferred

In reality my problem is a bit more complicated because the buffer type is conditional. For now I solved the problem by creating an empty container of that type rather than using bufT in the closure directly, but this somehow feels hacky.

function conditional_example(data, hasmissing)
    bufT = hasmissing ? Union{Missing, eltype(data)} : eltype(data)
    buf_prototype = Vector{bufT}(undef,0)
    (i) -> begin
        buf = Vector{eltype(buf_prototype)}(undef,1)
        buf[1] = data[i]
    end
end
closure = conditional_example(data, true)
@code_warntype closure(1) # type could be inferred

Any insights on that? Is it a bug? Or am I just missing something? I’ve played around with quite a lot of variations using let blocks before finding the stupid buf_prototype solution…

Since you are indexing into data, I reckon it is an array. Then why not define

function create_closure1(data::AbstractArray{T,N}) where {T,N}
   (i) -> begin
       buf = Vector{T}(undef,1)
       buf[1] = data[i]
   end
end
julia> closure = create_closure1([1,2])
#17 (generic function with 1 method)

julia> @code_warntype closure(1)
MethodInstance for (::var"#17#18"{Int64, Vector{Int64}})(::Int64)
  from (::var"#17#18"{T})(i) where T in Main at REPL[40]:2
Static Parameters
  T = Int64
Arguments
  #self#::var"#17#18"{Int64, Vector{Int64}}
  i::Int64
Locals
  buf::Vector{Int64}
Body::Int64
1 ─ %1 = Core.apply_type(Main.Vector, $(Expr(:static_parameter, 1)))::Core.Const(Vector{Int64})
│        (buf = (%1)(Main.undef, 1))
│   %3 = Core.getfield(#self#, :data)::Vector{Int64}
│   %4 = Base.getindex(%3, i)::Int64
│        Base.setindex!(buf, %4, 1)
└──      return %4

julia> closure = create_closure1([1,missing])
#17 (generic function with 1 method)

julia> @code_warntype closure(1)
MethodInstance for (::var"#17#18"{Union{Missing, Int64}, Vector{Union{Missing, Int64}}})(::Int64)
  from (::var"#17#18"{T})(i) where T in Main at REPL[40]:2
Static Parameters
  T = Union{Missing, Int64}
Arguments
  #self#::var"#17#18"{Union{Missing, Int64}, Vector{Union{Missing, Int64}}}
  i::Int64
Locals
  buf::Vector{Union{Missing, Int64}}
Body::Union{Missing, Int64}
1 ─ %1 = Core.apply_type(Main.Vector, $(Expr(:static_parameter, 1)))::Core.Const(Vector{Union{Missing, Int64}})
│        (buf = (%1)(Main.undef, 1))
│   %3 = Core.getfield(#self#, :data)::Vector{Union{Missing, Int64}}
│   %4 = Base.getindex(%3, i)::Union{Missing, Int64}
│        Base.setindex!(buf, %4, 1)
└──      return %4

The reason the original example doesn’t work is that at inference time, only type information is available. bufT may hold, for example, the value Float64, but its type is DataType.
The second example works because the element type of data is part of its type, e.g. Vector{Float64}. That information is available at inference time.
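
A minimal sketch of that distinction, using only Base functions (the variable name data is just an illustration): the value Float64 has the concrete type DataType, while the element type of an array can be read off the array's type alone.

```julia
# The value Float64 has the concrete type DataType ...
@assert typeof(Float64) === DataType
@assert isconcretetype(DataType)

# ... although it is also an instance of the more specific, non-concrete Type{Float64}.
@assert Float64 isa Type{Float64}
@assert !isconcretetype(Type{Float64})

# The element type of an array is part of the array's own type,
# so it can be recovered from type information alone.
data = [1.0, 2.0]
@assert typeof(data) === Vector{Float64}
@assert eltype(typeof(data)) === Float64
```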

The provided code is just an MWE to showcase the compiler’s behavior; the actual problem does not look like this. In my real code, the function that creates the closure is inherently type-unstable, and the type of the buffer (i.e. the return value of the function) cannot be statically inferred. However, it can be determined when the closure is created (similar to the last code snippet, conditional_example). This dynamically determined buffer type is then used in the closure to make the anonymous function type-stable and fast. I don’t care about the performance of create_closure, since it is applied only once, whereas the closure itself will be applied thousands of times.

The reason the original example doesn’t work is that at inference time, only type information is available. bufT may hold, for example, the value Float64, but its type is DataType.
The second example works because the element type of data is part of its type, e.g. Vector{Float64}. That information is available at inference time.

Hm, interesting! I thought that at runtime bufT would be of type Type{Int64} rather than DataType, but that kinda explains it…

You can sneak the type info in though:

function create_closure2(data)
    bufT = Val(eltype(data))
    function (i; bufT::Val{T}=bufT) where {T}
        buf = Vector{T}(undef,1)
        buf[1] = data[i]
    end
end
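
A related way to sneak the type in is a function barrier, sketched here for the conditional case from the original post. The helper name make_closure is hypothetical, and the closure returns buf (unlike the original, which returns data[i]) so that the buffer type shows up in the result. The assumption is that a closure created inside a method with a static parameter T carries T in its own type, as the closure type printed earlier in the thread suggests.

```julia
# Hypothetical helper: T is a static parameter of this method,
# so the closure created here can use it as a compile-time constant.
function make_closure(::Type{T}, data) where {T}
    return i -> begin
        buf = Vector{T}(undef, 1)
        buf[1] = data[i]
        buf   # returned (unlike the original) so the buffer type is visible
    end
end

function conditional_example(data, hasmissing)
    # bufT is only known at runtime here ...
    bufT = hasmissing ? Union{Missing, eltype(data)} : eltype(data)
    # ... so this call is dispatched dynamically, but only once, at creation time.
    return make_closure(bufT, data)
end

closure = conditional_example([1, 2, 3], true)
closure(2)
```

The outer function stays type-unstable, which matches the stated requirement that only the closure itself needs to be fast.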

I think you hit the nail on the head that the closure only sees bufT::DataType. However, bufT does seem inferable in @code_warntype create_closure1([0 0; 0 0]) (OP’s version): you can see bufT::Type{Int64} and bufT::Core.Compiler.Const(Int64, false). This seems to be based on the compiler’s ability to compute constants from type information; for example, I can add N = ndims(data) and get the closure to capture N’s value at compile time.

I think the reason the closure only sees bufT::DataType is that the underlying struct automatically annotates its fields with concrete types, and for a type such as Int64 the concrete type is DataType rather than the more specific, but abstract, Type{Int64}. I wonder if this could be patched; the compiler already seems capable of leveraging abstract ::Type{_T} annotations:

struct X  T::DataType   end
struct Y  T::Type{Int}  end
f(x::X) = Vector{x.T}(undef, 1)
f(y::Y) = Vector{y.T}(undef, 1)

@code_warntype f(X(Int)) # unstable
@code_warntype f(Y(Int)) #   stable

Thanks for the replies, I’ve certainly learned something about DataType and Type{T}! I guess I’ll go with capturing an undef Ref{_} in the closure.

I don’t think it has anything to do with the inferability of the outer function though, since the field types of the underlying closure struct are fixed at runtime rather than at inference time, as can be seen in this example:

function create_closure()
    T = rand([Int, Float64])
    ref = Ref{T}()
    () -> rand(eltype(ref))
end
@code_warntype create_closure()   # unstable, infers T::DataType, cannot infer type of ref
@code_warntype create_closure()() # correctly captures ref::Ref{Int} -> type-stable

But if it is resolved at runtime, why not annotate the captured field as Type{_} rather than DataType? As far as I understand, at runtime this should always be possible, in the same way that Ref{_} can be resolved in the example above. Benny’s example shows that it would help the inferability of the closure.

It’s probably because

struct Container{T} x::T end
typeof(Container(Int))

returns Container{DataType} instead of Container{Type{Int64}} and I guess there are good reasons for that.
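
As a small sketch of what this means in practice: the default constructor picks the parameter from typeof, but the more specific parameter can be requested explicitly. The function name alloc is hypothetical and only used for illustration.

```julia
struct Container{T} x::T end

# The default constructor uses typeof(Int), which is DataType.
@assert typeof(Container(Int)) === Container{DataType}

# Asking for the parameter explicitly keeps the type in the type domain.
c = Container{Type{Int}}(Int)
@assert typeof(c) === Container{Type{Int}}

# Hypothetical consumer: with c.x::Type{Int}, the buffer type should be
# recoverable from the type of c alone.
alloc(c::Container) = Vector{c.x}(undef, 1)
@assert alloc(c) isa Vector{Int}
```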

Deriving the abstract Type{T} from a type T is generally wasteful. [Int, Float64] automatically has the element type DataType, and methods on those elements only need to be compiled for and work on DataType instances. Compiling and runtime-dispatching methods for an unfixed number of types is unnecessary and wastes time.

Type{T} and where T come in handy when you work with a fixed number of types and the extra compile-time information helps type stability. It just seems to me that since the compiler already knows bufT is a constant based on data, it could also decide to annotate the field in the closure as Type{bufT}. At this point, though, it seems you have to write such structs manually (struct X{T} ty::Type{T} end) or, more simply, have your closures capture non-type instances like data in your first example. I don’t think it’s a performance hit to compute constants like eltype or ndims in your closure; I think simple constant computation is all done at compile time.
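
A minimal sketch of the manual-struct route, written as a callable struct that plays the role of the closure. The name BufferedGetter and its field names are hypothetical, and the call returns buf rather than data[i] so that the buffer type is visible in the result.

```julia
# Hypothetical stand-in for the closure: the buffer type T is a type
# parameter, and the field is annotated Type{T} instead of DataType.
struct BufferedGetter{T, D}
    ty::Type{T}
    data::D
end

# Making the struct callable; T is available as a static parameter.
function (g::BufferedGetter{T})(i) where {T}
    buf = Vector{T}(undef, 1)
    buf[1] = g.data[i]
    return buf
end

g = BufferedGetter(Union{Missing, Float64}, [1.0, 2.0])
g(2)
```

Constructing g from a type that is only known at runtime still costs one dynamic dispatch, but every later call g(i) works with a fully parameterized type.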