Sniffing the return type

It should indeed be avoided if possible; I don’t argue with the docstring.
However, promote_op is in the docs, and it is recommended there exactly for computing output eltypes: Methods · The Julia Language.
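
For reference, a minimal sketch of the usage that section describes (the operator and argument types here are my own illustration):

```julia
# Base.promote_op asks inference what type `+(::Int, ::Float64)` would
# return, e.g. to pick an output eltype for a container up front.
T = Base.promote_op(+, Int, Float64)   # Float64
out = Vector{T}(undef, 3)              # container typed from the inferred result
```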


A docstring does not make API:

https://docs.julialang.org/en/v1/manual/faq/#How-does-Julia-define-its-public-API%3F

As far as I can tell, promote_op is not in the documentation.

It is in the docs though, with specific usage recommendations. See my previous reply just above yours.

… used as part of a very specific example involving type promotion of other argument types, not as a documented function in its own right. The use here would not be the same as that example, since there are no dependencies other than the function itself.

This is already infeasible because func is annotated with the abstract Function, so Deferred{T} could hold any function. Even if you were to parameterize Deferred{F<:Function,T} for the specific function type... EDIT: Scratch all that; you probably intend each Deferred{T} to work for all functions with a particular return type, which does reduce the number of concrete Deferred types to compile methods for, despite needing a runtime check of .func to compute .item.

For other cases, it’s more typical that type parameters are written in the inner constructor name rather than in the arguments, so that instantiation can still use curly brace syntax. In plainer language, if the struct is struct Blah{F<:Function, T}, you write the inner constructor as function Blah{F,T}(f) rather than function Blah(f, ::Type{F}, ::Type{T}). That said, I think you’re actually working with somewhat of an exception.

See, you don’t want to give T as a parameter or an argument when instantiating Deferred, you want T to be inferred from and thus always match the function. Since inner constructors are called by any outer constructors, an inner constructor is a good place to enforce such “invariants”/constraints. In plainer language, a function Deferred(f::Function) can infer T from f, then construct new{T}. A function alone is normally not enough to determine a return type, but you implicitly specify func is called with 0 arguments in the pc::Deferred method, so you could also just do that in the Deferred(f) constructor.

func() may not be type-stable. Quick example: the return type of which0() = zero((Int8, Int16, Int32, Int64, Float32, Float64)[rand(1:6)]) is inferred as Any. That’s not a dealbreaker; a RefValue{Any} can hold any of those types, but inferring an abstract T can’t let the compiler optimize well. Unless you’re very committed to categorizing functions by implementation-dependent (read: not a language guarantee) inferred return types, it could be simpler to store lazy computations separately from their function+arguments.
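
To see this concretely (a sketch; the exact inferred type may differ across Julia versions, but it won’t be concrete):

```julia
which0() = zero((Int8, Int16, Int32, Int64, Float32, Float64)[rand(1:6)])

# Ask inference (an internal API, no stability guarantee) what calling
# which0 with zero arguments returns.
T = Core.Compiler.return_type(which0, Tuple{})
isconcretetype(T)   # false: Any, or a Union of the six numeric types
```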

promote_op is not what you want; _return_type is. Or just Core.Compiler.return_type.
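
A hypothetical sketch of how that would look in a Deferred-style constructor (Deferred2 is my own name; Core.Compiler.return_type is internal, not public API):

```julia
# Infer T from the zero-argument call of f using the internal
# Core.Compiler.return_type (no stability guarantee across versions).
struct Deferred2{T}
    func::Function
    item::Base.RefValue{T}
    function Deferred2(f::Function)
        T = Core.Compiler.return_type(f, Tuple{})
        new{T}(f, Base.RefValue{T}())
    end
end

d = Deferred2(() -> 2 + 3)   # Deferred2{Int}: inference saw a stable return type
```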


I wouldn’t recommend relying on inference results in the design of your code. Julia functions simply don’t have any return type guarantees, so you’re fighting the semantics of the language if you try to use internal inference results.

Even if you write f()::Float64, that just means that Julia will attempt to do an explicit conversion of the output of f() to Float64, which might result in a conversion error being thrown.
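
A quick illustration (the function names here are mine):

```julia
g()::Float64 = 1        # the Int 1 is converted on return: g() gives 1.0
h()::Float64 = "oops"   # the conversion fails when h is actually called

g()     # 1.0
# h()   # throws: no method `convert(::Type{Float64}, ::String)`
```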

An alternative approach would be to overload getproperty so that the computation only occurs when you access a property of a struct. Here’s an example:

mutable struct Demo
    x::Union{Nothing, String}
    y::Union{Nothing, Int}
end

Demo() = Demo(nothing, nothing)

_x(d::Demo) = getfield(d, :x)
_y(d::Demo) = getfield(d, :y)

fx() = (sleep(1); "hello")
fy() = (sleep(1); 42)

function Base.getproperty(d::Demo, p::Symbol)
    if p == :x
        x = _x(d)
        if isnothing(x)
            x = fx()
            d.x = x
        end
        x
    elseif p == :y
        y = _y(d)
        if isnothing(y)
            y = fy()
            d.y = y
        end
        y
    else
        # This will throw the usual error for
        # a wrong field name.
        getfield(d, p)
    end
end
julia> d = Demo()
Demo(nothing, nothing)

julia> d.x
"hello"

julia> d.y
42

julia> d.z
ERROR: type Demo has no field z

Why does everyone keep saying this? Broadcast, one of the most common things we use, depends on the results of inference. It’s clearly a useful thing.

I think what they mean is that it’s riskier for users. Developers have a lot of freedom to change internal names and behaviors when developing a minor version, it’s just their job to make sure the changes don’t make too many bugs before they release the version. On the other hand, users using internals will be blindsided by a new version release and have to play catch-up to learn all the ways their code is no longer compatible with the new version. Not relying on internal stuff means that a user’s code can remain compatible with many more minor versions.

Looking at what @aplavin linked, it does appear that the more “correct” promote_op is suggested over relying on type inference directly, because the latter “is very brittle (as well as not being optimizable)” (though the section still very much reads as “this is digging into internals as a last resort”). Looking through GitHub issues, it seems the docstring was a half-measure for documenting a method that people weren’t entirely happy with but hadn’t found a better alternative for.

promote_op is not so brittle; it’s used in many Base functions and standard libraries.

https://github.com/JuliaLang/julia/search?q=promote_op

I don’t get what the problem is.

This part. You’re probably thinking that the Base code has been using promote_op for many versions by now, so surely it’s stable. Is it? The developers can change that code and even delete promote_op in the next minor version as they please, and they never promised users that they wouldn’t. The documented API is what they promise, and promote_op is just not in it. Who knows, maybe the developers eventually figure out an alternative they’re happy enough with to export and document in the API, and they find-and-replace promote_op in the Base code with it.

It has been argued that the broadcasting infrastructure is essentially “cheating” by using the results of type inference. In other words, broadcasting focuses more on convenience than correctness*.

The map/reduce/foldl infrastructure, on the other hand, has been developed with more of an eye toward correctness. That’s why the init keyword argument was added to sum/prod/maximum/minimum. Without the init keyword, it’s impossible to know what type of zero to return from sum(f, itr) when itr is empty, because it depends on the return type of f.

julia> sum(x -> 2x, Int[])
ERROR: ArgumentError: reducing over an empty collection is not allowed

julia> sum(x -> 2x, Int[]; init=0.0)
0.0

See this Github comment and the discussion within that issue for more details.

*I should clarify that I don’t mean that broadcasting has “correctness bugs”, just that it plays a little fast and loose with type inference on occasion.


This is an interesting approach. The burden has been transferred from the field variable to getproperty(…).

I’ve learned a lot from this conversation. The core fact seems to be that predicting the return type is, as @Sukera (and others) say, “fundamentally not a stable operation.” It is done in various places in the core language (broadcast being one) but is generally a bad idea. However, maybe I was making this much more complex than I needed to. As @Henrique_Becker asks, “why do not use an Any field and function barriers.” This lets the compiler determine the result’s type at runtime and is “efficient enough.” My semi-final (aka current best) implementation is:

struct Deferred
    func::Function
    item::Base.RefValue{Any}

    function Deferred(f::Function)
        @assert applicable(f) "The function passed to the Deferred constructor must take exactly zero arguments."
        new(f, Base.RefValue{Any}())
    end
end

function (df::Deferred)()
    if !isassigned(df.item)
        df.item[] = df.func()
    end
    return df.item[]
end
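
As a sanity check that the caching behaves as intended, here is the same code as a self-contained snippet with a call counter (the counter is my addition):

```julia
struct Deferred
    func::Function
    item::Base.RefValue{Any}

    function Deferred(f::Function)
        @assert applicable(f) "The function passed to the Deferred constructor must take exactly zero arguments."
        new(f, Base.RefValue{Any}())
    end
end

function (df::Deferred)()
    if !isassigned(df.item)
        df.item[] = df.func()
    end
    return df.item[]
end

calls = Ref(0)
d = Deferred(() -> (calls[] += 1; 42))
d()        # 42, runs func
d()        # 42, served from the cache
calls[]    # 1: func ran exactly once
```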

Sometimes the simple solution is the best???

I don’t think I would have gotten here without all the insights.


If I am not wrong, broadcast may discard the initially allocated Vector and allocate a new one if the actual computed elements “disagree” with the inference, no? I remember something about generating the first element to check the type and re-allocating and copying if a later element had an incompatible type. Or was this abandoned in a (now) old version?

Let’s not speculate on how broadcast works when we can read the code so readily.

If Base._return_type returns a concrete type, it’s allocating an array of that type and proceeding, because _return_type can be trusted. If it’s an abstract type, it’s doing something like you say, but not for the reasons you may imagine.

If you have an algorithm you know is type stable for all possible inputs, you can use Base._return_type without concern. If it’s abstract, you have the widest possible type, right? Broadcast reads the first value in an attempt to narrow the type.
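
You can observe that narrowing from the element type broadcast produces (`unstable` is my own example):

```julia
# Inference sees Union{Float64, Int}: abstract, so broadcast widens or
# narrows based on the values actually computed at runtime.
unstable(x) = x > 0 ? 1.0 : 0

v = unstable.([1, 2, 3])   # every element computed is a Float64
eltype(v)                  # Float64, narrowed from the runtime values

w = unstable.([-1, 1])     # mixed Int and Float64 values
# eltype(w) widens to an abstract type (Real) to cover both
```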

If we are going to use Any otherwise, I still really don’t see where the issue is with using _return_type. You are rejecting good enough for not being perfect, and taking the worst possible outcome instead.


I concede I was too lazy to check how exactly the broadcast machinery works. However, I fail to see what difference this makes for the caveats I mentioned before (and one new one):

  1. You cannot semantically depend on the inferred type.
  2. Your code can break between minor Julia versions (if the Core devs decide to change its name or some other detail).
  3. You may end up generating multiple Dispatch{T} structures, which may end up causing type instability again (negating the value of _return_type) and adding compilation time (multiple specializations for different Dispatch{T}).

_return_type has the potential to be better for performance, but with no real guarantee, and for the specific case (delaying evaluation of costly functions that may never be run) I would either go with the simple opaque Dispatch (and not cause extra compilation for something that may never be evaluated) or with a Dispatch{T} with T manually chosen. These would be my choice of trade-offs, not using _return_type; your preferred trade-offs may differ.


I think type inference can even be modified by patches (the z in version x.y.z), and people love it when type inference improves. It usually means their code starts running faster for free, or it means they can write faster code in easier ways than before.

It becomes a problem when the code changes/breaks because you were depending on inference to stay a certain way, like relying on functions f and g to both correspond to the same concrete Deferred{T} you put in an Array.

I think if you want to prohibit type instabilities in your code with this approach, it should be ok to use _return_type but error out if the type is not concrete, depending on what your use cases are of course. I can imagine situations where you are sure all correct uses of your function should produce concrete types in some location. Then you would be pretty safe from behavior changes in the future as type inference shouldn’t get worse over time for already type stable functions.
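
A hypothetical helper along those lines (the name is mine; Core.Compiler.return_type is internal, as discussed):

```julia
# Infer the zero-argument return type of f and insist that it be concrete.
function concrete_return_type(f)
    T = Core.Compiler.return_type(f, Tuple{})
    isconcretetype(T) || error("return type of $f inferred as non-concrete $T")
    return T
end

concrete_return_type(() -> 1 + 1)                    # Int
# concrete_return_type(() -> rand(Bool) ? 1 : "1")   # errors: Union{Int, String}
```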
