Counting iterations to a type fixpoint

I have a function that typically returns a value of a different type, but repeated application eventually hits a fixpoint. Say the function is called next. I need the number of iterations to the fixpoint in order to statically build a loop like

@generated function main_function(a, depth::StaticInt{n}) where {n}
    nsteps = max(n, 0)
    quote
        $(Expr(:meta, :inline))
        result = 0.0    # accumulator initialization (elided in the original sketch)
        b = next(a)
        Base.Cartesian.@nexprs $nsteps i -> begin
            result += foo(a)
            a, b = b, next(b)
        end
        return result
    end
end

For a toy version of the problem, we can write

struct A{N} end

@inline next(::A{0}) = A{0}()
@inline next(::A{N}) where {N} = A{N-1}()

and depth can be something like

@inline function depth(μ::M) where {M}
    return depth(μ, next(μ), static(0))  # static and StaticInt are from Static.jl
end

@inline function depth(μ::M, β::M, s::StaticInt{N}) where {M,N}
    s
end

@generated function depth(μ::M, β::B, ::StaticInt{N}) where {M,B,N}
    s = Expr(:call, Expr(:curly, :StaticInt, N + 1))
    quote
        $(Expr(:meta, :inline))
        depth(β, next(β), $s)
    end
end
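
For what it's worth, the recursion does produce the right value when run; it's inference that gives up past a certain depth. Here is a self-contained sketch, with minimal stand-ins for Static.jl's StaticInt/static (assumptions for illustration, not the real package code):

```julia
# Minimal stand-ins for Static.jl's StaticInt/static, just for this sketch
struct StaticInt{N} end
static(N::Int) = StaticInt{N}()

struct A{N} end
@inline next(::A{0}) = A{0}()
@inline next(::A{N}) where {N} = A{N-1}()

@inline depth(μ::M) where {M} = depth(μ, next(μ), static(0))
@inline depth(μ::M, β::M, s::StaticInt{N}) where {M,N} = s  # fixpoint: type stopped changing

@generated function depth(μ::M, β::B, ::StaticInt{N}) where {M,B,N}
    s = Expr(:call, Expr(:curly, :StaticInt, N + 1))
    quote
        $(Expr(:meta, :inline))
        depth(β, next(β), $s)
    end
end

depth(A{3}())  # StaticInt{3}()
```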

But this doesn’t work great:

julia> using Test

julia> for n in 1:5
           @inferred depth(A{n}())
       end
ERROR: return type StaticInt{3} does not match inferred return type Any
Stacktrace:
 [1] error(s::String)
   @ Base ./error.jl:33
 [2] top-level scope
   @ ./REPL[122]:2

For the real application, I have this working up to a depth of 4, but 5 fails in the same way as this toy example. I need this to be fully static, or else main_function will be slower than building the loop dynamically. Also, in this case I could leverage the structure of A, but that’s fake anyway and doesn’t work for the real problem.

I also tried Cthulhu, which gave me messages like

[constprop] Disabled by argument and rettype heuristics
[constprop] Disabled by entry heuristic (unimprovable return type)
[constprop] Bounded recursion detected. Call was widened to force convergence. 

But I don’t see any way to get more detail on these, or information about how to fix the problem.

Any ideas?


Probably stupid ones.

  • If type level programming isn’t any good in this case, maybe staged programming is?
  • Changing the inference/optimization parameters of Julia.

I recommend using functions. Functions are more powerful than macros or @generated functions for solving problems like this. See the tail-call function-barrier pattern I discussed in

(In some previous versions of Transducers.jl, you could just write a simple loop with FLoops.jl and the accumulator automatically converged to the fixpoint. But ATM it is an opt-in feature for the “prelude” accumulator, to reduce compile overhead.)

A generalized version of this is used in


Thanks @goerch and @tkf

I’m using generated functions and GeneralizedGenerated.jl for some things already, but in my experience staged programming is a special case of type-level programming – both of these require working at the type level. But maybe this isn’t what you mean?

In the command line options here, I only see -O that would be relevant. Is there something undocumented that says to crank up type inference?

This is how I was originally doing things, but for most cases the dynamic loop had a lot of overhead, so I found the codegen version to be a lot faster. The exception to this is cases where it can’t statically figure out the depth. I don’t see yet how your links can help with this, but I’ll read them more closely. BTW I recently tried FLoops in another part of the code, and it seems to work very well. The ability to change the executor is really slick.

Right, I probably should’ve provided a more direct link. The last section of the first link is likely most relevant: https://juliafolds.github.io/Transducers.jl/dev/explanation/state_machines/#tail-call-function-barrier

The earlier sections illustrate that Transducers.jl has problems similar to the one illustrated in the OP.

In my understanding, staged programming is a generalization of type-level programming, see.

No, I thought more about changing internal parameters.

Do you mean static unrolling of the T1 to TN sections? I think I’m missing something, since to unroll this still requires statically knowing N.

In this section, you seem to have a fixed limit to reaching the desired type before falling back on the “bulk” method. Is that what you mean?

That seems useful, but I don’t know of any documentation for using it. Are there examples somewhere?

I tried to convey that the unrolled snippet is just “pseudo code” for explaining the mechanism, by noting that:

Note that this code snippet is only for explaining the concept. For example, it is not possible to know how many while loops to generate a priori.

Transducers.jl does support the case where unknown numbers of steps are required for reaching fixpoints.

The fixed limit is not essential at all for understanding this (although you do need it unless you can guarantee that the fixpoints always exist). Rather, the point is that you can “unroll” (expand) the “type transitions” using recursion. So, focus on _foldl_linear_rec, not _foldl_linear_bulk.

So if I’m understanding right, this would mean replacing my main_function

with something like

@inline main2(a) = main2(a, next(a), 0.0)
@inline main2(a::T, b::T, acc) where {T} = acc

@inline function main2(a::Ta, b::Tb, acc) where {Ta,Tb}
    Δacc = foo(a)
    main2(b, next(b), acc + Δacc)
end
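
Filled out for the toy A type, this does run correctly value-wise (self-contained sketch, with a hypothetical foo that just returns N as a Float64):

```julia
struct A{N} end
@inline next(::A{0}) = A{0}()
@inline next(::A{N}) where {N} = A{N-1}()
@inline foo(::A{N}) where {N} = Float64(N)  # hypothetical payload

@inline main2(a) = main2(a, next(a), 0.0)
@inline main2(a::T, ::T, acc) where {T} = acc  # types matched: fixpoint reached

@inline function main2(a, b, acc)
    main2(b, next(b), acc + foo(a))
end

main2(A{3}())  # 3.0 + 2.0 + 1.0 = 6.0
```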

Is that right? This was my original approach, but I changed to generated code because I wasn’t getting type stability. Now (with the generated code approach) I mostly do, but it still fails when it gets too deep.

I find it very strange that it didn’t work, especially in a very simple setting where (presumably) you know that the type transition occurs at every step. But since it doesn’t work, maybe a reasonable approach is to use if-isa to short-circuit:

result = 0.0  # accumulator; b_0 is the starting value
b_0 = a
Base.Cartesian.@nexprs 10 i -> begin  # 10 is just some "big enough" number
    result += foo(b_{i-1})
    b_{i} = next(b_{i-1})
    if b_{i} isa typeof(b_{i-1})
        return result, b_{i}  # or other kind of short-circuiting
    end
end

(It used to be better to use nested if branches for something like this, but I haven’t extensively checked whether it still matters in recent Julia, after Shuhei improved the inference a lot recently.)

This doesn’t require @generated, and the compiler will DCE the unused expanded code. It’s a rather stupid approach, but probably the least terrible. It’s also straightforward to implement graceful degradation by attaching a simple loop after the bounded unrolling.
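
Wrapped in a function for the toy A type (again with a hypothetical foo returning N), the bounded-unrolling idea might look like this sketch:

```julia
using Base.Cartesian: @nexprs

struct A{N} end
@inline next(::A{0}) = A{0}()
@inline next(::A{N}) where {N} = A{N-1}()
@inline foo(::A{N}) where {N} = Float64(N)  # hypothetical payload

@inline function sum_to_fixpoint(a)
    result = 0.0
    b_0 = a
    @nexprs 10 i -> begin  # 10 is just some "big enough" bound
        result += foo(b_{i-1})
        b_{i} = next(b_{i-1})
        # stop as soon as the type stops changing (fixpoint reached)
        b_{i} isa typeof(b_{i-1}) && return result
    end
    return result  # a graceful-degradation loop could go here instead
end

sum_to_fixpoint(A{3}())  # 3.0 + 2.0 + 1.0 + 0.0 = 6.0
```

Note that, unlike the tail-call version, this also adds foo at the fixpoint itself (harmless here since it contributes 0.0), matching the snippet above which accumulates before the isa check.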


Ooh, that’s a neat trick. I’ll try it out, thanks!

Update: I had thought it would still be more efficient to compute the depth this way and use that to generate static code, but I’m not seeing any difference. Works great, and makes things much simpler! Thanks again @tkf

I guess we share the same feeling that “there has to be a better way…” But until we find something else, I guess it’s nice that we have something that works even though my code is very ugly. Anyway, good to know that it works for your use case! :slight_smile: If enough people do similar things, maybe it motivates adding more direct support for this in the language.
