Regression in v0.7 due to weird inference failure for function with Lisp-y recursion

Consider this example, where foo works out a type given some inputs of other types:

mutable struct A{E, L}
end

mutable struct B{E, L}
end

mutable struct C{E, L}
end

foo(::Type{T}, t, ts...) where {T} = foo(foo(T, t), ts...)
foo(::Type{A{E, L}}, ::B{E2,L2}) where {E,E2,L,L2} = A{E2,L}
foo(::Type{A{E, L}}, ::C{E2,L2}) where {E,E2,L,L2} = A{E2,L2}

Let’s define some input objects of types B and C

b, c = B{Float64,2}(), C{Float64,2}()

Under current master the return type of the following call fails to be inferred and is very slow

julia> @btime foo(A{Float64,0}, $b, $c, $b, $b)
  258.510 ns (2 allocations: 32 bytes)
A{Float64,2}

However, a minuscule change to foo makes is completely inferrable

foo(::Type{A{E, L}}, ::C{E2,L2}) where {E,E2,L,L2} = A{E2,L}

which produces the following timings

julia> @btime foo(A{Float64,0}, $b, $c, $b, $b)
  0.023 ns (0 allocations: 0 bytes)
A{Float64,0}

I’ve been trying to figure out the reason all day today, and I cannot fathom the logic here. Something like this is killing the performance of my code. This problem arises in current master, but not in v0.6.1, where everything is nicely inferred and elided

julia> versioninfo()
Julia Version 0.7.0-DEV.4390
Commit 79c7bdd9ec (2018-02-26 07:59 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin17.4.0)
  CPU: Intel(R) Core(TM) i7-5775R CPU @ 3.30GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, broadwell)
Environment:

Is this known? Should I open an issue for this?
[EDIT: corrected typos]

I fail to see the difference between the two foo versions. Also you don’t define a.

Oops, apologies @mauro3. Two typos corrected!

1 Like

By the way, the mutable part is completely irrelevant. The regression also happens with immutable structs.

Interesting. I can reproduce.
Also worth pointing out that just like changing foo to treat B and C the same causes type stability, replacing the c in the function call with b is also type stable with the original foo.

julia> @code_warntype foo(A{Float64,0}, b, c, b, b)
Variables:
  t<optimized out>
  ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 1)::C{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      Core.SSAValue(5) = (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 3)::B{Float64,2}
      # meta: location REPL[4] foo 1
      Core.SSAValue(9) = $(Expr(:invoke, MethodInstance for foo(::Type{A{Float64,2}}, ::B{Float64,2}, ::B{Float64,2}, ::Vararg{B{Float64,2},N} where N), :(Main.foo), A{Float64,2}, Core.SSAValue(4), Core.SSAValue(5)))::Type{A{Float64,_1}} where _1
      # meta: pop location
      return Core.SSAValue(9)
  end::Type{A{Float64,_1}} where _1

julia> @code_warntype foo(A{Float64,0}, b, b, b, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 3)::B{Float64,2}
      return A{Float64,0}
  end::Type{A{Float64,0}}

Curiously, switching the position of c to anywhere but 2nd was type stable:

julia> @code_warntype foo(A{Float64,0}, c, b, b, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},B{Float64,2}}, 3)::B{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,2}}

julia> @code_warntype foo(A{Float64,0}, b, c, b, b)
Variables:
  t<optimized out>
  ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 1)::C{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      Core.SSAValue(5) = (Core.getfield)(ts::Tuple{C{Float64,2},B{Float64,2},B{Float64,2}}, 3)::B{Float64,2}
      # meta: location REPL[4] foo 1
      Core.SSAValue(9) = $(Expr(:invoke, MethodInstance for foo(::Type{A{Float64,2}}, ::B{Float64,2}, ::B{Float64,2}, ::Vararg{B{Float64,2},N} where N), :(Main.foo), A{Float64,2}, Core.SSAValue(4), Core.SSAValue(5)))::Type{A{Float64,_1}} where _1
      # meta: pop location
      return Core.SSAValue(9)
  end::Type{A{Float64,_1}} where _1

julia> @code_warntype foo(A{Float64,0}, b, b, c, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},C{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},C{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},C{Float64,2},B{Float64,2}}, 2)::C{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},C{Float64,2},B{Float64,2}}, 3)::B{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,2}}

julia> @code_warntype foo(A{Float64,0}, b, b, b, c)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2}}, 2)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2}}, 3)::C{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,2}}

I’ve encountered a problem before where incrementally calling parametric functions made them all type stable, but when jumping ahead they are not. However, I started over in a new session and called them in a different order, and that doesn’t seem to be going on here.
I once again got that specifically the order b, c, b, b is not inferable, but c in the other three locations is.

Throwing extra bs on the end makes it switch back and fourth:

julia> @code_warntype foo(A{Float64,0}, b, b, b, c, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2}}, 3)::C{Float64,2}
      Core.SSAValue(6) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2}}, 4)::B{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,2}}

julia> @code_warntype foo(A{Float64,0}, b, b, b, c, b, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}, 3)::C{Float64,2}
      Core.SSAValue(6) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}, 4)::B{Float64,2}
      Core.SSAValue(7) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2}}, 5)::B{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,_1}} where _1

julia> @code_warntype foo(A{Float64,0}, b, b, b, c, b, b, b)
Variables:
  t<optimized out>
  ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}

Body:
  begin
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 1)::B{Float64,2}
      Core.SSAValue(4) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 2)::B{Float64,2}
      Core.SSAValue(5) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 3)::C{Float64,2}
      Core.SSAValue(6) = (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 4)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 5)::B{Float64,2}
      (Core.getfield)(ts::Tuple{B{Float64,2},B{Float64,2},C{Float64,2},B{Float64,2},B{Float64,2},B{Float64,2}}, 6)::B{Float64,2}
      return A{Float64,2}
  end::Type{A{Float64,2}}

I have no idea what is going on.

Thanks @Elrod for reproducing. This matches my observations, inference goes a bit crazy here. I’ll wait for a bit and if nothing comes up I’ll open an issue.

https://github.com/JuliaLang/julia/issues/26224

Interesting! Jameson just commented on the Github issue above that this behaviour is by design. The compiler cannot demonstrate that the first recursive version of foo actually terminates and doesn’t enter into a recursion loop. Hence it bails out trying to infer its output type. If it didn’t do so we couldn’t have interprocedural constant propagation optimizations. Consequently, in v0.7+ one needs to be careful with this coding pattern of recursive methods. It is unfortunate, I find it rather elegant and powerful, and had gotten used to it from v0.6 and before. It does work very well if it is simple enough, for example

@inline tuplejoin(x) = x
@inline tuplejoin(x, y) = (x..., y...)
@inline tuplejoin(x, y, z...) = (x..., tuplejoin(y, z...)...)

but it becomes quickly unworkable as one makes it more complex. You have been warned! :smiley:

2 Likes

For completeness, I’ve been trying to find a tweak that makes the original recursive pattern workable. For some arcane reason, the following tweak, while still type-unstable and not-eliding, no longer allocates and recovers excellent performance

struct A{E, L}
end

struct B{E, L}
end

struct C{E, L}
end

foo(::Type{T}, t1, ts::Vararg{<:Any,N}) where {T, N} = foo(foo(T, t1), ts...)
foo(::Type{A{E, L}}, ::B{E2,L2}) where {E,E2,L,L2} = A{E2,L}
foo(::Type{A{E, L}}, ::C{E2,L2}) where {E,E2,L,L2} = A{E2,L2}

Note the Vararg{<:Any, N}. That’s the key for some reason. Now:

julia> b, c = B{Float64,2}(), C{Float64,2}()
(B{Float64,2}(), C{Float64,2}())

julia> @btime foo(A{Float64,0}, $b, $c, $b, $b)
  5.830 ns (0 allocations: 0 bytes)
A{Float64,2}

As you increase the number of b and c parameters, the execution time grows, but only mildly, in contrast to the original example which exploded. This is so whatever the input pattern, as far as I can tell. In other words, this version performs similarly to the tuplejoin example above, despite being type-unstable! It seems to be a good solution for even more complex recursions of this kind, also with more than three types A, B, C and pairwise rules.

Any light on why ts::Vararg{<:Any,N} is so different than ts... would be very welcome!

FWIW, you might “solve” these problems by putting @generated in front of foo (at least, that’s my go-to strategy on 0.6). This makes the compiler run the recursion at compilation-time.

Good one! Indeed, that works.

@generated foo(::Type{T}, t, ts...) where {T} = :(foo(foo(T, t), ts...))
@generated foo(::Type{A{E, L}}, ::B{E2,L2}) where {E,E2,L,L2} = :(A{E2,L})
@generated foo(::Type{A{E, L}}, ::C{E2,L2}) where {E,E2,L,L2} = :(A{E2,L2})
julia> @btime foo(A{Float64,0}, $b, $c, $b, $b)
  1.873 ns (0 allocations: 0 bytes)
A{Float64,2}

At some point in the past I loosely recall there was a recommendation to do things without generated functions if possible, but I don’t remember the reasons (compilation times, precompilation perhaps?). Do I remember correctly? If so, does the reason still apply in the new 1.0 world?

By the way, interestingly, the @code_native output using your @generated approach and the ts::Vararg{<:Any,N} approach is identical, and looks very compact to my untrained eye

julia> @code_native foo(A{Float64,0}, b, c, b, b)
	.section	__TEXT,__text,regular,pure_instructions
; Function foo {
; Location: REPL[4]:1
	subq	$40, %rsp
	movq	%rsi, 32(%rsp)
; Function @generated body; {
; Location: REPL[4]:1
; Function foo; {
; Location: REPL[4]:1
; Function @generated body; {
; Location: REPL[4]:1
	movabsq	$4546854976, %rax       ## imm = 0x10F038040
	movq	%rax, (%rsp)
	movabsq	$4563281216, %rax       ## imm = 0x10FFE2540
	movq	%rax, 8(%rsp)
	movabsq	$4546855192, %rax       ## imm = 0x10F038118
	movq	%rax, 16(%rsp)
	movq	%rax, 24(%rsp)
	movabsq	$jl_invoke, %rax
	movabsq	$4558629776, %rdi       ## imm = 0x10FB72B90
	leaq	(%rsp), %rsi
	movl	$4, %edx
	callq	*%rax
;}}
	addq	$40, %rsp
	retq
;}}

It’s interesting that it worked that way, but it’s not what I meant. I meant literally just putting @generated, without quoting the expression.

@generated foo(::Type{T}, t, ts...) where {T} = foo(foo(T, t), ts...)
...

Then the generated body of foo becomes a constant (i.e. the computed type).

1 Like

Mmm, nope, that does not work for me. foo(A{Float64,0}, b, c, b, b) just hangs if I don’t quote the result (master and 0.6.2)… I thought generated functions where supposed to return a quoted expression, right?

Yes, but you can returned a quoted constant. Typically this just indicates you are doing something inappropriate for generated function usage (unsound). For similar reasons, it’s often a bad idea to have a generated function call itself (it’s bad enough that inference already might get run recursively inside it and think that it might call itself, resulting in a StackOverflow and crash, but so far we’ve usually managed to avoid that).

2 Likes