ReverseDiff.jl failing with the Base.tanh function?

Hello Guys,

I’m trying to build a probabilistic model in Turing.jl, and switching the AD backend to :reversediff is supposed to give computational improvements. I have a forward operator inside my probabilistic model that uses a tanh function, and when running the code ReverseDiff would fail at that line.

I replaced the Base.tanh call with a custom definition based on a Padé approximation:

tanh_pade(x) = x * (2027025 + 270270 * x^2 + 6930 * x^4 + 36 * x^6) / (2027025 + 945945 * x^2 + 51975 * x^4 + 630 * x^6 + x^8)

and it would fail again with the following error:

ERROR: ArgumentError: Converting an instance of ReverseDiff.TrackedReal{Float64, Float64, Nothing} to Int64 is not defined. Please use `ReverseDiff.value` instead.
Stacktrace:
  [1] convert(#unused#::Type{Int64}, t::ReverseDiff.TrackedReal{Float64, Float64, Nothing})
    @ ReverseDiff ~/.julia/packages/ReverseDiff/UJhiD/src/tracked.jl:261
  [2] _cpow(z::Complex{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, p::Complex{ReverseDiff.TrackedReal{Float64, Float64, Nothing}})
    @ Base ./complex.jl:791
  [3] ^
    @ ./complex.jl:859 [inlined]
  [4] ^
    @ ./promotion.jl:444 [inlined]
  [5] ^
    @ ./complex.jl:864 [inlined]
  [6] literal_pow
    @ ./intfuncs.jl:338 [inlined]
  [7] tanh_pade(x::Complex{ReverseDiff.TrackedReal{Float64, Float64, Nothing}})
    @ Main ~/AQ_research/Bayes/DRIVER_BAYES.jl:38
  [8] MT_mod_wait_bayes_cells(depth::Vector{Float64}, rho::Vector{ReverseDiff.TrackedReal{Float64, Float64, Nothing}}, fa::Vector{Float64})
    @ Main ~/AQ_research/Bayes/DRIVER_BAYES.jl:17

and redefining the custom tanh to:

function tanh_pade(x)
    y = x * (2027025 + 270270 * x*x + 6930 * x*x*x*x + 36 * x*x*x*x*x*x) / (2027025 + 945945 * x*x + 51975 * x*x*x*x + 630 * x*x*x*x*x*x + x*x*x*x*x*x*x*x)
    return y
end

would work, but it obviously made the code incredibly slow. At this point I’m not able to see the advantages of using ReverseDiff.jl. With ForwardDiff.jl I can use Base.tanh and the code is at least 3 times faster, which is the opposite of what is expected when using Turing.jl.

To summarize, the last custom function works, but it is slower than just using ForwardDiff. Does anyone have an idea of what could be happening?

Thanks

What happens if you use evalpoly? Writing out polynomials like this is pretty inefficient.

The performance of your custom tanh can be fixed with a custom adjoint. The ReverseDiff docs explain how to do this.
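
Roughly, it would look something like this (an untested sketch; mytanh is just a placeholder name, and the exact track/@grad incantation should be checked against the ReverseDiff docs):

using ReverseDiff
using ReverseDiff: TrackedReal, @grad, track, value

mytanh(x::Real) = tanh(x)                  # plain method for ordinary numbers

mytanh(x::TrackedReal) = track(mytanh, x)  # record tracked calls on the tape

# Provide the primal value and the pullback together; d/dx tanh(x) = 1 - tanh(x)^2.
@grad function mytanh(x)
    y = tanh(value(x))
    return y, Δ -> (Δ * (1 - y^2),)
end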

But I’d be more worried about ReverseDiff failing with tanh. In fact, I doubt that is the actual problem; your error message suggests something else is wrong. If you post an MWE, people here will likely be able to help.

This line in your stacktrace looks suspicious:

  [7] tanh_pade(x::Complex{ReverseDiff.TrackedReal{Float64, Float64, Nothing}})

It seems you are working with Complex numbers, but functions with complex arguments are neither supported nor tested in ReverseDiff AFAICT (similarly, I haven’t encountered any support or tests for them in Turing).
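
For what it’s worth, if I read the stacktrace correctly, the error should be reproducible without Turing at all; something like the following untested sketch (hypothetical, not your model) should hit the same path, since an integer literal power of a Complex{TrackedReal} falls back to Base._cpow:

using ReverseDiff

# Hypothetical MWE: v[1] and v[2] become TrackedReals, so complex(v[1], v[2])
# is a Complex{TrackedReal}; the ^4 then reaches Base._cpow, which attempts
# convert(Int, exponent) and should throw the error shown above.
f(v) = real(complex(v[1], v[2])^4)
ReverseDiff.gradient(f, [0.3, 0.7])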

Hello Oscar, thanks for your comments. I tried these two functions:

function tanh_pade(x)
    return x * (2027025 + 270270 * x*x + 6930 * x*x*x*x + 36 * x*x*x*x*x*x) / (2027025 + 945945 * x*x + 51975 * x*x*x*x + 630 * x*x*x*x*x*x + x*x*x*x*x*x*x*x)
end

function tanh_poly(x)
    return x*evalpoly(x,[2027025, 0, 270270, 0, 6930, 0, 36])/evalpoly(x,[2027025, 0, 945945, 0, 51975, 0, 630, 0, 1])
end

c=rand(10000000)
@time begin
    tanh_pade.(c)
end
@time begin
    tanh_poly.(c)
end

And the timing results are:

  0.115309 seconds (46.13 k allocations: 79.196 MiB, 5.37% gc time, 23.71% compilation time: 100% of which was recompilation)
  1.917526 seconds (20.06 M allocations: 2.313 GiB, 24.13% gc time, 3.10% compilation time: 100% of which was recompilation)

Unfortunately there is no improvement with the new version. Thank you for the suggestion!

This is going to be slow because (a) you are allocating an array on the heap just to pass to evalpoly and (b) the evalpoly function can’t inline/unroll because it doesn’t statically know the number of coefficients. Pass a tuple instead, as recommended in the evalpoly docs:

function tanh_poly(x)
    return x*evalpoly(x,(2027025, 0, 270270, 0, 6930, 0, 36))/evalpoly(x,(2027025, 0, 945945, 0, 51975, 0, 630, 0, 1))
end

which gives (using BenchmarkTools.jl):

julia> @btime tanh_pade($(Ref(0.1))[]);
  5.732 ns (0 allocations: 0 bytes)

julia> @btime tanh_poly($(Ref(0.1))[]);
  4.198 ns (0 allocations: 0 bytes)

so it is a bit faster than tanh_pade. (It evaluates the polynomial using Horner’s method, which requires fewer multiplies than writing out each monomial separately.)
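
To make that concrete, Horner’s form of a short polynomial (evalpoly uses muladd internally, so the result can differ in the last bit, hence ≈):

julia> x = 0.1; evalpoly(x, (1.0, 2.0, 3.0)) ≈ 1.0 + x*(2.0 + x*3.0)   # a0 + x*(a1 + x*a2)
true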

Even better, rather than having every other coefficient be zero, you should express it as a polynomial in x*x:

function tanh_poly2(x)
    return x*evalpoly(x*x,(2027025, 270270, 6930, 36))/evalpoly(x*x,(2027025, 945945, 51975, 630, 1))
end

which is even faster:

julia> @btime tanh_poly2($(Ref(0.1))[]);
  3.634 ns (0 allocations: 0 bytes)

PS. For more discussion of why inlining/unrolling of polynomial evaluation is important, and how it is done in Julia, see this JuliaCon 2019 talk.
