Turing.jl, differentiation & categorical outputs -- `isprobvec` conundrum

delehef · January 27, 2021, 9:28pm

Hi all,

I ran into a strange bug, and, to be honest, I’m relatively new to the language, so it could be a mistake on my part.

So, I’m having a real blast playing with Bayesian NNs, and I’m slowly building up from the basic example given on the Turing.jl website, while picking up as much of an understanding of Julia’s inner workings as I can. Now, I’m trying to extend the classification from binary to multiclass. Thus, I defined the output of my BNN as a Categorical instead of a Binomial.

Logically enough, Categorical requires its parameter to be a probability vector, i.e. non-negative entries summing to one. From Distributions/src/utils.jl:

isprobvec(p::AbstractVector{<:Real}) =
    all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

I convert the output of the last 3 neurons of my network to a probability vector with Flux.softmax. Naturally, the sum is not exactly 1.0 due to floating-point shenanigans, but isapprox takes care of that.
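To make the float case concrete, here is a minimal sketch in plain Julia. The names softmax_sketch and isprobvec_sketch are hypothetical: the first is a hand-rolled stand-in assumed to match what Flux.softmax computes for a single vector, the second is a local copy of the check quoted above. The input values are made up for illustration.

```julia
# Hand-rolled softmax so the sketch needs no packages (assumption: this is the
# same quantity Flux.softmax returns for a single vector).
function softmax_sketch(z::AbstractVector{<:Real})
    e = exp.(z .- maximum(z))   # subtract the max for numerical stability
    return e ./ sum(e)
end

# Local copy of the check quoted from Distributions/src/utils.jl.
isprobvec_sketch(p::AbstractVector{<:Real}) =
    all(x -> x ≥ zero(x), p) && isapprox(sum(p), one(eltype(p)))

p = softmax_sketch([0.3, -0.65, 0.25])   # made-up outputs of the last 3 neurons

println(sum(p))                # may differ from 1.0 in the last few bits
println(isprobvec_sketch(p))   # true for plain Float64 values
```

With plain Float64 entries, isapprox compares against 1.0 with its default relative tolerance, so the tiny rounding error in the sum is absorbed.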

The problem, however, is that what works with floats doesn’t seem to work with TrackedReal values (which are instrumental to AD):

xx = ReverseDiff.TrackedReal{Float64,Float64,ReverseDiff.TrackedArray{Float64,Float64,2,Array{Float64,2},Array{Float64,2}}}[TrackedReal<4WI>(0.42766067643153494, 0.0, 3Wk, ---), TrackedReal<FB4>(0.1639909617549507, 0.0, 3Wk, ---), TrackedReal<49W>(0.4083483618135143, 0.0, 3Wk, ---)]
sum(xx) = TrackedReal<KQ5>(0.9999999999999999, 0.0, 3Wk, ---)
one(eltype(xx)) = TrackedReal<HSG>(1.0, 0.0, ---, ---)
isapprox(sum(xx), 1.0) = true
isapprox(sum(xx), one(eltype(xx))) = false
isprobvec(r[:, i]) = false

So xx, the output of my network for a single sample, is indeed a vector of TrackedReal that very nearly sums to one (xx = [0.427..., 0.163..., 0.408...]). Its sum is a TrackedReal very close to one (sum = TrackedReal(0.9999999999999999, ...)), and it is approximately equal to the Float64 value 1.0 (isapprox(sum(xx), 1.0) = true). However, it is not approximately equal to the one of its own type: isapprox(sum(xx), one(eltype(xx))) = false. Hence the vector fails isprobvec and is rejected by Categorical.

Here is a minimal failing program from which the above example is extracted; if you run it as is under Julia 1.5, you should encounter my problem.

So did I miss something, and is it normal for tracked reals to fail the ≈ comparison with the one of their own type, or is there a bug somewhere?

Thank you for your insights!

Christopher_Fisher · January 28, 2021, 11:28am

I’m not sure why it does not return true for TrackedReal. @trappmartin, do you have any insights?

Until this issue is resolved, you might be able to use Float64 as your point of comparison.
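A minimal sketch of that idea, assuming you write your own check rather than rely on the one in Distributions: compare the sum against the Float64 literal 1.0 instead of one(eltype(p)). The name isprobvec_float is hypothetical and not part of any package, and the sketch uses the plain float values from the original post, not tracked ones. How to wire such a check into Categorical is not covered in this thread.

```julia
# Hypothetical variant of the check: the point of comparison is the Float64
# literal 1.0 rather than one(eltype(p)). Not part of Distributions.jl.
isprobvec_float(p::AbstractVector{<:Real}) =
    all(x -> x ≥ zero(x), p) && isapprox(sum(p), 1.0)

# The values reported in the original post, as plain floats.
xx = [0.42766067643153494, 0.1639909617549507, 0.4083483618135143]

println(isprobvec_float(xx))
```

In the original post, isapprox(sum(xx), 1.0) already returned true for the tracked values, which is the behaviour this comparison relies on.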

trappmartin · February 1, 2021, 11:33am

This is strange. Would you mind reporting this on ReverseDiff.jl? I doubt this has anything to do with Turing at all.

As an alternative, I suggest you simply write down the log likelihood function, e.g. in terms of the negative cross entropy, and increment the log joint by hand. This will also be faster.

Example using cross entropy:

@model function BNN(x, y)
    ...
    yhat = softmax.(...)
    Turing.@addlogprob! -Flux.crossentropy(yhat, y)
end
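To see why this works, here is a small plain-Julia sketch showing that, for a one-hot label, the negative cross entropy of one sample is just the log of the probability assigned to the true class, i.e. the categorical log likelihood. The helper crossentropy_sketch and the numbers are illustrative assumptions, not Flux's implementation.

```julia
# Plain-Julia cross entropy for one sample: -Σ y[k] * log(yhat[k]).
# Assumption: y is one-hot and yhat is a probability vector.
crossentropy_sketch(yhat, y) = -sum(y .* log.(yhat))

yhat = [0.7, 0.2, 0.1]   # illustrative predicted class probabilities
y    = [0, 1, 0]         # one-hot label for class 2

loglik = -crossentropy_sketch(yhat, y)

println(loglik ≈ log(yhat[2]))   # the one-hot label picks out log(yhat[2])
```

For several observations the log joint needs the sum of these per-sample terms. If the cross-entropy helper you use averages over samples instead, the increment would be scaled by 1/N, so that is worth checking.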

ElOceanografo · February 5, 2021, 10:50pm

I am running into this same issue with a Multinomial distribution (thought I was going crazy till I found this thread!). I get variations on the same error using ReverseDiff, ForwardDiff, Zygote, and Tracker as the backend, so it may be a more general issue than just ReverseDiff…

delehef · February 6, 2021, 3:51pm

@Christopher_Fisher @trappmartin

Thanks for the answers! Indeed, the issue was with ReverseDiff.jl. I opened an issue and a PR that should fix the problem.

@ElOceanografo the fix will hopefully be merged into an official ReverseDiff release. In the meantime, you can use my fork (rm ReverseDiff; add https://github.com/delehef/ReverseDiff.jl.git#master) or just apply the fix yourself. I haven’t run into any issues with the other AD backends yet, but please tell me whether it at least makes ReverseDiff work for you!

ElOceanografo · February 8, 2021, 6:12pm

Thanks, @delehef. For now, I ended up writing my own multinomial logpdf function with no sum-to-one checking, but will watch for the PR to be merged!
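For readers wanting to try the same workaround, here is a rough sketch of what such a function could look like in plain Julia. It is not the code used in this thread: the names multinomial_logpdf_sketch and logfactorial_sketch are hypothetical, and the caller is assumed to pass valid counts and probabilities, since nothing is checked.

```julia
# log(n!) computed as a sum of logs so the sketch needs no packages.
logfactorial_sketch(n::Integer) = n == 0 ? 0.0 : sum(log, 1:n)

# Hypothetical multinomial log-pmf that never checks sum(p) ≈ 1.
# x: counts per category, p: category probabilities (assumed valid by the caller).
function multinomial_logpdf_sketch(x::AbstractVector{<:Integer},
                                   p::AbstractVector{<:Real})
    lp = logfactorial_sketch(sum(x))
    for (xi, prob) in zip(x, p)
        lp -= logfactorial_sketch(xi)
        # Skip zero counts so that 0 * log(0) does not produce NaN.
        if xi != 0
            lp += xi * log(prob)
        end
    end
    return lp
end

# Three trials, one in each of three equally likely categories:
# pmf = 3! / (1! 1! 1!) * (1/3)^3 = 2/9.
println(multinomial_logpdf_sketch([1, 1, 1], [1/3, 1/3, 1/3]) ≈ log(2/9))
```

Because the probabilities only enter through log(prob), no comparison against one is ever made, which sidesteps the failing isapprox call at the cost of losing the sanity check.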
