Negative binomial layer

Hello,
I am new to Flux and would appreciate help with implementing a custom layer (or maybe a custom activation?).

The layer is like a Dense layer with some number of inputs, but exactly two output neurons: the first with NNlib.softplus activation, the second with NNlib.sigmoid activation.
The motivation is to model the parameters of a negative binomial distribution (n, p) directly.
The loss function for this is defined as described here:

Is there any chance someone experienced enough can help me with this?
a] How can I specify that each neuron in the output layer has a different activation?
b] How do I construct a loss function based on the likelihood estimate as in the article?

The target is a discrete variable with a negative binomial distribution.
Thanks to anyone helping me out.
Lubo

Not sure whether this helps you, but here is an example of distributional regression where the network outputs are the coefficients of Bernstein basis polynomials and the optimization is with respect to a composite quantile loss. That is, the probability distribution is specified by its quantile function, which is assumed to be a Bernstein polynomial of any degree/flexibility.

using Flux, Statistics

#  generate random training data
n = 1_000_000
m = 50
y = rand(Float32, n)    
x = rand(Float32, m, n) 
trdata = Flux.Data.DataLoader(x, y, batchsize = 128) 

#  quantile levels for training 
prob = Float32.(1:10) ./ 11  

#  Bernstein design matrix
degree = 8
dbin(x, n, p) = binomial(n, x) * p^x * (1-p)^(n-x)
B = [dbin(d, degree, p) for p in prob, d in 0:degree] 

#  composite quantile loss, averaged over the quantile levels in prob
function qtloss(qt::AbstractMatrix, y::AbstractVector, prob::AbstractVector)
    err = y .- qt'
    mean( (prob' .- (err .< 0)) .* err )
end

#  create simple dense neural network model for the Bernstein coefficients
model = Chain( Dense(m, 64, relu),
               Dense(64, 32, relu),
               Dense(32, degree+1, relu) ) 

#  train model
loss(x, y) = qtloss(B * model(x), y, prob)
@time Flux.train!(loss, Flux.params(model), trdata, ADAM())
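
Once trained, predicted quantiles for new inputs are obtained the same way as inside the loss, by mapping the network output through the Bernstein design matrix. A minimal usage sketch, where xnew is a hypothetical batch of new cases:

#  hedged usage sketch: predicted quantiles at the levels in prob
xnew = rand(Float32, m, 5)   # 5 hypothetical new cases
qt = B * model(xnew)         # size (length(prob), 5); row i holds the prob[i]-quantiles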

My application is probabilistic weather forecasting and the approach is described in more detail in this article.

Here is a first attempt, but please check the likelihood function and note that there are likely better implementations.

using Flux, NNlib, Statistics

#  random training data: 10 features, integer targets
x = rand(Float32, 10, 10_000)
y = rand(1:10, 10_000)
trdata = Flux.Data.DataLoader(x, y, batchsize = 100)

#  negative log-likelihood: row 1 of the output parameterizes n (softplus),
#  row 2 parameterizes p (sigmoid)
function nbloss(output, y)
    n   = NNlib.softplus.(output[1, :])
    p   = NNlib.sigmoid.(output[2, :])
    nll = @. log(factorial(n - 1)) + log(factorial(y + 1)) -
        log(factorial(n + y)) - n * log(p) - y * log(1 - p)
    return mean(nll)
end
model = Chain(Dense(10,10), Dense(10,2))
loss(x, y) = nbloss(model(x), y)
@time Flux.train!(loss, Flux.params(model), trdata, ADAM())
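
After training, the fitted distribution parameters can be read off by applying the same transformations used inside the loss; a hedged sketch (the slice x[:, 1:5] is just an arbitrary example batch):

#  sketch: recover the fitted negative binomial parameters for a few inputs
out = model(x[:, 1:5])
n̂ = NNlib.softplus.(out[1, :])
p̂ = NNlib.sigmoid.(out[2, :])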

Hi johnbb, this is fantastic.

The idea of using a regular Dense layer with identity activation and hiding the important transformation inside the loss is very elegant. I think it is also more stable for training (it avoids the vanishing gradient of the sigmoid).
I think the same is done in Flux.Losses.logitbinarycrossentropy.
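
As a rough illustration of that analogy (a hedged sketch, not the exact internals of Flux.Losses):

using Flux

ŷ = randn(Float32, 4)      # raw network outputs (logits)
yb = Float32[0, 1, 1, 0]   # binary targets

#  sigmoid folded into the loss (numerically stable)
Flux.Losses.logitbinarycrossentropy(ŷ, yb)

#  mathematically equivalent, but applies sigmoid as an activation first
Flux.Losses.binarycrossentropy(Flux.sigmoid.(ŷ), yb)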

I will try to derive the loss as well. If the implementation in the article/TensorFlow is correct, then here we probably should have:
nll = @. log(factorial(n - 1)) + log(factorial(y)) - log(factorial(n + y - 1)) - n * log(p) - y * log(1 - p)

But I really appreciate your time and the now clear direction on how to work with this in Flux.
Thank you
Lubo

@lubo Good that you found the example useful. As it was written in just a few minutes, I was afraid of errors. Maybe the log(factorial(...)) is not robust either and should be replaced by a special function, possibly available in SpecialFunctions.jl. It could be that the real Flux experts also have some recommendations.

@johnbb thanks for the pointer to SpecialFunctions.jl; then it can be written with the loggamma function as:

nll = @. loggamma(n) + loggamma(y + 1) - loggamma(n + y) - n * log(p) - y * log(1 - p)

which is also a faster solution.
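
Putting it together with johnbb's code above, a sketch of the full loss function (treat it as a sketch, not a verified implementation):

using NNlib, SpecialFunctions, Statistics

#  negative binomial negative log-likelihood: row 1 of the raw network output
#  parameterizes n via softplus, row 2 parameterizes p via sigmoid
function nbloss(output, y)
    n = NNlib.softplus.(output[1, :])
    p = NNlib.sigmoid.(output[2, :])
    nll = @. loggamma(n) + loggamma(y + 1) - loggamma(n + y) -
        n * log(p) - y * log(1 - p)
    return mean(nll)
end
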
Lubo