Negative binomial layer

I am new to flux and I would appreciate help with implementing custom layer (or maybe custom activation?).

Layer is like Dense layer, some inputs and outputs are exactly two neurons: first with NNlib.softplus activation, second with NNlib.sigmoid activation.
Motivation is to model parameters of negative binomial distribution directly (n, p).
Loss function for this is defined as described here:

Is there any chance someone experinced enough can help me with this?
a] how can I say that each neuron in output layer has different activation
b] construct loss function based on likelyhood estimate as in the artical

Target is discrete variable with negative binomial distribution.
Thanx for anyone for helping me out

1 Like

Not sure whether this helps you, but here is an example of distributional regression where the network output are coefficients of Bernstein basis polynomials and the optimization is with respect to a composite quantile loss. That is, the probability distribution is specified by its quantile function which is assumed to be a Bernstein polynomial of any degree/flexibility.

using Flux, Statistics

#  generate random training data
n = 1_000_000
m = 50
y = rand(Float32, n)    
x = rand(Float32, m, n) 
trdata = Flux.Data.DataLoader(x, y, batchsize = 128) 

#  quantile levels for training 
prob = Float32.(1:10) ./ 11  

#  Bernstein design matrix
degree = 8
dbin(x, n, p) = binomial(n, x) * p^x * (1-p)^(n-x)
B = [dbin(d, degree, p) for p in prob, d in 0:degree] 

#  quantile loss functions
function qtloss(qt::AbstractMatrix, y::AbstractVector, prob::AbstractVector)
    err = y .- qt'
    mean( (prob' .- (err .< 0)) .* err )

#  create simple dense neural network model for the Bernstein coefficients
model = Chain( Dense(m, 64, relu),
               Dense(64, 32, relu),
               Dense(32, degree+1, relu) ) 

#  train model
loss(x, y) = qtloss(B * model(x), y, prob)
@time Flux.train!(loss, Flux.params(model), trdata, ADAM())

My application is probabilistic weather forecasting and the approach is described in more detail in this article.

1 Like

Here is a first attempt, but please check the likelihood function and note that there likely are better implementations

using Flux, Statistics
x = rand(Float32, 10, 10_000)
y = rand(1:10, 10_000)
trdata = Flux.Data.DataLoader(x, y, batchsize = 100)
function nbloss(output, y)
    n   = NNlib.softplus.(output[1, :])
    p   = NNlib.sigmoid.(output[2, :])
    nll = @. log(factorial(n - 1)) + log(factorial(y + 1)) -
        log(factorial(n + y)) - n * log(p) - y * log(1 - p)
    return mean(nll)
model = Chain(Dense(10,10), Dense(10,2))
loss(x, y) = nbloss(model(x), y)
@time Flux.train!(loss, Flux.params(model), trdata, ADAM())
1 Like

Hi johnbb, this is fantastic.

Idea using regular Dense layer with identity and hide important transformation inside loss is very elegant. I think also more stable for training (vanishing gradient with sigmoid).
I think same is done in Flux.Losses.logitbinarycrossentropy

I will try to derive loss as well. If the implementation in artical/tensorflow is correct, then here we probably should have:
nll = @. log(factorial(n - 1)) + log(factorial(y)) - log(factorial(n + y - 1)) - n * log( p) - y * log(1 - p)

But I really really appreciate your time and now clear direction, how to work with this in Flux.
Thank you

@lubo Good that you found the example useful. As it was written in just a few minutes, I was afraid of errors. Maybe the log(factorial(...) is not robust either and should be replaced by a special function, possibly available in SpecialFunctions.jl. It could be that the real Flux experts also have some recommendations.

@johnbb thanks for direction on SpecialFunctions.jl, than it can be written with loggamma fce as:

nll = @. loggamma(n) + loggamma(y + 1) - loggamma(n + y) - n * log( p) - y * log(1 - p)

what is also faster solution.