NaN errors in Flux

Hi all,

I am trying to replicate a Likelihood Approximation Network (LAN) with Flux.jl. LANs are used to learn the likelihood function of computational models whose likelihood is intractable. As a proof of concept, I am applying the method to two simple models for which the likelihood function is known: a Gaussian model and a decision model called the Linear Ballistic Accumulator (LBA). I was able to develop a working LAN for the Gaussian model, but the LAN for the LBA produces NaN predictions. I tried various fixes from other threads, such as decreasing the learning rate and adding BatchNorm layers, but neither solved the problem. Switching the activation function to relu eliminated the NaNs, but it also interfered with the network's ability to learn the likelihood function.
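For context, my setup looks roughly like the sketch below. The layer sizes, optimizer settings, and loss function are simplified placeholders, not my exact script:

```julia
using Flux

# A small MLP with tanh activations, mapping model parameters (plus data)
# to an approximate log-likelihood. Layer sizes are placeholders.
model = Chain(
    Dense(5 => 100, tanh),
    Dense(100 => 100, tanh),
    Dense(100 => 1),          # linear output for the log-likelihood
)

# Reduced learning rate, one of the fixes I tried
opt_state = Flux.setup(Adam(1f-4), model)

# x: 5×N Float32 input matrix, y: 1×N matrix of target log-likelihoods
function train_step!(model, opt_state, x, y)
    loss, grads = Flux.withgradient(model) do m
        Flux.huber_loss(m(x), y)
    end
    Flux.update!(opt_state, model, grads[1])
    return loss
end
```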

Can I do anything to fix this problem? Please let me know if there are more details I can provide.

Just to reiterate an old point of discussion: signaling NaNs would help you discover the root of the problem…

Indeed, it would be helpful to know where the NaN originated. As far as I can tell, I replicated the procedure described in the paper, which makes me wonder whether the problem lies in the automatic differentiation (AD).
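In the meantime, I have been trying to narrow it down with a crude layer-by-layer check, something like this rough sketch (it assumes the model is a plain `Chain`; `find_first_nonfinite` is just a throwaway helper):

```julia
# Crude NaN localization: run the forward pass one layer at a time and
# report the first layer whose output contains a non-finite value.
function find_first_nonfinite(model, x)
    h = x
    for (i, layer) in enumerate(model.layers)
        h = layer(h)
        all(isfinite, h) || return i   # index of the offending layer
    end
    return nothing                     # forward pass is clean
end
```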

Could someone tell me whether I misspecified the NN model, or whether the NaNs are more likely due to a bug?

I tracked the problem down to a few large outliers in the training data, which produced a NaN when passed through tanh. So far, removing the outliers seems to have solved the problem.
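In case it is useful to anyone else, the filtering step looks roughly like the sketch below; the z-score cutoff `k = 5` is an arbitrary choice on my part, not a principled one:

```julia
using Statistics

# Drop training columns containing extreme feature values, i.e. any value
# more than k standard deviations from that feature's mean.
# x is a d×N feature matrix, y is the matching 1×N target matrix.
function drop_outliers(x, y; k = 5)
    μ = mean(x; dims = 2)
    σ = std(x; dims = 2)
    keep = vec(all(abs.(x .- μ) .< k .* σ; dims = 1))
    return x[:, keep], y[:, keep]
end
```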