Approximating A+B*log with NeuralNetwork

Dear All,

I am working on an economics project that uses neural networks to approximate model solutions (the models are systems of differential/difference equations). For some versions of these models I have analytical solutions that I can use as a test case. For some problems this works really well. However, I encountered a few really weird cases where a neural network isn’t able to approximate (or, more precisely, Flux wasn’t able to train it on) particularly simple functions. The most striking case was the following.

V(k) = A + B*log(k)

where A = -27.02875f0, B = 0.64935064f0, and k is discretized as a vector of 2500 points from 0.05 to 0.31. I tried to approximate this simple function with a neural network. I used a 4-layer network of bent identity units, 32 units per layer. I also tried softplus units combined with a final identity layer, and a large network of ReLU units trained on a GPU (a few hundred ReLUs per layer). None of these attempts succeeded, regardless of which variant of gradient descent I used (stochastic vs. full batch, ADAM, Nesterov, …). Instead of converging towards the solution, the network formed a simple straight line.
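For reference, here is a minimal sketch of the target on the grid described above, using only Base Julia. The names `V`, `grid` and `targets` are illustrative and are not taken from the full code further down. It prints the range of values the network is asked to reproduce.

```julia
# Target function from the post, evaluated with Base Julia only.
A = -27.02875f0
B = 0.64935064f0
V(k) = A + B * log(k)

# 2500 evenly spaced points on [0.05, 0.31], stored as Float32.
grid = collect(range(0.05f0, 0.31f0, length = 2500))
targets = V.(grid)

# Range of the values the network has to match.
println("min V = ", minimum(targets))
println("max V = ", maximum(targets))
println("spread = ", maximum(targets) - minimum(targets))
```

On this grid the targets lie roughly between -28.97 and -27.79, so the function varies by a little more than 1 around an offset of about -28.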

It looks like convergence towards a local minimum of the loss function. Is there some clever way to tackle this type of problem (I tried minibatching without much success)? I managed to “solve it” by using a network with 12 hidden layers, but that sounds like overkill to me, and the convergence was really slow and fragile. As a loss function I used a simple mean squared error (it worked well for other functional equations that I solved using neural networks).

function ℒ(x)
    𝕷 = sum((𝒱.(x) - φ(x)).^2)
    return 𝕷
end

Where 𝒱 is the function to be approximated and φ is the neural network.
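Note that the loss as written sums the squared errors rather than averaging them; the two differ only by the constant factor 1/N, where N is the number of grid points. A minimal sketch of the averaged version in Base Julia follows. The helper `mse` and the constant stand-in model `guess` are hypothetical and only illustrate the formula; they are not the Flux network.

```julia
# Mean squared error between a target function f and a model g on points x.
mse(f, g, x) = sum(abs2, f.(x) .- g.(x)) / length(x)

# Target from the post.
target(k) = -27.02875f0 + 0.64935064f0 * log(k)

# Stand-in model (hypothetical, NOT the Flux network): a constant guess.
guess(k) = -28.0f0

# Same grid as in the post: 2500 points on [0.05, 0.31].
x = collect(range(0.05f0, 0.31f0, length = 2500))
println("MSE of the constant guess: ", mse(target, guess, x))
```

Multiplying this value by `length(x)` gives the summed loss used in the post.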

Full code

#(1) Install and initialize packages
using Pkg
using Plots
using Parameters
using LinearAlgebra
using CUDA
using Flux
using Random
using Distributions
using ForwardDiff

ϰ = 2500
A = -27.02875f0
B = 0.64935064f0
kl = 0.05
ku = 0.31

kGrid = reshape(rand(Uniform(kl,ku),ϰ,1),1,ϰ)
kkGrid = collect(range(kl,ku,length=ϰ))

𝒱(k) = A + B*log(k)

bent(x) = (sqrt(x^2+1)-1)/2 + x

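# 4 layers of bent identity units, 32 units per layer, as described above.
# Assumption: the posted line ends after the second Dense; the last two layers are filled in from that description.
φ = Flux.Chain(Dense(1,32,bent),Dense(32,32,bent),Dense(32,32,bent),Dense(32,1,bent))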

θ = Flux.params(φ)

function ℒ(x)
    𝕷 = sum((𝒱.(x) - φ(x)).^2)
    return 𝕷
end

Data = [kGrid]
opt = AMSGrad(0.001)

cb = () -> println(ℒ(kGrid))
@time Flux.@epochs 5000 Flux.train!(ℒ,θ,Data,opt,cb=cb)

vGrid = 𝒱.(kkGrid)
wGrid = φ(kkGrid')'

# Plot the network output and the true function together, one label per series
Plots.plot(kkGrid,[wGrid vGrid],label=["neural" "true"])

Any guidance on this type of problem? Is there some stupid mistake in my code that causes the problem, or is it something deeper? It is hard for me to believe that neural networks can’t approximate A + B*log(x) without much effort.