StackOverflowError when training a neural network

jpt · July 14, 2022, 3:14pm

I am training a neural network using the Flux library. The network is defined as follows:

NN = Flux.Chain(
        Flux.Dense(13, 32, sigmoid),
        Flux.Dense(32, 32, sigmoid),
        Flux.Dense(32, 5),
        y->abs.(y))

This used to work: I was able to run the training for 2000 iterations. But as of today I am suddenly getting a StackOverflowError, even though I did not change anything in the code. I have tried restarting the PC and running the code on another PC, and I still get the same error. The error does not even come with a stack trace.

ERROR: StackOverflowError:

The package status is as follows:

Project.toml`
  [fbb218c0] BSON v0.3.5
  [4ec6fef6] Bezier v0.1.7
  [41bf760c] DiffEqSensitivity v6.79.0   
  [0c46a032] DifferentialEquations v7.1.0
  [6a86dc24] FiniteDiff v2.13.0
  [587475ba] Flux v0.13.3
  [f6369f11] ForwardDiff v0.10.30        
  [91a5bcdd] Plots v1.30.2
  [e88e6eb3] Zygote v0.6.40

Version info:

Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i5-8365U CPU @ 1.60GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
  JULIA_EDITOR = code
  JULIA_NUM_THREADS =

As the error has no further description, it is difficult to identify which variable or operation is causing the stack overflow. Any advice on possible causes or troubleshooting methods?

mcabbott · July 14, 2022, 3:27pm

Can you post the stack trace from the error? And perhaps versions from using Pkg; Pkg.status() and from versioninfo()?

jpt · July 14, 2022, 3:33pm

I have added this information to the original post.

jmair · July 15, 2022, 7:15am

Does the error stack trace tell you the line numbers or at least the function where this is failing? We would need to see the code that causes the error.
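
If the REPL shows no frames at all, one way to look for them is to catch the exception yourself and print the start of the backtrace. The sketch below is only an illustration: `show_trace` and `runaway` are made-up names, and `runaway` is a stand-in for the failing call (in this thread that would be the `Flux.train!` line). It is not guaranteed to recover useful frames for the error discussed here.

```julia
# Hypothetical helper (not from the thread): run `f` and, if it throws,
# print the exception type and the first few stack frames.
function show_trace(f; nframes = 20)
    try
        return f()
    catch err
        bt = catch_backtrace()
        # Only symbolize the first few entries; a stack overflow can
        # produce a very long backtrace.
        frames = stacktrace(bt[1:min(nframes, length(bt))])
        println("Caught: ", typeof(err))
        foreach(println, frames)
        return err
    end
end

# Stand-in for the failing call: unbounded recursion.
runaway(n) = 1 + runaway(n + 1)

err = show_trace(() -> runaway(1))
```

To apply it to real code, the closure passed to `show_trace` would wrap the call that fails instead of `runaway(1)`.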

jpt · July 15, 2022, 8:08am

Here's a reduced version of the code. The X1 and X2 arrays are populated with values from the training data, with n_pts = 585 and n_cases = 1276.

X1 = zeros(13, n_pts * n_cases)
X2 = zeros(5, n_pts * n_cases)
i = 0
for ics in 1:n_cases, ipt in 1:n_pts
    i += 1
    X1[:, i] = vcat(xyz[ipt, ics, :], uparams[ics, :], wa[ics])
    X2[:, i] = pp[ipt, ics, :]
end

loss_new(X1, X2, NN) = sum(abs2, NN(X1) - X2)
function loss4()
    return loss_new(X1, X2, NN)
end

# Training
data = Iterators.repeated((), 1000)
opt = Flux.ADAM(0.01, (0.9, 0.99))
Flux.train!(loss4, Flux.params(NN), data, opt, cb=cb)

ERROR: StackOverflowError:

The error message does not even give the line number where the error occurs. But if I run the code line by line, the error occurs at the last line, during the Flux.train! call. Unfortunately there are no further details.

jmair · July 16, 2022, 7:10am

I think the issue is likely to be the loss function. Try using one of the built-in Flux loss functions (see the Flux Losses documentation).

You can try running your current loss directly in the REPL to see whether it works as expected, but I imagine it should look like:

sum(abs2.(NN(X1) .- X2))
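
A minimal sketch of that check, assuming Flux v0.13 as listed in the package status above. The random `X1`/`X2` are stand-ins with the same row counts as the real data (13 and 5) but far fewer columns. Evaluating the loss and the gradient as two separate steps may help tell whether the forward pass or the differentiation is the part that fails.

```julia
using Flux

# Stand-in data: same layout as in the thread, but only 64 columns
# instead of n_pts * n_cases.
X1 = rand(Float32, 13, 64)
X2 = rand(Float32, 5, 64)

NN = Flux.Chain(
    Flux.Dense(13, 32, Flux.sigmoid),
    Flux.Dense(32, 32, Flux.sigmoid),
    Flux.Dense(32, 5),
    y -> abs.(y))

loss_new(X1, X2, NN) = sum(abs2, NN(X1) .- X2)

# Step 1: forward pass only.
l = loss_new(X1, X2, NN)

# Step 2: the gradient with respect to the network parameters,
# which is what Flux.train! computes on every iteration.
ps = Flux.params(NN)
gs = Flux.gradient(() -> loss_new(X1, X2, NN), ps)
```

If step 1 runs but step 2 throws, the problem is more likely in the differentiation than in the loss definition itself.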

I would suggest changing the train line to something like:

data = Iterators.repeated((X1, X2), 1000)
loss(x, y) = Flux.Losses.mse(NN(x), y)
Flux.train!(loss, Flux.params(NN), data, opt, cb=cb)

I am not able to run the code now, so you may need to tweak the code above to get it to work, but see if that helps.
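
For reference, a self-contained variant of the suggestion that can be run without the original data. This is a sketch under assumptions: it targets the Flux v0.13 API used in this thread, the random `X1`/`X2` only mimic the shapes of the real arrays, and the callback `cb` is a hypothetical one since the original `cb` was not posted.

```julia
using Flux

# Stand-in data with the same row counts as the thread's X1 and X2.
X1 = rand(Float32, 13, 64)
X2 = rand(Float32, 5, 64)

NN = Flux.Chain(
    Flux.Dense(13, 32, Flux.sigmoid),
    Flux.Dense(32, 32, Flux.sigmoid),
    Flux.Dense(32, 5),
    y -> abs.(y))

# The loss now takes the batch as arguments; train! splats each
# element of `data` into it.
loss(x, y) = Flux.Losses.mse(NN(x), y)

data = Iterators.repeated((X1, X2), 100)
opt = Flux.ADAM(0.01, (0.9, 0.99))

# Hypothetical callback: print the loss at most once per second.
cb = Flux.throttle(() -> println("loss = ", loss(X1, X2)), 1)

before = loss(X1, X2)
Flux.train!(loss, Flux.params(NN), data, opt; cb = cb)
after = loss(X1, X2)
```

Note that `mse` averages the squared errors, whereas the original `sum(abs2, ...)` adds them up, so the reported loss values are on a different scale.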
