This used to work: I was able to run the training for 2000 iterations. But as of today I am getting a StackOverflowError, even though I did not change anything in the code. I have tried restarting the PC and also running the code on another PC, with the same result. The error does not even come with a stack trace.
Julia Version 1.7.3
Commit 742b9abb4d (2022-05-06 12:58 UTC)
Platform Info:
OS: Windows (x86_64-w64-mingw32)
CPU: Intel(R) Core(TM) i5-8365U CPU @ 1.60GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-12.0.1 (ORCJIT, skylake)
Environment:
JULIA_EDITOR = code
JULIA_NUM_THREADS =
As the error has no further description, it is difficult to identify which variable or operation is causing the StackOverflowError. Any advice on possible causes or troubleshooting methods?
Does the error stack trace tell you the line numbers or at least the function where this is failing? We would need to see the code that causes the error.
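If the REPL prints nothing beyond the error name, one option is to catch the exception yourself and inspect the backtrace. The following is only a minimal sketch: `run_with_trace` is a hypothetical helper name, not something from Flux or Base, and the `error("boom")` call is a stand-in for the failing call. It relies on the Base functions `catch_backtrace` and `stacktrace`.

```julia
# Hypothetical helper (not part of Base or Flux): runs `f` and, if it throws,
# returns the exception together with the first `nframes` stack frames.
function run_with_trace(f; nframes = 20)
    try
        return (result = f(), err = nothing,
                frames = Base.StackTraces.StackFrame[])
    catch err
        # Collect the backtrace of the exception that was just caught.
        frames = stacktrace(catch_backtrace())
        # A stack overflow can produce a very long trace; keep only the start.
        return (result = nothing, err = err, frames = first(frames, nframes))
    end
end

# Stand-in for the failing call; replace the closure body with your own call.
out = run_with_trace(() -> error("boom"))
println(typeof(out.err))
foreach(println, out.frames)
```

In your case the closure would wrap the training call, e.g. `run_with_trace(() -> Flux.train!(loss4, Flux.params(NN), data, opt, cb=cb))`. Whether any useful frames are recorded for this particular StackOverflowError is not guaranteed, but repeated frames near the top usually point at the recursing function.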
Here’s a reduced version of the code. The X1 and X2 matrices are populated with values from the training data, with n_pts = 585 and n_cases = 1276.
X1 = zeros(13, n_pts * n_cases)
X2 = zeros(5, n_pts * n_cases)
i = 0
for ics in 1:n_cases, ipt in 1:n_pts
    i += 1
    X1[:, i] = vcat(xyz[ipt, ics, :], uparams[ics, :], wa[ics])
    X2[:, i] = pp[ipt, ics, :]
end
loss_new(X1, X2, NN) = sum(abs2, NN(X1) - X2)
function loss4()
    return loss_new(X1, X2, NN)
end
# Training
data = Iterators.repeated((), 1000)
opt = Flux.ADAM(0.01, (0.9, 0.99))
Flux.train!(loss4, Flux.params(NN), data, opt, cb=cb)
ERROR: StackOverflowError:
The stack trace does not even give the line number where the error occurs. But if I run the code line by line, the error occurs at the last line, during the Flux.train! call. Unfortunately, there are no further details.
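Since the failure only shows up inside Flux.train!, it may help to run the pieces of one training step by hand and see which one overflows. The sketch below is an assumption-laden stand-in, not the original setup: the network `NN`, its layer sizes, and the random `X1`/`X2` are hypothetical placeholders with the same row counts (13 and 5) as in the post, and it assumes a Flux version that still supports the implicit-parameter API (`Flux.params`, `Flux.ADAM`) used above.

```julia
using Flux

# Hypothetical stand-ins for the real network and data (same row counts).
NN = Chain(Dense(13, 16, tanh), Dense(16, 5))
X1 = rand(Float32, 13, 100)
X2 = rand(Float32, 5, 100)

loss_new(X1, X2, NN) = sum(abs2, NN(X1) - X2)
loss4() = loss_new(X1, X2, NN)

ps = Flux.params(NN)
opt = Flux.ADAM(0.01, (0.9, 0.99))

# Step 1: forward pass only. An overflow here points at the model or data.
l0 = loss4()
println("loss before update: ", l0)

# Step 2: backward pass. An overflow here points at differentiating the loss.
gs = Flux.gradient(loss4, ps)

# Step 3: one optimiser update. An overflow here points at the optimiser.
Flux.Optimise.update!(opt, ps, gs)
println("loss after update: ", loss4())
```

If all three steps run with the real `NN`, `X1` and `X2`, the remaining suspects are the pieces that only train! touches, such as the `cb` callback, which is not shown in the reduced code.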