I guess I found the issue: the map() call within the loss function
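function loss_LSTM(p)
    # Build one 2x1x1 input per time step inside map(), run the model, and stack the outputs.
    y_pred = vcat(map(x -> model(reshape([(i(x)*n(x)).^2/in_sqr_max; (i(x)).^2/i_sqr_max], 2, 1, 1), p, st)[1], tsteps_NODE)...)
    return sum(abs2, y_target - y_pred)
end;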
seemed to conflict with the AD (automatic differentiation) calculation. Providing the input data as a simple array, pre-calculated once outside the loss function instead of being rebuilt inside map(), solved the issue.
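For anyone hitting the same thing, here is a rough sketch of what I mean by pre-calculating the inputs. Everything except the names taken from the loss above is a stand-in: the signals `i` and `n`, the time grid, the matrix `X`, and especially `model`, which is just a dummy linear map with the same `(output, state)` return shape so the snippet runs on its own. Whether the real model takes the whole array at once or has to be called per column depends on the network, so treat this as an illustration, not the exact fix.

```julia
# Hypothetical stand-ins; the real i, n, tsteps_NODE, p, st and y_target come from the original setup.
i(t) = sin(t)             # assumed input signal
n(t) = 1 + 0.1 * t        # assumed second signal
tsteps_NODE = range(0.0, 1.0; length = 5)
i_sqr_max  = maximum(t -> i(t)^2, tsteps_NODE)
in_sqr_max = maximum(t -> (i(t) * n(t))^2, tsteps_NODE)

# Build the 2 x N input matrix once, outside the loss,
# so the AD pass never has to trace through i, n, or the map() closure.
X = reduce(hcat, [[(i(t) * n(t))^2 / in_sqr_max, i(t)^2 / i_sqr_max] for t in tsteps_NODE])

# Dummy model (assumption): same call shape as model(x, p, st) -> (output, state).
model(x, p, st) = (p' * x, st)
p = [0.5, -0.25]
st = NamedTuple()
y_target = zeros(length(tsteps_NODE))

function loss_LSTM(p)
    # Only the model call depends on p; X is a constant array here.
    y_pred = vec(first(model(X, p, st)))
    return sum(abs2, y_target .- y_pred)
end
```

The point is only that the loss closes over a constant array, so the gradient is taken with respect to `p` alone and nothing inside the loss rebuilds the inputs.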