If you post a complete MWE, I might be able to take a look. As a side note, is there a reason to use ForwardDiff here? A neural network typically has many parameters and a single scalar output (the loss), so reverse-mode autodiff is usually the better fit.
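
To illustrate the point about the two modes, here is a minimal sketch. It assumes Zygote as the reverse-mode package (any reverse-mode AD would do), and `loss` and `θ` are hypothetical stand-ins for your actual model and parameters, not taken from your code:

```julia
using ForwardDiff, Zygote

# Hypothetical stand-in for a network loss: many parameters in, one scalar out.
loss(θ) = sum(abs2, tanh.(θ)) / length(θ)

θ = randn(1_000)

# Forward mode: the work grows with the number of parameters,
# since ForwardDiff has to sweep through them in chunks.
g_fwd = ForwardDiff.gradient(loss, θ)

# Reverse mode: a single backward pass yields the whole gradient.
# Zygote.gradient returns a tuple with one entry per argument.
g_rev = Zygote.gradient(loss, θ)[1]

println(g_fwd ≈ g_rev)
```

Both calls should give the same gradient up to floating-point error; the difference is in how the cost scales as the parameter count grows.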