If the NN is N → 1 sized, then you want to use reverse mode for the x derivatives at each point x in the loss function, but then do reverse mode once over theta for training. That's a reason for reverse-over-reverse. Then for the second derivative, the N → 1 map becomes N → N after the first derivative, in which case you want forward-over-reverse-over-reverse. This is exactly the PINN case, BTW.
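A minimal sketch in JAX (picked just for illustration; the post doesn't name a framework) of that composition: reverse mode for the first x-derivative of an N → 1 network, a forward-mode tangent pushed over it for the second derivative, and one outer reverse sweep over theta for training. The network `u`, the toy Poisson-style residual, and all names here are hypothetical.

```python
import jax
import jax.numpy as jnp

def u(theta, x):
    """Hypothetical tiny N -> 1 MLP; x is a vector, output is a scalar."""
    W1, b1, W2, b2 = theta
    h = jnp.tanh(W1 @ x + b1)
    return (W2 @ h + b2)[0]

# First x-derivative: u is N -> 1, so reverse mode (grad) is the right tool.
du_dx = jax.grad(u, argnums=1)  # an N -> N map

def d2u_dx2(theta, x, v):
    # du_dx is N -> N, so push a forward-mode tangent through it
    # (forward-over-reverse) to get a Hessian-vector product H(x) @ v.
    _, hvp = jax.jvp(lambda x_: du_dx(theta, x_), (x,), (v,))
    return hvp

def residual(theta, x):
    # Toy PDE residual: Laplacian of u minus an arbitrary source term.
    n = x.shape[0]
    lap = sum(d2u_dx2(theta, x, jnp.eye(n)[i])[i] for i in range(n))
    return lap - jnp.sin(x).sum()

def loss(theta, xs):
    return jnp.mean(jax.vmap(lambda x: residual(theta, x) ** 2)(xs))

# Training gradient: one reverse sweep over theta, wrapped around the
# forward-over-reverse stack above.
grad_loss = jax.grad(loss)

k1, k2, k3 = jax.random.split(jax.random.PRNGKey(0), 3)
n, width = 3, 16
theta = (jax.random.normal(k1, (width, n)), jnp.zeros(width),
         jax.random.normal(k2, (1, width)), jnp.zeros(1))
xs = jax.random.normal(k3, (8, n))  # 8 collocation points
g = grad_loss(theta, xs)  # gradients w.r.t. each array in theta
```

The point of the middle step is cost: a second reverse sweep over the N → N map `du_dx` would build another full tape per output, while a forward tangent through it costs roughly one extra evaluation per direction `v`.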