Why do you need two separate networks? Couldn't a single F^{NN}([\vec{u}]) with two outputs do the trick? Then there would be no issue with passing initial parameters for multiple networks to NNODE, which seems to be built for training a single chain rather than several (see the types in ODE-Specialized Physics-Informed Neural Network (PINN) Solver · NeuralPDE.jl).
EDIT: I think the issue might be that, based on the architecture of chain1, NNODE expects initial parameters of a certain dimensionality but receives something of a different size.
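
To make the single-network idea concrete, here is a minimal sketch assuming Lux.jl. The layer widths and the name `chain` are made up for illustration and are not taken from your code. It only shows that one chain with two outputs has a single parameter set whose length is fixed by the architecture, which is the size any initial parameters would need to match.

```julia
using Lux, Random

# Hypothetical single network: 1 input (time t), 2 outputs (one per state component).
# The hidden width of 16 is an arbitrary choice for this sketch.
chain = Lux.Chain(Lux.Dense(1 => 16, tanh), Lux.Dense(16 => 2))

rng = Random.default_rng()
ps, st = Lux.setup(rng, chain)

# Number of trainable parameters implied by the architecture:
# (1*16 + 16) + (16*2 + 2) = 66
n_params = Lux.parameterlength(chain)

# Evaluate at one time point; Lux expects a (features × batch) array.
t = fill(0.5f0, 1, 1)
y, _ = chain(t, ps, st)

println(n_params)  # 66
println(size(y))   # (2, 1)
```

If the initial parameters you pass were built for a different architecture (or for several chains concatenated), their length would not equal `n_params`, which would be consistent with the mismatch I suspect above.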