I have trained a CNN that maps an image to a sequence of feature vectors f_v, and I want to learn a function du(u, f_v) that maps those feature vectors to an ODE. For each image I can compute both the function and its first derivative that I am trying to model, over the set of all images in my domain. My question: in most examples I see, the input to the neural ODE is just u0. That's fine for learning a single ODE, but from a practical standpoint, how can I make my NeuralODE a function of both u and these features in Julia/Lux/OrdinaryDiffEq.jl during training? I don't need to backprop through the features or optimize them as parameters; they are a fixed representation per image.
More concretely, suppose I want to use this Lux model for my NeuralODE:
```julia
using Lux  # also re-exports softmax from NNlib

model = @compact(
    Q = Dense(2 => 128),
    K = Dense(3 => 128),
    V = Dense(3 => 128),
    δU = Dense(128 => 2),
    act = tanh
) do inps
    # u has shape (2,), f_v has shape (3, 400)
    u, f_v = inps
    q = Q(u)        # (128,)
    k = K(f_v)      # (128, 400)
    v = V(f_v)      # (128, 400)
    attn = softmax(
        sum(q .* k; dims=1) ./ sqrt(128f0),  # Float32 literal to avoid promotion
        dims=2,
    )
    # Apply the attention weights and reduce (128, 400) -> (128, 1)
    z = sum(attn .* v; dims=2)
    z = act.(z)
    du = δU(z)
    @return du
end
```
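To make the question concrete, this is roughly how I imagined wiring it up: the ODE right-hand side closes over the fixed per-image features, so only u and the network parameters flow through the solver. Here `fake_model` is a made-up stand-in for the Lux model call (`model((u, f_v), ps, st)` in the real code), so please read this as a sketch of the pattern, not tested code:

```julia
# Hypothetical stand-in for the trained Lux model; in the real code this
# would be the @compact model applied to (u, f_v) with its params/state.
fake_model(u, f_v) = tanh.(sum(f_v; dims=2)[1:2]) .* u

# Build an in-place RHS that captures the fixed per-image features f_v.
function make_rhs(f_v)
    function rhs!(du, u, p, t)
        du .= fake_model(u, f_v)  # p would hold the Lux parameters
    end
    return rhs!
end

f_v = randn(3, 400)   # fixed CNN features for one image
rhs! = make_rhs(f_v)
# prob = ODEProblem(rhs!, u0, (0.0, 1.0), ps)   # one problem per image
```

Is a closure per image like this the right way to do it, or does it interact badly with the AD/optimization machinery?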
Is this something I can work with, or do I need some sort of trick to get the additional features in as inputs? I have seen in other posts that additional parameters were passed with u0: Using Neural ODEs to learn a family of ODEs (with Automatic Differentiation) - #14 by leespen1, but it's not clear to me how that would impact optimization. Ultimately only u[1] and u[2] are what I want out of the ODE. I guess I could discount the loss on the 1200 additional dimensions of u, but is that the most idiomatic/computationally efficient way to handle these cases? I'm completely new to doing ML in Julia and pretty fresh when it comes to neural ODEs. I went over a decent number of examples, but I'm still feeling pretty lost here.
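For reference, here is the augmented-state trick as I understood it from that thread (my own sketch; `some_model` is a hypothetical stand-in for the network, and I may be misreading the approach): the (3, 400) features are flattened into 1200 extra state dimensions that are given zero dynamics, so they ride along unchanged.

```julia
# Hypothetical stand-in for the trained network; returns du for the 2 real states.
some_model(x, f_v) = x

function rhs_aug!(du, u, p, t)
    x   = @view u[1:2]                      # the state I actually care about
    f_v = reshape(@view(u[3:end]), 3, 400)  # constant per-image features
    du[1:2]   .= some_model(x, f_v)
    du[3:end] .= 0                          # features don't evolve
end

u0_aug = vcat([0.5, -0.5], vec(randn(3, 400)))  # length 1202
# The loss would then only compare sol[1:2, :] to the data,
# which is the "discount the extra dimensions" part I was asking about.
```

This is the alternative I was weighing against the closure approach above; it seems like the solver would be dragging 1200 inert dimensions through every step.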