Learning a mapping from an image to an ODE


I have trained a CNN that maps an image to a sequence of feature vectors f_v, and I want to learn a function du(u, f_v) that defines an ODE conditioned on those features. For each image I can compute both the function and the first derivative that I am trying to model, over the set of all images in my domain. My question: in most examples I see, the input to the neural ODE is just u0. That's fine for learning a single ODE, but from a practical standpoint, during training, how can I make my NeuralODE a function of both u and these features in Julia/Lux/OrdinaryDiffEq.jl? I don't need to backprop through the features or optimize them as parameters; it's a fixed representation per image.

More concretely say I want to use this Lux model for my NeuralODE:

model = @compact(
    Q=Dense(2 => 128),
    K=Dense(3 => 128),
    V=Dense(3 => 128),
    δU=Dense(128 => 2),
    act=tanh
) do inps
    # u has shape (2,); f_v has shape (3, 400)
    u, f_v = inps

    q = Q(u)      # (128,)
    k = K(f_v)    # (128, 400)
    v = V(f_v)    # (128, 400)

    # scaled dot-product attention scores over the 400 feature vectors
    # (128f0 keeps the computation in Float32)
    attn = softmax(
        sum(q .* k, dims=1) ./ sqrt(128f0),
        dims=2
    )

    # apply the attention weights and reduce (128, 400) -> (128, 1)
    z = sum(attn .* v, dims=2)

    z = act.(z)

    du = δU(z)

    @return du
end

Is this something I can work with, or do I need some sort of trick to get the additional features in as inputs? I have seen in other posts that additional parameters were passed with u0: Using Neural ODEs to learn a family of ODEs (with Automatic Differentiation) - #14 by leespen1, but it's not clear to me how that would impact optimization. Ultimately only u[1] and u[2] are what I want out of the ODE. I suppose I could discount the loss on the 1200 additional dimensions of u, but is that the most idiomatic/computationally efficient way to handle these cases? I'm completely new to doing ML in Julia and fairly fresh with neural ODEs. I've gone over a decent number of examples, but I'm still feeling pretty lost here.

Just make the NN take in the vector [u; f_v] and make sure it’s sized correctly?
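A minimal sketch of that sizing in plain Julia (no Lux; the 2-dimensional state and the 3 × 400 feature shape are taken from the post above):

```julia
# Concatenation approach: the ODE state u stays 2-dimensional, and the
# flattened features are appended so the NN sees a single input vector.
u   = randn(2)         # ODE state, shape (2,)
f_v = randn(3, 400)    # fixed per-image features, shape (3, 400)

x = vcat(u, vec(f_v))  # single NN input of length 2 + 3*400 = 1202
```

The first layer of the network then needs `Dense(1202 => ...)`, and the derivative function still returns only a length-2 `du`, so the solver never integrates the feature dimensions.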


Not sure how relevant it is for your use case, but this came to mind: Particle-filter tutorial · LowLevelParticleFilters Documentation


I guess it's not totally clear to me where these extra inputs belong. When working with OrdinaryDiffEq.jl directly I could include them in my parameter type when I define the ODEProblem and then access p inside my derivative function. Looking at the example in the Lux documentation here MNIST Classification using Neural ODEs | Lux.jl Docs, I see Training.single_train_step! being given an objective function and (x, y) as the data. What I don't fully understand is the precise convention for (x, y) and how it gets plugged into the model, loss function, and solver.

Can my x just be [u; f_v]? If there are multiple inputs like this, is it just standard convention that the first one will be the one that gets integrated by the ODE solver?
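As a rough, Lux-free sketch of the convention (`dummy_model` and all values below are illustrative stand-ins, not the real API objects): the objective passed to `Training.single_train_step!` receives `(model, ps, st, data)`, where `data` is exactly the `(x, y)` you supply, so `x` can be any container you like, e.g. a tuple `(u0, f_v)`:

```julia
# `dummy_model` stands in for the StatefulNeuralODE; a real model would
# solve the ODE from u0 with f_v held fixed in the dynamics.
dummy_model((u0, f_v), ps, st) = (ps.W * u0 .+ sum(f_v) .* 0, st)

# objective with the (model, ps, st, data) signature; returns (loss, state, stats)
function loss_fn(model, ps, st, (x, y))
    pred, st = model(x, ps, st)
    return sum(abs2, pred .- y), st, (;)
end

ps = (W = [1.0 0.0; 0.0 1.0],)
x  = (ones(2), zeros(3, 400))   # (u0, f_v)
y  = zeros(2)
l, _, _ = loss_fn(dummy_model, ps, (;), (x, y))  # l == 2.0
```

Nothing forces the first element of `x` to be the integrated state; it is your overload of the model call (like the one below) that decides which piece becomes u0 for the solver.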

Then change the overload to:

function (n::StatefulNeuralODE)(x, ps, st)
    st_model = StatefulLuxLayer(n.model, ps, st)
    dudt(u, p, t) = st_model([u; f_v], p)  # f_v captured from the surrounding scope
    prob = ODEProblem{false}(ODEFunction{false}(dudt), x, n.tspan, ps)
    return solve(prob, n.solver; n.kwargs...), st_model.st
end

and you should be good, as long as f_v is a constant with respect to the cost function.
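An alternative, if you'd rather not close over f_v, is to carry it inside `p` next to the trainable parameters, matching the earlier idea of putting extras in the parameter type. A minimal, Lux-free sketch (`ps` is a stand-in for the real parameter NamedTuple; only `p.ps` would be optimized):

```julia
ps  = (W = [0.0 1.0; -1.0 0.0],)   # stand-in trainable parameters
f_v = zeros(3, 400)                # fixed per-image features
p   = (ps = ps, f_v = f_v)         # bundle both into the ODE's p

function dudt(u, p, t)
    # a real model would call the Lux layer on (u, p.f_v) with p.ps;
    # here a placeholder linear map shows where each piece is read
    p.ps.W * u .+ sum(p.f_v) .* 0
end

du = dudt([1.0, 0.0], p, 0.0)   # du == [0.0, -1.0]
```

If you differentiate through the solve, make sure only the `p.ps` part is treated as trainable; a closure over f_v (as in the overload above) is often the simpler route.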
