Training a Neural ODE system using the output of a Method-Of-Lines computation

Hello everyone! I’m currently looking to develop my own customised Julia code for training Neural ODEs using a Physics-Informed approach. I want to create an interactive PDE-based model; however, there are two constraints, namely:

  1. Using a PDE system to train a Physics-Informed DeepONet is heavy on computational resources. Due to financial constraints, I’m working on my personal laptop, which has a CPU but no GPU worth speaking of. I do plan to get a GPU-based machine in the next few months, but right now I need to save money.
  2. Neural ODEs are powerful SciML tools for modelling dynamic systems. However, because they are formulated as initial value problems in time, they do not explicitly incorporate spatial heterogeneity, which is a major drawback when modelling spatiotemporal systems.

To that end, I thought of a strategy: first, solve the PDE system using the Method-Of-Lines; then use the result of that computation as the dataset for training my Neural ODEs, the reason being that Neural ODEs allow interactive exploration of the model. Because of the way the Method-Of-Lines discretises space, if there are m data points in x, n in y, and p in z, I end up with m·n·p spatial coordinate points. If the time grid has, say, Nt points, then each state variable is a time series of Nt values at every one of those m·n·p spatial points.
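To make the bookkeeping concrete, here is a rough sketch of the layout I have in mind, written in the symbolic-indexing style of the MethodOfLines docs (sol, u, x, y, t refer to the objects defined in the code further below; Nx, Ny, Nt and get_point_series are purely illustrative names):

# For a 2D problem, each state variable comes back as an (Nx, Ny, Nt) array,
# so every spatial grid point (i, j) carries a length-Nt time series per state.
solu = sol[u(x, y, t)]                   # size (Nx, Ny, Nt)
Nx, Ny, Nt = size(solu)
get_point_series(A, i, j) = A[i, j, :]   # the time series at grid point (i, j)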

Accordingly, I decided to try this approach with some tutorial code. I used this tutorial code for the Brusselator PDE system, as given in the MethodOfLines.jl documentation.

Next, my idea was to modify the code given in the DiffEqFlux.jl documentation for the example where weather data is fed into a model that learns the Neural ODE governing it.
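Concretely, my plan was to replace the CSV-loaded weather DataFrame from that tutorial with one built from the MOL solution at a single spatial grid point, roughly along these lines (a sketch of the intent only; point_dataset is a hypothetical helper name, and the code I actually ran is further below):

using DataFrames

# Sketch: given the MOL solution `sol` and a spatial index (i, j), build the same
# kind of (time, states) table that the weather tutorial starts from.
point_dataset(sol, i, j) = DataFrame(time = sol[t],
                                     u    = sol[u(x, y, t)][i, j, :],
                                     v    = sol[v(x, y, t)][i, j, :])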

I modified the code from the latter link in a Jupyter notebook; the resulting code is as follows:

using ModelingToolkit, MethodOfLines, OrdinaryDiffEq, DomainSets

@parameters x y t
@variables u(..) v(..)
Dt = Differential(t)
Dx = Differential(x)
Dy = Differential(y)
Dxx = Differential(x)^2
Dyy = Differential(y)^2

∇²(u) = Dxx(u) + Dyy(u)

brusselator_f(x, y, t) = (((x-0.3)^2 + (y-0.6)^2) <= 0.1^2) * (t >= 1.1) * 5.

x_min = y_min = t_min = 0.0
x_max = y_max = 1.0
t_max = 11.5

α = 10.

u0(x,y,t) = 22(y*(1-y))^(3/2)
v0(x,y,t) = 27(x*(1-x))^(3/2)

eq = [Dt(u(x,y,t)) ~ 1. + v(x,y,t)*u(x,y,t)^2 - 4.4*u(x,y,t) + α*∇²(u(x,y,t)) + brusselator_f(x, y, t),
       Dt(v(x,y,t)) ~ 3.4*u(x,y,t) - v(x,y,t)*u(x,y,t)^2 + α*∇²(v(x,y,t))]

domains = [x ∈ Interval(x_min, x_max),
              y ∈ Interval(y_min, y_max),
              t ∈ Interval(t_min, t_max)]

# Initial conditions and periodic BCs
bcs = [u(x,y,0) ~ u0(x,y,0),
       u(0,y,t) ~ u(1,y,t),
       u(x,0,t) ~ u(x,1,t),

       v(x,y,0) ~ v0(x,y,0),
       v(0,y,t) ~ v(1,y,t),
       v(x,0,t) ~ v(x,1,t)] 

@named pdesys = PDESystem(eq,bcs,domains,[x,y,t],[u(x,y,t),v(x,y,t)])

N = 32

order = 2 # This may be increased to improve accuracy of some schemes

# Integers for x and y are interpreted as the number of points. Use a Float to directly specify the step sizes dx and dy.
discretization = MOLFiniteDifference([x=>N, y=>N], t, approx_order=order)

# Convert the PDE problem into an ODE problem
println("Discretization:")
@time prob = discretize(pdesys,discretization)

println("Solve:")
@time sol = solve(prob, TRBDF2(), saveat=0.1)

discrete_x = sol[x]
discrete_y = sol[y]
discrete_t = sol[t]

solu = sol[u(x, y, t)]
solv = sol[v(x, y, t)]

# Check for missing values in u and v
println("Number of missing values in u: ", sum(ismissing.(sol[u(x,y,t)])))
println("Number of missing values in v: ", sum(ismissing.(sol[v(x,y,t)])))
println("Number of missing values in t: ", sum(ismissing.(sol[t])))
println("Number of missing values in x: ", sum(ismissing.(sol[x])))
println("Number of missing values in y: ", sum(ismissing.(sol[y])))

# Check for NaN values (common if solver failed)
println("Number of NaN in u: ", sum(isnan.(sol[u(x,y,t)])))
println("Number of NaN in v: ", sum(isnan.(sol[v(x,y,t)])))
println("Number of NaN in t: ", sum(isnan.(sol[t])))
println("Number of NaN in x: ", sum(isnan.(sol[x])))
println("Number of NaN in y: ", sum(isnan.(sol[y])))

using Random, Dates, Optimization, ComponentArrays, Lux, OptimizationOptimisers, DiffEqFlux, OrdinaryDiffEq, CSV, DataFrames, Statistics

function train_custom_neural_ode(data, i, j)
    
    a = data
#    discrete_x = a[x]
#    discrete_y = a[y]
#    discrete_t = a[t]
    
    solu = a[u(x, y, t)][i, j, :]  # (Nt,)
    solv = a[v(x, y, t)][i, j, :]  # (Nt,)
    discrete_t = a[t]              # (Nt,)
    
#    raw_df = DataFrame(time = discrete_t, u = solu, v = solv)
    FEATURES = [:solu, :solv]
    raw_df = [:discrete_t]
    
    function standardize(x)
        μ = mean(x; dims = 2)
        σ = std(x; dims = 2)
        z = (x .- μ) ./ σ
        return z, μ, σ
    end   

    function featurize(raw_df, FEATURES; num_train = 100)
        t = raw_df[:]
        y = Matrix(select(raw_df, FEATURES))'
        n_total = length(t)
        n_train = min(num_train, n_total)
        if n_total < num_train
            @warn "Only $n_total time points available, less than requested $num_train. Using $n_total for training."
        end
        t_train = t[1:n_train]
        y_train = y[:, 1:n_train]
        t_test = t[(n_train+1):end]
        y_test = y[:, (n_train+1):end]
        # Standardize as before...
        t_train, t_mean, t_scale = standardize(t_train)
        y_train, y_mean, y_scale = standardize(y_train)
        t_test = (t_test .- t_mean) ./ t_scale
        y_test = (y_test .- y_mean) ./ y_scale
        return (vec(t_train), y_train, vec(t_test), y_test, (t_mean, t_scale), (y_mean, y_scale))
    end

    t_train, y_train, t_test, y_test, (t_mean, t_scale), (y_mean, y_scale) = featurize(raw_df, FEATURES; num_train=100)
#    t_train, y_train, t_test, y_test, (t_mean, t_scale), (y_mean, y_scale) = featurize(raw_df, FEATURES; num_train=16)
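
    # NOTE: train() below calls a helper `neural_ode(t, data_dim)`. That helper comes
    # from the DiffEqFlux weather-forecasting tutorial I adapted this from; I include
    # a version of it here for completeness, but the exact layer sizes are my own choice.
    function neural_ode(t, data_dim)
        f = Lux.Chain(Lux.Dense(data_dim => 64, swish),
                      Lux.Dense(64 => 32, swish),
                      Lux.Dense(32 => data_dim))
        node = NeuralODE(f, extrema(t), Tsit5();
                         saveat = t, abstol = 1e-9, reltol = 1e-9)
        p, state = Lux.setup(MersenneTwister(0), f)
        return node, ComponentArray(p), state
    end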

    function train_one_round(node, p, state, y, opt, maxiters, rng, y0 = y[:, 1]; kwargs...)
        predict(p) = Array(node(y0, p, state)[1])
        loss(p) = sum(abs2, predict(p) .- y)
    
        adtype = Optimization.AutoZygote()
        optf = OptimizationFunction((p, _) -> loss(p), adtype)
        optprob = OptimizationProblem(optf, p)
        res = solve(optprob, opt; maxiters = maxiters, kwargs...)
        res.minimizer, state
    end
    
    function train(t, y, obs_grid, maxiters, lr, rng, p = nothing, state = nothing; kwargs...)
        log_results(ps, losses) = (state, loss) -> begin
            push!(ps, copy(state.u))
            push!(losses, loss)
            false
        end
    
        ps, losses = ComponentArray[], Float32[]
        for k in obs_grid
            node, p_new, state_new = neural_ode(t, size(y, 1))
            p === nothing && (p = p_new)
            state === nothing && (state = state_new)
    
            p, state = train_one_round(node, p, state, y, OptimizationOptimisers.AdamW(lr),
                maxiters, rng; callback = log_results(ps, losses), kwargs...)
        end
        ps, state, losses
    end

    rng = MersenneTwister(123)
    obs_grid = 4:4:length(t_train) # we train on an increasing amount of the first k obs
    maxiters = 150
    lr = 5e-3
    ps, state, losses = train(t_train, y_train, obs_grid, maxiters, lr, rng; progress = true)

    return ps, state, losses
end

struct NODETrainResult
    ix::Int
    iy::Int
    x::Float64
    y::Float64
    ps
    state
    losses
end

Up to this point, the code ran successfully, both syntactically and in terms of meaningful output. However, when I ran the next block of code:

results = NODETrainResult[]
x_grid = sol[x]
y_grid = sol[y]

for ix in 1:length(x_grid)
    for iy in 1:length(y_grid)
        try
            ps, state, losses = train_custom_neural_ode(sol, ix, iy)
            push!(results, NODETrainResult(ix, iy, x_grid[ix], y_grid[iy], ps, state, losses))
        catch e
            @warn "NODE training failed at (ix=$ix, iy=$iy): $e"
        end
    end
end

I got:

┌ Warning: NODE training failed at (ix=1, iy=1): MethodError(DataFrames.select, ([:discrete_t], [:solu, :solv]), 0x0000000000007d38)
└ @ Main In[12]:11

The above warning was repeated for every one of the 1024 spatial grid points.
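If I’m reading the MethodError correctly, select is being called with the literal vector [:discrete_t] rather than a DataFrame, which would point at the raw_df = [:discrete_t] line in my function. A minimal snippet that reproduces the same error outside the training loop (assuming only DataFrames.jl is loaded) would be:

using DataFrames

raw_df = [:discrete_t]        # a Vector{Symbol}, not a DataFrame
FEATURES = [:solu, :solv]
select(raw_df, FEATURES)      # MethodError: no method matching select(::Vector{Symbol}, ::Vector{Symbol})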

Could anyone help me figure out what’s wrong with the code? It runs, but it doesn’t produce proper output. I’d really appreciate any help, along with suggestions for editing the code.

I’m not sure MethodOfLines.jl is ready for NNs yet. There is currently an open issue here: Can I use this approach with PDE in 2D? · Issue #46 · SciML/ModelingToolkitNeuralNets.jl · GitHub; you can track that for further developments.

Dear Dr. Rackauckas,

I checked out the link you mentioned; yes, it does seem somewhat similar to the problem I’m facing. Thank you very much for replying to my question. I’ll try some other methods to deal with this; as soon as I have some success, I’ll share the corresponding code here.