Mini-batching with ensemble problem DiffEq and neural ODEs (SciML)

Following this excellent example:
https://docs.sciml.ai/SciMLSensitivity/stable/ode_fitting/data_parallel/#Minibatching-Across-GPUs-with-DiffEqGPU

The title seems to tempt me with the syntax for micro-batching to make the ADAM algorithm perhaps perform a little better. But when I implement the example, I believe I always get the full ensemble each time instead of a randomly selected mini-batch from the ensemble: 100 trajectories are possible, and 100 trajectories are used for each solve. Am I reading that wrong?

Can someone help me with the proper syntax for this? Say I wanted to train in batches of 10 or 20 from those 100. How would I construct the ensemble setup?

Anyway, kudos to this library. This is great stuff.

Best Regards,
Allan Baker

Oh that example just needs to pass batch_size. I need to find a nice way to test that…

But anyway, the DiffEqGPU docs just went live today, so you may want to check them out. This is the one with batch sizing and multi-GPU:

https://docs.sciml.ai/DiffEqGPU/stable/tutorials/multigpu/

I’ll give a warning that we haven’t migrated to GPU doc testing yet, so this is code that has not been run in a while, but over the next week this will get enabled as part of our big doc cleanup, which will make those docs a bit more robust. For now, let me know if there’s an issue, but I won’t handle it until I have a GPU setup next week.

Thank you.
I’m actually not running on a GPU just yet, as I still have some structures that aren’t isbits-compatible that I haven’t worked through.

I’ll try the batch_size option.

Best Regards,
Allan Baker

The link above is forbidden to me, both on home computer and work computer.

The batch_size goes in the Optimization.solve() function?

It didn’t seem to complain, but I’m not sure if it is working as I expected. I may need to put some @info in the ensemble prob_func to see what it is actually doing.
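Something like this minimal instrumentation is what I had in mind (just a sketch; the remake call is a stand-in for whatever my real prob_func does per trajectory):

using OrdinaryDiffEq

# prob_func with a log line so I can see which trajectory indices each solve touches
function prob_func(prob, i, repeat)
    @info "prob_func called" trajectory = i
    remake(prob)  # stand-in for the real per-trajectory setup
end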

Ehh, use the dev version: Setting Up Multi-GPU Parallel Parameter Sweeps · DiffEqGPU.jl

The docs will deploy soon.

No, it’s a thing for ensembles.
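i.e., it’s a keyword on the ensemble solve itself. Roughly like this (an untested sketch, with a toy ODE standing in for the real model):

using OrdinaryDiffEq

# toy ODE just to show the call signature
f(u, p, t) = p .* u
prob = ODEProblem(f, [1.0], (0.0, 1.0), [-0.5])

# each trajectory gets its own parameters
prob_func(prob, i, repeat) = remake(prob, p = [-rand()])
ensemble_prob = EnsembleProblem(prob, prob_func = prob_func)

# batch_size is an ensemble-solve keyword: 100 trajectories total, worked
# through in chunks of 20. It controls chunking for things like reductions
# and GPU memory, not a random subsample for the optimizer.
sol = solve(ensemble_prob, Tsit5(), EnsembleThreads();
            trajectories = 100, batch_size = 20)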

Ok, so I think I am thinking about this wrong. The batch_size is really just for the multiple-GPU stuff. I was thinking it would be something that could be used with the ADAM optimizer to reduce getting caught in a local minimum. I had thought that you could have smaller sets of trajectories evaluated in each cost function call by using a mini-batch. So if I had 300 trajectories total, I could take them randomly 50 at a time and perhaps help defeat local minima. Is that possible with Optimization.solve() and ensembles?

Is this a thing with Optimization.solve() and ensemble DiffEqs?

Do I just use DataLoader? I’m new at this and haven’t used DataLoader before.

Batching optimization is a different and separate thing. That is covered by this tutorial: Data Iterators and Minibatching · Optimization.jl
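From memory, the shape of that tutorial is roughly the following (a condensed, untested sketch; the toy data, the Adam/Zygote imports, and the numEpochs name are stand-ins, and the exact package layout has shifted between versions):

using Optimization, OptimizationOptimisers, Optimisers, Flux, IterTools, Zygote

# toy data standing in for the tutorial's dataset
x = rand(Float32, 1, 128)
y = 2 .* x .+ 1

train_loader = Flux.Data.DataLoader((x, y), batchsize = 16, shuffle = true)

# the loss sees θ plus one batch of (x, y) at a time
loss(θ, xb, yb) = sum(abs2, yb .- (θ[1] .* xb .+ θ[2]))

# each element of the data iterator is splatted as extra arguments after (θ, p),
# which is where the 4-argument objective comes from
optf = Optimization.OptimizationFunction((θ, p, xb, yb) -> loss(θ, xb, yb),
                                         Optimization.AutoZygote())
optprob = Optimization.OptimizationProblem(optf, zeros(Float32, 2))

numEpochs = 10  # ncycle repeats the loader numEpochs times; one ADAM step per batch
res = Optimization.solve(optprob, Optimisers.Adam(0.05),
                         ncycle(train_loader, numEpochs);
                         callback = (θ, l) -> (println(l); false))

The optimizer takes one step per item of the iterator, so ncycle(train_loader, numEpochs) is what turns a single pass over the loader into numEpochs epochs of updates.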

Is there someplace that defines the calling-structure options for solve? In this example it passes a loss with 4 arguments as opposed to 2; I wasn’t seeing that in the help.

Also, what about the ncycle argument for passing in the parameters? I’m not understanding how it is used, as opposed to just passing train_loader in directly, in that example. Is max_iterations redundant with numEpoch in that example?

Still trying to understand the flow as it relates to when the loss is calculated and consumed by ADAM and the parameters are adjusted.

Unfortunately, I’m having trouble figuring out how to get Optimization.solve to correctly supply my ensemble problem with the ensemble parameters to Monte Carlo over, through Optimization.solve and the loss-function interface (Optimization.OptimizationFunction). Before I tried to set up a DataLoader, all of the trajectories were supplied at once when the model was created, and the solver did not have to control that: all trajectories were run and the cumulative loss was used to steer the ADAM algorithm. Now I want Optimization.solve to control the chunks of trajectories, N at a time, and I don’t know how to pass that through to the loss function. I get “no method matching” errors because I think the loader itself is getting passed rather than the N trajectories contained in the batch. The example above is a time-history example and just different enough from my setup to still leave me confused. I’m sure it is something simple. Should I be passing the loader in on the parameters side? I’m not sure how the solver and optimizer algorithms poll the DataLoader to get the batches, so I’m having trouble getting the big picture.

The problem appears to be this:

I don’t know how to call Optimization.solve so that the extra arguments to the loss function, which are stored in trdata as a Vector of my input structures, are passed as a vector of length N instead of as N additional arguments. I have tried (trdata) and (trdata,) and nothing seems to work.

res = Optimization.solve(optprob, opt, trdata; callback = cb, maxiters = iterations)

When the syntax was:
res = Optimization.solve(optprob, opt; callback = cb, maxiters=iterations)
everything worked fine, because trdata was captured from the parent scope of the loss function.

My current loss-function setup, in case that needs correcting:

optf = Optimization.OptimizationFunction((x, p, td) -> loss(x, td), adtype)
optprob = Optimization.OptimizationProblem(optf, θ)

I may have it.
I think it is:

train_loader = Flux.Data.DataLoader((trdata,), batchsize = batchSize)

The tuple needs the trailing comma in the DataLoader. I had not tried that final combination in the DataLoader configuration, just the full pass-through.
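For completeness, the shape of what ended up working for me looks roughly like this (a sketch with dummy names and a placeholder cost; my real trdata is a Vector of my own input structs and the real loss builds and solves the ensemble):

using Optimization, OptimizationOptimisers, Optimisers, Flux, IterTools, Zygote

# trdata: one entry per trajectory (NamedTuples here as a stand-in for my structs)
trdata = [(id = i, scale = rand()) for i in 1:100]

# the trailing comma matters: DataLoader slices each element of the tuple,
# so every batch arrives as a one-element tuple holding a length-batchSize
# sub-vector of trdata
batchSize = 20
train_loader = Flux.Data.DataLoader((trdata,), batchsize = batchSize, shuffle = true)  # shuffle for random batches

# the batch is splatted after (x, p), so td is the sub-vector of trajectories
loss(θ, td) = sum(abs2, θ) * length(td)  # placeholder for the real ensemble cost

optf = Optimization.OptimizationFunction((x, p, td) -> loss(x, td),
                                         Optimization.AutoZygote())
optprob = Optimization.OptimizationProblem(optf, zeros(2))

numEpochs = 5
res = Optimization.solve(optprob, Optimisers.Adam(0.05),
                         ncycle(train_loader, numEpochs);
                         callback = (θ, l) -> (println(l); false))

As far as I can tell, without the tuple each batch is just the sub-vector, and the splat into the objective turns it into batchSize separate arguments, which is where my “no method matching” errors came from; with (trdata,), the batch is a one-element tuple, so td arrives as a single length-batchSize vector.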