Mini-batching with ensemble problem DiffEq and neural ODEs (SciML)

Following this excellent example:
https://docs.sciml.ai/SciMLSensitivity/stable/ode_fitting/data_parallel/#Minibatching-Across-GPUs-with-DiffEqGPU

The title seems to tempt me with the syntax for micro-batching to make the ADAM algorithm perhaps perform a little better. But when I implement the example, I believe I always get the full ensemble each time instead of a randomly selected mini-batch from the ensemble: 100 trajectories are possible, and 100 trajectories are used for each solve. Am I reading that wrong?

Can someone help me with the proper syntax for this? Say I wanted to train in batches of 10 or 20 from those 100. How would I construct the ensemble setup?

Anyway, kudos to this library. This is great stuff.

Best Regards,
Allan Baker

Oh that example just needs to pass batch_size. I need to find a nice way to test that…

But anyway, the DiffEqGPU docs just went live today, so you may want to check them out. This is the one with batch sizing and multi-GPU:

https://docs.sciml.ai/DiffEqGPU/stable/tutorials/multigpu/

I’ll give a warning that we haven’t migrated to GPU doc testing yet, so this is code that has not been run in a while, but over the next week this will get enabled as part of our big doc cleanup, which will make those docs a bit more robust. For now, let me know if there’s an issue, but I won’t handle it until I have a GPU setup next week.

Thank you.
I’m actually not running on a GPU just yet, as I still have some structures that aren’t isbits-compatible that I haven’t worked through.

I’ll try the batch_size option.

Best Regards,
Allan Baker

The link above is forbidden to me, both on home computer and work computer.

The batch_size goes in the Optimization.solve() function?

It didn’t seem to complain, but I’m not sure if it is working as I expected. I may need to put some @info in the ensemble prob_func to see what it is actually doing.
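Something like this minimal instrumentation is what I had in mind (just a sketch; the remake call is a stand-in for whatever my real prob_func does per trajectory):

using OrdinaryDiffEq

# prob_func with a log line so I can see which trajectory indices each solve touches
function prob_func(prob, i, repeat)
    @info "prob_func called" trajectory = i
    remake(prob)  # stand-in for the real per-trajectory setup
end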

Ehh, use the dev version: Setting Up Multi-GPU Parallel Parameter Sweeps · DiffEqGPU.jl

The docs will deploy soon.

No, it’s a thing for ensembles.
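i.e., it’s a keyword on the ensemble solve itself. Roughly like this (an untested sketch, with a toy ODE standing in for the real model):

using OrdinaryDiffEq

# toy ODE just to show the call signature
f(u, p, t) = p .* u
prob = ODEProblem(f, [1.0], (0.0, 1.0), [-0.5])

# each trajectory gets its own parameters
prob_func(prob, i, repeat) = remake(prob, p = [-rand()])
ensemble_prob = EnsembleProblem(prob, prob_func = prob_func)

# batch_size is an ensemble-solve keyword: 100 trajectories total, worked
# through in chunks of 20. It controls chunking for things like reductions
# and GPU memory, not a random subsample for the optimizer.
sol = solve(ensemble_prob, Tsit5(), EnsembleThreads();
            trajectories = 100, batch_size = 20)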

Ok, so I think I am thinking about this wrong. The batch_size is really just for the multiple-GPU stuff. I was thinking it would be something that could be used with the ADAM optimizer to reduce getting caught in a local minimum. I had thought that you could have smaller sets of trajectories evaluated in each cost function call by using a mini-batch. So if I had 300 trajectories total, I could take them randomly 50 at a time and perhaps help defeat local minima. Is that possible with Optimization.solve() and ensembles?

Is this a thing with Optimization.solve() and ensemble DiffEqs?

Do I just use DataLoader? I’m new at this and haven’t used DataLoader before.

Batching optimization is a different and separate thing. That is covered by this tutorial: Data Iterators and Minibatching · Optimization.jl
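From memory, the shape of that tutorial is roughly the following (a condensed, untested sketch; the toy data, the Adam/Zygote imports, and the numEpochs name are stand-ins, and the exact package layout has shifted between versions):

using Optimization, OptimizationOptimisers, Optimisers, Flux, IterTools, Zygote

# toy data standing in for the tutorial's dataset
x = rand(Float32, 1, 128)
y = 2 .* x .+ 1

train_loader = Flux.Data.DataLoader((x, y), batchsize = 16, shuffle = true)

# the loss sees θ plus one batch of (x, y) at a time
loss(θ, xb, yb) = sum(abs2, yb .- (θ[1] .* xb .+ θ[2]))

# each element of the data iterator is splatted as extra arguments after (θ, p),
# which is where the 4-argument objective comes from
optf = Optimization.OptimizationFunction((θ, p, xb, yb) -> loss(θ, xb, yb),
                                         Optimization.AutoZygote())
optprob = Optimization.OptimizationProblem(optf, zeros(Float32, 2))

numEpochs = 10  # ncycle repeats the loader numEpochs times; one ADAM step per batch
res = Optimization.solve(optprob, Optimisers.Adam(0.05),
                         ncycle(train_loader, numEpochs);
                         callback = (θ, l) -> (println(l); false))

The optimizer takes one step per item of the iterator, so ncycle(train_loader, numEpochs) is what turns a single pass over the loader into numEpochs epochs of updates.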

Is there someplace that defines the calling-structure options for solve? In this example it passes a loss with 4 arguments as opposed to 2; I wasn’t seeing that in the help.

Also, what about the ncycle argument for passing in the parameters? I’m not understanding how it is used, as opposed to just passing train_loader in directly, in that example. Is max_iterations redundant with numEpoch in that example?

Still trying to understand the flow as it relates to when the loss is calculated and consumed by ADAM and the parameters are adjusted.

Unfortunately, I’m having trouble figuring out how to get Optimization.solve to correctly supply my ensemble problem with the ensemble parameters to Monte Carlo over, through Optimization.solve and the loss-function interface (Optimization.OptimizationFunction). Before I tried to set up a DataLoader, all of the trajectories were supplied at once when the model was created, and the solver did not have to control that: all trajectories were run and the cumulative loss was used to steer the ADAM algorithm. Now I want Optimization.solve to control the chunks of trajectories, N at a time, and I don’t know how to pass that through to the loss function. I get “no method matching” errors because I think the loader itself is getting passed rather than the N trajectories contained in the batch. The example above is a time-history example and just different enough from my setup to still leave me confused. I’m sure it is something simple. Should I be passing the loader in on the parameters side? I’m not sure how the solver and optimizer algorithms poll the DataLoader to get the batches, so I’m having trouble getting the big picture.

The problem appears to be this:

I don’t know how to call Optimization.solve so that the extra arguments to the loss function, which are stored in trdata as a Vector of my input structures, are passed as a vector of length N instead of as N additional arguments. I have tried (trdata) and (trdata,) and nothing seems to work.

res = Optimization.solve(optprob, opt, trdata; callback = cb, maxiters = iterations)

When the syntax was:
res = Optimization.solve(optprob, opt; callback = cb, maxiters=iterations)
everything worked fine, because trdata was captured from the parent scope of the loss function.

My current loss-function setup, in case that needs correcting:

optf = Optimization.OptimizationFunction((x, p, td) -> loss(x, td), adtype)
optprob = Optimization.OptimizationProblem(optf, θ)

I may have it.
I think it is:

train_loader = Flux.Data.DataLoader((trdata,), batchsize = batchSize)

The tuple needs the trailing comma in the DataLoader. I had not tried that final combination in the DataLoader configuration, just the full pass-through.
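For completeness, the shape of what ended up working for me looks roughly like this (a sketch with dummy names and a placeholder cost; my real trdata is a Vector of my own input structs and the real loss builds and solves the ensemble):

using Optimization, OptimizationOptimisers, Optimisers, Flux, IterTools, Zygote

# trdata: one entry per trajectory (NamedTuples here as a stand-in for my structs)
trdata = [(id = i, scale = rand()) for i in 1:100]

# the trailing comma matters: DataLoader slices each element of the tuple,
# so every batch arrives as a one-element tuple holding a length-batchSize
# sub-vector of trdata
batchSize = 20
train_loader = Flux.Data.DataLoader((trdata,), batchsize = batchSize, shuffle = true)  # shuffle for random batches

# the batch is splatted after (x, p), so td is the sub-vector of trajectories
loss(θ, td) = sum(abs2, θ) * length(td)  # placeholder for the real ensemble cost

optf = Optimization.OptimizationFunction((x, p, td) -> loss(x, td),
                                         Optimization.AutoZygote())
optprob = Optimization.OptimizationProblem(optf, zeros(2))

numEpochs = 5
res = Optimization.solve(optprob, Optimisers.Adam(0.05),
                         ncycle(train_loader, numEpochs);
                         callback = (θ, l) -> (println(l); false))

As far as I can tell, without the tuple each batch is just the sub-vector, and the splat into the objective turns it into batchSize separate arguments, which is where my “no method matching” errors came from; with (trdata,), the batch is a one-element tuple, so td arrives as a single length-batchSize vector.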