Large-Scale HPC Project on Probabilistic Programming at Scale in Conjunction with Scientific Simulators

Hi everyone,

We recently got approval for our 250 million CPU-hour GCS (Gauss Centre for Supercomputing) project, “Bayesian Inference of the Reactive Shock-Bubble Interaction - Probabilistic Programming at Scale”.

A rough sketch:
The Bayesian inference is performed with an adapted version of @Marco_Cusumano-Towne 's Gen, extended with custom inference algorithms. These are coupled with JAX for amortization and then routed through a probabilistic programming execution protocol that controls the reacting-flow simulations.
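To make that a little more concrete, here is a miniature sketch of what the Gen side of such a setup can look like. This is a hedged illustration, not our actual model: `run_simulator`, `observe_pressure`, the priors, and the use of plain importance resampling are all placeholders.

```julia
using Gen

# Hypothetical stand-ins for the flow solver and an observation operator
run_simulator(mach, ratio, ts) = [mach * exp(-ratio * t) for t in ts]
observe_pressure(s) = s

@gen function shock_bubble_model(ts)
    mach  = @trace(uniform(1.1, 3.0), :mach)           # latent shock Mach number
    ratio = @trace(uniform(0.1, 1.0), :density_ratio)  # latent density ratio

    states = run_simulator(mach, ratio, ts)            # black-box forward simulation
    for (i, s) in enumerate(states)
        @trace(normal(observe_pressure(s), 0.05), (:obs, i))  # noisy observations
    end
end

# Condition on (here: synthetic) data and infer the latents
ts = 0.0:0.1:1.0
observations = choicemap()
for (i, y) in enumerate(run_simulator(2.0, 0.5, ts))
    observations[(:obs, i)] = y + 0.05 * randn()
end

(trace, lml) = importance_resampling(shock_bubble_model, (ts,), observations, 1_000)
println(trace[:mach], " ", trace[:density_ratio])
```

The custom algorithms and the JAX-amortized proposals replace the importance resampling step in the real setup.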

For this we are using HLRS’s new Hawk supercomputer, made up of ~10k AMD EPYC 7742 CPUs, with multiple GPU clusters working concurrently to run the machine learning stack and support the probabilistic programs.
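At that scale the coarse-grained pattern is simple even if the plumbing is not: independent simulator evaluations are farmed out across workers while the GPU side trains the neural components. A minimal sketch with Julia’s Distributed standard library (with a dummy `launch_simulation` placeholder) looks like this:

```julia
using Distributed
addprocs(4)  # on a cluster the workers come from the scheduler, e.g. via ClusterManagers.jl

# Dummy stand-in for one solver run with a given parameter set
@everywhere launch_simulation(params) = sum(abs2, params)

param_sets = [rand(4) for _ in 1:1_000]
results = pmap(launch_simulation, param_sets)  # farm runs out across workers
```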

Together with HLRS we hope to pave the way for large-scale HPC deployments of Julia in the German supercomputing community, which will also be accessible to all other European researchers through PRACE.

Oh, and really push the limits of probabilistic programming and distributed computing in Julia of course :))


Excellent news! Where can we read more about the project?


This is fantastic news!

If you want to stay completely within Julia, https://github.com/FluxML/XLA.jl and https://github.com/MikeInnes/Mjolnir.jl can provide more general/powerful capabilities than JAX (albeit more inchoate).

@MikeInnes can say more.


Hi Mohamed,

There are still parts of the algorithms that are yet to be published, but that should hopefully be done within the next 1-2 months. There is a long project proposal, but the “interesting” algorithms section takes up only two pages of it.

I’ll write up a very long post once the algorithms are on arXiv, explaining the ideas behind the project and our rationale for its current structure.

We will also open-source all of our code along the way; the most notable pieces will probably be:

  • The interface between Gen and JAX, including integration with DeepMind’s Haiku and RLax. We currently call it GenJAX, but the final name is still up for debate.
  • A number of custom inference algorithms for Gen
  • A probabilistic programming execution protocol, written entirely in Julia, for interacting with scientific simulators (a rough sketch of the idea follows below)
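To give a flavour of what such an execution protocol does: the inference engine and the simulator run as separate processes and exchange small “sample”/“observe”-style messages. The following is only an illustrative toy over a local socket; the message format and `serve_simulator` are made up for this post.

```julia
using Sockets

# Simulator-side loop: answer "sample <address>" requests from the inference engine.
function serve_simulator(port::Integer)
    server = listen(port)
    sock = accept(server)
    while true
        msg = readline(sock)                 # e.g. "sample mach"
        (isempty(msg) || msg == "shutdown") && break
        parts = split(msg)
        parts[1] == "sample" && println(sock, rand())  # placeholder draw
    end
    close(sock); close(server)
end

# Inference side (in another process):
#   sock = connect(5005); println(sock, "sample mach")
#   x = parse(Float64, readline(sock))
```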

I really like the Flux stack and was initially hoping to use it, but when it came to the approval process we thought it a little more prudent to go with JAX over Flux. That said, I am planning to “squeeze in” some time to try out some of our guiding RL algorithms in Flux and benchmark the two approaches, especially in conjunction with the probabilistic programming layer.
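For reference, the Flux side of such a benchmark would start from something as small as this (layer sizes, batch, and loss are purely illustrative):

```julia
using Flux

# A small proposal/policy network
proposal = Chain(Dense(8 => 64, relu), Dense(64 => 64, relu), Dense(64 => 2))

opt_state = Flux.setup(Adam(1e-3), proposal)
x = rand(Float32, 8, 32)   # dummy batch of 32 "simulator states"
y = rand(Float32, 2, 32)   # dummy targets

# One gradient step
loss, grads = Flux.withgradient(m -> Flux.mse(m(x), y), proposal)
Flux.update!(opt_state, proposal, grads[1])
```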

I must confess I did not know about Mjolnir beforehand. It looks really interesting and I will definitely test it out. There are currently quite a few ideas bubbling around in my head which require quite complex neural networks with transformations - ideas which didn’t really work in the traditional frameworks and for which Mjolnir might be a really good shout. Thank you so much!

Whatever RL/amortization algorithms I end up converting to Flux for testing purposes, I’d be very happy to contribute to the model zoo for others to use and iterate upon.


I’ll be following this with great interest. My lab published a number of papers on RSBI experiments a decade ago (before my time), and I’ve been eagerly awaiting development of a Julia-based hydro code. Am I right in reading that both the reaction model and the hydro code are to be written entirely in Julia? Can you describe in broad terms the approach you’re taking for hydro (DG/finite volume/compact finite difference)?

I am sorry to have to disappoint you in that regard. The hydro code is a “trusty” finite-volume Fortran code, which has proven itself in HPC deployments for the past decade and a half. A large part of these HPC projects is the scalability and the “track record” of the simulation engines, as you only have one year to deploy your budget.
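On the Julia side, coupling to such a Fortran solver typically goes through a shared library and `ccall`. The library name and subroutine below are hypothetical placeholders; only the calling pattern is the point:

```julia
# Hypothetical library "libhydro" with a Fortran subroutine advance_step(nx, density).
# Fortran passes everything by reference, and gfortran appends a trailing
# underscore to subroutine symbol names.
nx = 1024
density = Vector{Float64}(undef, nx)

ccall((:advance_step_, "libhydro"), Cvoid,
      (Ref{Cint}, Ptr{Float64}),
      Cint(nx), density)
```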

But just like you, I do see great potential for Julia simulation codes, especially when viewed against the backdrop of modern heterogeneous supercomputers and Julia’s outstanding differentiable programming frameworks such as Zygote.jl.
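The attraction in one line: Zygote differentiates plain Julia code, so a solver written in Julia is, in principle, differentiable end-to-end. A trivial example:

```julia
using Zygote

f(x) = sum(sin, x) + 0.5 * sum(abs2, x)
x = rand(3)
g = Zygote.gradient(f, x)[1]   # ∇f(x) = cos.(x) .+ x
```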

I believe Lawrence Berkeley National Laboratory is doing some work in that direction, but I have not checked in with the people involved recently.

It would be great to leverage this opportunity to make Julia support on HPC systems better.

-viral


Thank you for the offer, Viral! I hope we can use these (and possible future) opportunities to take as many strides as possible in that direction.

I’ll keep you guys updated on the project and its CPU, GPU & interconnect sub-parts, and I’d be more than happy to provide detailed feedback to the Julia core team and to the Julia community in general.

And of course I’ll happily provide bug reports whenever any arise 🙂