Help us understand our users!

Hi all!

Every once in a while we need to pitch Julia’s GPU programming capabilities to vendors like NVIDIA. Doing so with technical arguments is easy enough, but companies are more easily convinced by an overview of noteworthy users and their applications :slight_smile: However, as with any open-source project, it’s sometimes hard to know who is using the software. That’s what this topic is for: if you’re using Julia for GPU computing, please let us know! Additional details on the kind of application, or the scale of deployment, would be very valuable. Feel free to contact me (@maleadt) on Slack or Zulip if you’re not comfortable posting publicly or not at liberty to do so.



I have used the GPU capabilities of FixedEffectModels.jl (FixedEffects/FixedEffectModels.jl on GitHub: fast estimation of linear models with IV and high-dimensional categorical variables) and it’s amazing. I use it for my dissertation, but nothing is published yet.

I had a gifted master’s student who ported SumProductTransform.jl (pevnak/SumProductTransform.jl on GitHub: an experimental implementation of sum-product networks with dense unitary transformations in leaves) to work with GPUs, which has allowed us (and, more importantly, will allow us) to scale. The repo is here

On the other hand, we have failed (so far) to get a speedup in Mill.jl (CTUAvastLab/Mill.jl on GitHub: a multiple instance learning library built on top of Flux.jl, aimed at prototyping flexible multi-instance learning models), but we probably did it very naively (I hope the former student will help).

Julia, CUDA.jl, and KernelAbstractions.jl have allowed us to rewrite an ocean model in Julia: Oceananigans.jl. We’ve been able to develop the model very quickly in Julia while maintaining fast CPU and GPU performance.

Many users are interested in using Oceananigans since it’s easier to use than existing models and runs quickly on GPUs. Most users are coming from outside the Julia community and are using Julia and GPUs for the first time to use Oceananigans. Several users have already made significant contributions.

We’ve published a JOSS paper and have started maintaining a list of publications using Oceananigans which we hope will grow (we have several more in the pipeline).

Although a lot of science has been made possible by using a single GPU, we’ve also started to scale up to multiple GPUs with MPI.jl, aiming for 2-256+ GPUs to target really big problems (more on that at JuliaCon!).


We use Julia GPU (mostly CUDA.jl for now and in combination with MPI.jl enabling CUDA-aware MPI) in natural sciences (Earth and cryosphere science, Geodynamics). We develop:

  • ParallelStencil.jl which enables domain scientists to write architecture-agnostic high-level code for parallel high-performance stencil computations on GPUs (and CPUs);
  • ImplicitGlobalGrid.jl which renders the distributed parallelization of stencil-based GPU and CPU applications on a regular staggered grid almost trivial and enables close to ideal weak scaling of real-world applications on thousands of GPUs.

The miniapp section lists some cool applications; in particular, the two-phase flow solver is the Julia version of the code used in these 2018 and 2020 papers (references to scientific talks are listed in the respective README). Some publications are currently in the pipeline.

We also used Julia GPU in this year’s EGU short course to resolve ice flow over Greenland. The geo-HPC online material I teach from time to time also gathers some useful references and hands-on exercises.

We plan to add AMDGPU.jl and Intel’s oneAPI.jl as backends in ParallelStencil.jl at some point.


I’ve implemented a ray tracing (mainly ray casting) renderer in Julia that uses CUDA.jl kernels for both scene sampling and post-processing filtering. This renderer is automatically differentiable with ForwardDiff.jl while still being CUDA and CPU compatible. I currently use Tullio.jl to organize the parallelization.


I am not yet a CUDA.jl user in earnest, but I am looking forward to using it for some streaming DSP applications on large scale radio telescope arrays.


At Beacon Biosignals, we are using the Julia GPU stack for training and inference of deep learning models on EEG datasets that can be > 1 TB, in an industry setting. Most often things are running in Docker containers in a cloud somewhere.


We’re cosmologists using CUDA.jl to analyze how the light given off by the big bang (the so-called Cosmic Microwave Background) has been distorted due to gravitational lensing very slightly bending its trajectory as it has been traveling to us across the universe over the last ~13 billion years. This lets us infer properties of the matter and dark matter in the universe, as well as the details of the universe’s expansion and geometry.

The analysis boils down to a high-dimensional hierarchical Bayesian inference problem (~1-10 million dimensions), which we explore using Hamiltonian Monte Carlo, running multiple batched MCMC chains on GPU with CUDA.jl, with the necessary gradients provided by automatic differentiation. Speedups over CPU can be 10-100x, and it’s fair to say the analysis would be impossible without GPU.

We run the MCMC chains mostly at the NERSC computing center, generally on ~1-16 Tesla V100s at a time. We’re excited about NERSC’s soon-to-be pre-exascale GPU-heavy system, Perlmutter, which should allow us to analyze much larger datasets, of the kind that next-generation Cosmic Microwave Background experiments will be producing. (Btw @jblaschke has a poster at JuliaCon about Julia+CUDA+Perlmutter which you should check out!)

The code is CMBLensing.jl and some recent publications include the theory and application to data from the South Pole Telescope. Happy to provide anything else that might be useful!


CUDA.jl (CUBLAS) is used in MetidaCu.jl, which is a solver for Metida.jl for fitting mixed-effects models.

DynamicGrids.jl optionally uses CUDA.jl as a backend to run user-defined dynamic spatial simulations. It was written for ecological and agricultural modelling, but is generally useful for Cellular Automata and similar models. We should have a paper out soon.

I’m an astronomer using some CUDA.jl code in conjunction with threaded CPU code to analyze radio interferometry data, primarily from the National Radio Astronomy Observatory’s (NRAO) Very Large Array (VLA). Data files from the VLA are typically in the range of tens to hundreds of GBs in size. The calculations aren’t complex, there is just a lot of data to process. It’s nice to have the option to off-load some of the processing to the GPU, so the CPU can do other tasks. The primary limitation to using the GPU is the memory. 32 or 64 GB or more would make life a lot easier.
