Code snippet for multiGPU fft

rveltz · July 5, 2019, 8:02am

Hi,

I would like to perform a 3D FFT on multiple GPUs. It can be done in CUDA as explained here. However, the required functions do not seem to be wrapped in CuArrays. Has anybody done this in Julia?

A second question concerns the data layout. If my array x is spread across two GPUs, I would still like to be able to write things like x .= f.(x). But this might be too difficult to do for now…
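Conceptually, an elementwise broadcast like `x .= f.(x)` over an array split across devices needs no communication: each GPU applies `f` to the chunk it owns. The sketch below emulates this on the CPU with NumPy, with ordinary arrays standing in for the per-GPU chunks. `split_across_devices` and `distributed_broadcast_` are hypothetical helpers for illustration, not part of CuArrays.jl or DistributedArrays.jl.

```python
import numpy as np

def split_across_devices(x, n_devices):
    """Split a 1-D array into contiguous chunks, one per (simulated) device."""
    return np.array_split(x, n_devices)

def distributed_broadcast_(chunks, f):
    """In-place analogue of `x .= f.(x)`: each device updates only its own chunk.

    No data moves between chunks, because an elementwise f only needs
    the values that are already local.
    """
    for chunk in chunks:
        chunk[...] = f(chunk)

x = np.arange(10, dtype=np.float64)
chunks = split_across_devices(x, 2)          # stand-ins for two GPUs
distributed_broadcast_(chunks, lambda v: v**2 + 1)
result = np.concatenate(chunks)              # gather back for inspection
```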

Best,

maleadt · July 9, 2019, 8:37am

cufftXt is indeed not wrapped, so we don’t have convenient multi-GPU FFT functionality right now. Adding wrappers (just the wrappers, not the high-level functionality that would make this a proper Julian API) is a pretty easy task though, so if you need this functionality you could look into providing the necessary wrappers yourself. See e.g. the pull request "Initial work towards cublasXt wrappers" by kshyatt (JuliaGPU/CuArrays.jl #294), where similar cublasXt functionality was added. You can also open an issue on the CuArrays repo.

Similarly, we don’t have a convenient multi-GPU abstraction that allows multi-GPU broadcasting. There’s some initial support for using DistributedArrays.jl with CuArrays.jl, i.e. by using DArray{CuArray}, but there are many inefficiencies to be fixed there before this is a usable solution for multi-GPU computations.

rveltz · July 9, 2019, 3:04pm

I looked at it, but I could not find a simple example of how to use it. Do you have such an example? It seems to me that cublasXt would do the multi-GPU broadcasting.

I will try to provide some of the wrappers of cufftXt.

maleadt · July 10, 2019, 6:16am

There are some _xt testsets in CuArrays/test/blas.jl, but nothing is actually documented. Those APIs aren’t sufficient to implement broadcasting, since we need to be able to compile and launch kernels for the general case (so we’d need to deal with multi-GPU indexing ourselves).
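Part of "dealing with multi-GPU indexing" is translating a global element index into the device that owns the element and the offset within that device's local buffer. A minimal sketch of that bookkeeping, assuming a 1-D array split into contiguous chunks (the function names are hypothetical):

```python
from bisect import bisect_right
from itertools import accumulate

def chunk_offsets(chunk_sizes):
    """Global index at which each device's chunk starts."""
    return [0] + list(accumulate(chunk_sizes))[:-1]

def global_to_local(i, chunk_sizes):
    """Map a global linear index to (device, local index).

    Empty chunks are skipped automatically: bisect_right picks the last
    device whose starting offset is <= i.
    """
    total = sum(chunk_sizes)
    if not 0 <= i < total:
        raise IndexError(f"index {i} out of range for length {total}")
    offsets = chunk_offsets(chunk_sizes)
    device = bisect_right(offsets, i) - 1
    return device, i - offsets[device]

# An array of length 10 split over two devices: element 7 lives on device 1
# at local position 2.
print(global_to_local(7, [5, 5]))
```

A purely elementwise kernel never needs this lookup, since every thread touches only its local data; a real implementation would presumably also have to handle multi-dimensional layouts and operands that are split differently.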

rveltz · July 12, 2019, 1:01pm

I’d be willing to try this if I knew whether the cufftXt part allows computing FFTs of arrays that do not fit on a single GPU. So far, I have not found an example of this…
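A common way to compute a 3D FFT of data that is too large for one device is a slab decomposition: each device holds only a slab of the array, performs the FFTs along the axes it holds completely, and then the devices exchange data once (a global transpose) before the FFT along the remaining axis. The NumPy sketch below emulates this on the CPU with plain arrays standing in for per-GPU slabs. It illustrates the technique only and is not how cufftXt is called; `slab_fft3d` is a hypothetical name.

```python
import numpy as np

def slab_fft3d(x, n_devices):
    """3-D FFT of x computed slab by slab, as if x were split across devices."""
    # Step 1: split along axis 0; each "device" does 2-D FFTs over axes 1 and 2,
    # which are complete within its slab.
    slabs = [np.fft.fft2(s, axes=(1, 2))
             for s in np.array_split(x, n_devices, axis=0)]

    # Step 2: all-to-all exchange ("transpose"). On real hardware this is the
    # inter-GPU communication and no device ever holds the whole array; here
    # we simply reassemble and re-split along axis 1.
    full = np.concatenate(slabs, axis=0)
    slabs = np.array_split(full, n_devices, axis=1)

    # Step 3: axis 0 is now complete on every slab, so do 1-D FFTs along it.
    slabs = [np.fft.fft(s, axis=0) for s in slabs]
    return np.concatenate(slabs, axis=1)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 6, 4)) + 1j * rng.standard_normal((8, 6, 4))
y = slab_fft3d(x, n_devices=2)
print(np.allclose(y, np.fft.fftn(x)))  # same result as a single-device 3-D FFT
```

This works because the multidimensional DFT is separable, so the per-axis transforms can be done in any order as long as each one sees its full axis.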

wsshin · July 19, 2022, 5:07pm

I’m interested in performing FFTs on multiple GPUs and found this thread. I’m curious whether there has been any progress on this issue since the OP was posted three years ago.

maleadt · July 19, 2022, 9:05pm

Not on CUDA.jl at least.

gwisk · February 25, 2025, 10:37pm

Hate to necro an old thread, but I’m curious whether there have been any updates to CUDA.jl towards multi-GPU FFTs.

maleadt · March 3, 2025, 7:33am

There are now low-level cufftXt wrappers in CUDA.jl, so you can already use those directly, ideally to build some higher-level wrappers for inclusion in CUDA.jl. For inspiration, you can look at how cublasXt is wrapped.
