Segmentation Fault with FFTW plans and multi-threading

I am trying to optimize a routine that performs FFTs on time series of 3D boxes. Each box has a fixed size, but the sizes differ between boxes (typically 256 to 512 cells per dimension).

For each box I therefore want to precompute an FFT plan with FFTW's “plan_fft()” function, and on top of that use multithreading.

A minimal example looks like this:

using FFTW # v.1.8.0

FFTW.set_num_threads(Threads.nthreads()) # In my case 32

FFTs = [plan_fft(randn(Float32, rand(256:512), rand(256:512), rand(256:512)), flags=FFTW.MEASURE) for _ in 1:32]

This code runs perfectly fine on a single thread, but with multithreading it throws a segmentation fault after a seemingly random number of iterations:

[1599023] signal (11.1): Segmentation fault
in expression starting at REPL[7]:1
spawn_apply at /home/lromano/.julia/artifacts/e95ca94c82899616429924e9fdc7eccda275aa38/lib/ (unknown line)
#2 at /home/lromano/.julia/packages/FFTW/6nZei/src/providers.jl:58
unknown function (ip: 0x7c154ff2674f)
_jl_invoke at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:2894 [inlined]
ijl_apply_generic at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/gf.c:3076
jl_apply at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/julia.h:1982 [inlined]
start_task at /cache/build/builder-amdci4-6/julialang/julia-release-1-dot-10/src/task.c:1238
Allocations: 45645903 (Pool: 45640544; Big: 5359); GC: 16
Segmentation fault

Has anyone run into the same or a similar issue with FFTW and knows how to fix it?

I should add that my code was running perfectly fine until Thursday night and had already finished processing ~2/3 of my data. Then on Friday it suddenly stopped working, and now I cannot even get past the “plan_fft” stage.
I tried running on a single thread over the weekend, but it crashed when applying the FFT operator.

This looks similar to this FFTW.jl issue.

I have also run into this issue recently. One workaround that has been suggested is to use the MKL FFTW interface. I tried this some time ago and initially ran into performance issues when using MKL together with threads. If it helps, after some trial and error I’ve written here about the “proper” way of using the MKL provider, which avoids these issues.
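
For reference, a minimal sketch of what switching providers could look like, assuming a version of FFTW.jl that supports selecting the provider through its preferences mechanism. It only shows the switch itself; it is not a confirmed fix for the segfault above, and it does not cover the thread-related tuning mentioned in the write-up.

```julia
using FFTW

# Store "mkl" as the preferred provider for this environment.
# The change only takes effect after Julia is restarted.
FFTW.set_provider!("mkl")

# After restarting Julia, load FFTW again and check which provider is active:
# using FFTW
# FFTW.get_provider()

# To go back to the default library later (again followed by a restart):
# FFTW.set_provider!("fftw")
```

Plans created with “plan_fft()” are used the same way afterwards, so the planning code from the original post should not need to change.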