FFTW with @parallel seg faults

hi all,

I have some code that I’m trying to trivially parallelize with @parallel, but I’m getting a strange segmentation fault that I’ve tracked down to a call to FFTW.plan_r2r. This is the simplest example that reproduces the issue:

using FFTW

M = 17
x = -cos.(pi*(0:M)/M)
f = x.^2

fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)

@parallel for _ in 1:4
    fft_fp = fft_plan * f
end

Running this with Julia 0.6.2 results in a segfault:

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7f7c02d1917f)
unknown function (ip: 0x7f7c07535c56)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
macro expansion at ./REPL[7]:2 [inlined]
#3 at ./distributed/macros.jl:174
#158 at ./distributed/macros.jl:20
unknown function (ip: 0x7f7c075359bf)

Am I doing something wrong?

You have to create the plan on every worker; there’s no way to send an FFTW plan over the network.

(Would be better if this gave a sensible error rather than segfaulting. Maybe if it threw an error during serialization, or serialized to an object that threw an error?)
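For concreteness, a minimal sketch of that (it assumes the workers already exist, e.g. after addprocs(4) or starting Julia with -p 4, and that FFTW is loaded as in your example):

@everywhere x = -cos.(pi*(0:17)/17)                    # same grid as in your example
@everywhere fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)  # each worker builds and owns its own plan; nothing is sent over the wire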


Thanks for the reply. Creating the plan on every worker seems to work, but isn’t this very inefficient? The plan is the same for all of them; isn’t one of the advantages of FFTW that the plan can be created once and for all?

In the actual code I want to run, I’m doing a time evolution and the fft plan never changes, so it seems that having to compute it for all workers (and at every time step) would add a lot of overhead… What would be the best way of doing this?

(and it would indeed be better if the example above returned some error instead of segfaulting…)

Why at every time step? Why not something like

@everywhere begin
    M = 17
    x = -cos.(pi*(0:M)/M)
    f = x.^2

    fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)
end

Now each worker will have a copy of fft_plan that you can reuse on that worker.

This approach still gives me a segfault…

@everywhere begin
    M = 17
    x = -cos.(pi*(0:M)/M)
    f = x.^2
    const fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)
    local_mul(f) = fft_plan*f
end
@parallel for _ in 1:4
    fft_fp = local_mul(f)
end
4-element Array{Future,1}:
 Future(2, 1, 14, #NULL)
 Future(3, 1, 15, #NULL)
 Future(4, 1, 16, #NULL)
 Future(5, 1, 17, #NULL)

You’ll have to actually use the local copies of fft_plan. I don’t know what your use case looks like, so I left that part out, but the above is a simple example that does not segfault. You’ll still have to do something to actually access the results (calling fetch on the futures just gives nothing).
Something like this works:

julia> remotecall_fetch(() ->  local_mul(f), 2)
18-element Array{Float64,1}:
 17.0        
  4.3844e-16 
  8.5        
 -6.72674e-16
  1.01721e-15
 -1.76831e-16
  8.01797e-16
 -5.81256e-16
 -5.75851e-16
  2.98295e-16
 -3.06922e-16
  7.52515e-16
 -2.67259e-16
  3.15053e-16
  2.28585e-16
 -8.88178e-16
 -4.3844e-16 
  0.0    
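
If you need the result of every iteration rather than a single call, something like pmap also works; it only ships the input arrays, while each worker keeps reusing its local plan (a sketch, with inputs standing in for whatever actually changes between calls):

inputs  = [f for _ in 1:4]            # placeholder for the per-iteration inputs
results = pmap(local_mul, inputs)     # each call runs on some worker and reuses that worker's plan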

Thanks for the help. This construction seems a bit too elaborate for my actual use case, however… I actually need to perform lots of function calls, and the f function above will change inside the for loop, so I’m not sure I can neatly implement something like this…

If there is no other way of accomplishing this with @parallel, maybe I should be looking into Threads instead? My use case is one of those that can be trivially parallelized with OpenMP, but unfortunately just adding Threads.@threads to the loop made the performance much worse (and this is why I was giving @parallel a try instead)…
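
To be concrete, the kind of threaded loop I have in mind is roughly the following (just a sketch; fs and results are placeholders for my actual data, and as far as I understand only applying an existing plan, not creating one, is thread-safe in FFTW):

using FFTW

M = 17
x = -cos.(pi*(0:M)/M)
fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)   # created once, outside the loop

fs = [x.^2 for _ in 1:4]                    # placeholder inputs that change every iteration
results = Vector{Vector{Float64}}(4)

Threads.@threads for i in 1:4
    results[i] = fft_plan * fs[i]           # plan application only; no planning inside the loop
end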

I am struggling to understand why this code still segfaults when an FFT plan is created for each worker.