FFTW with @parallel seg faults

fftw

#1

hi all,

I have a code that I’m trying to trivially parallelize with @parallel, but I’m getting a strange segmentation fault that I’ve tracked down to a call to FFTW.plan_r2r. This is the simplest example that reproduces the issue:

using FFTW
# to reproduce, start Julia with worker processes, e.g. julia -p 4

M = 17
x = -cos.(pi*(0:M)/M)
f = x.^2

fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)

@parallel for _ in 1:4
    fft_fp = fft_plan * f
end

Running this with Julia 0.6.2 results in a segfault:

signal (11): Segmentation fault
while loading no file, in expression starting on line 0
unknown function (ip: 0x7f7c02d1917f)
unknown function (ip: 0x7f7c07535c56)
jl_call_fptr_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:339 [inlined]
jl_call_method_internal at /buildworker/worker/package_linux64/build/src/julia_internal.h:358 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:1926
macro expansion at ./REPL[7]:2 [inlined]
#3 at ./distributed/macros.jl:174
#158 at ./distributed/macros.jl:20
unknown function (ip: 0x7f7c075359bf)

Am I doing something wrong?


#2

You have to create the plan on every worker; there’s no way to send an FFTW plan over the network.

(It would be better if this gave a sensible error rather than segfaulting. Maybe it could throw an error during serialization, or serialize to an object that throws an error when used?)
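
As a side note on why sending fails: an FFTW plan wraps a raw pointer into the C library's internal state, which is only meaningful in the process that created it. A rough sketch of checking this (the field name plan is an internal implementation detail, shown only for illustration):

using FFTW

x = -cos.(pi*(0:17)/17)
fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)

# the plan object holds a pointer into FFTW's C state, so it cannot be
# usefully serialized to another process
isa(fft_plan.plan, Ptr)   # true (internal field, not a public API)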


#3

Thanks for the reply. Creating the plan on every worker seems to work, but isn’t this very inefficient? The plan is the same for all of them; isn’t one of the advantages of FFTW the fact that the plan can be created once and for all?

In the actual code I want to run, I’m doing a time evolution and the fft plan never changes, so having to compute it on every worker (and at every time step) seems to add a lot of overhead… What would be the best way of doing this?

(And it would indeed be better if the example above returned an error instead of segfaulting…)


#4

Why at every time step? Why not something like

@everywhere begin
    using FFTW   # make sure FFTW is available on every worker

    M = 17
    x = -cos.(pi*(0:M)/M)
    f = x.^2

    fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)
end

Now each worker will have its own copy of fft_plan that you can reuse on that worker.


#5

This approach still gives me a segfault…


#6
@everywhere begin
    using FFTW

    M = 17
    x = -cos.(pi*(0:M)/M)
    f = x.^2

    const fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)
    local_mul(f) = fft_plan*f
end

@parallel for _ in 1:4
    fft_fp = local_mul(f)
end
4-element Array{Future,1}:
 Future(2, 1, 14, #NULL)
 Future(3, 1, 15, #NULL)
 Future(4, 1, 16, #NULL)
 Future(5, 1, 17, #NULL)

You’ll have to actually use the local copies of fft_plan. I don’t know what your use case looks like, so I left that part out, but the above is a simple example that does not segfault. You’ll still have to do something to actually access the results, though (calling fetch on the futures just gives nothing).
Something like this works:

julia> remotecall_fetch(() ->  local_mul(f), 2)
18-element Array{Float64,1}:
 17.0        
  4.3844e-16 
  8.5        
 -6.72674e-16
  1.01721e-15
 -1.76831e-16
  8.01797e-16
 -5.81256e-16
 -5.75851e-16
  2.98295e-16
 -3.06922e-16
  7.52515e-16
 -2.67259e-16
  3.15053e-16
  2.28585e-16
 -8.88178e-16
 -4.3844e-16 
  0.0    
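
If you need the results back on the master process, one option (just a sketch, assuming the @everywhere block above has already been run) is the reducing form of @parallel, which collects the value of each iteration:

# combine the per-iteration results on the master with a reducer
results = @parallel (hcat) for _ in 1:4
    local_mul(f)   # runs on a worker, using that worker's fft_plan
end
# results is an 18×4 matrix, one column per iteration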

#7

Thanks for the help. This construction seems a bit too elaborate for my actual use case, however… I actually need to perform lots of function calls, and the f above will change inside the for loop, so I’m not sure I can neatly implement something like this…

If there is no other way of accomplishing this with @parallel, maybe I should be looking into Threads instead? My use case is one of those that can be trivially parallelized with OpenMP, but unfortunately just adding Threads.@threads to the loop made the performance much worse (and this is why I was giving @parallel a try instead)…
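
A rough sketch of how a single plan could be shared across threads instead (each thread writes into its own preallocated output buffer; FFTW plan execution is thread-safe, only planning is not). The in-place A_mul_B! call and the made-up inputs fs are assumptions for illustration, not tested against the real code:

# start Julia with JULIA_NUM_THREADS set for this to use multiple threads
using FFTW

M = 17
x = -cos.(pi*(0:M)/M)
fs = [x.^2 .+ k for k in 1:4]              # hypothetical per-iteration inputs

fft_plan = FFTW.plan_r2r(x, FFTW.REDFT00)  # plan once, outside the loop
outputs = [similar(x) for _ in fs]         # one output buffer per iteration

Threads.@threads for k in 1:length(fs)
    # in-place execution reuses the shared plan without copying it;
    # each iteration writes into its own buffer
    A_mul_B!(outputs[k], fft_plan, fs[k])
end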