FFT plan can't be sent between processes?

marius311 · December 12, 2016, 1:41am

I’m not sure whether this is a bug, a case where the error message just ought to be nicer, or whether I’ve done something really bad! The problem arises when trying to transfer an FFT plan between processes, as you can see below. Any advice on how to proceed (file a bug report, workaround, etc.) would be appreciated.

$ julia -p 1
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> @fetch plan_fft(zeros(2))
FFTW forward plan for 2-element array of Complex{Float64}
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
fftw_sprint_plan at /home/marius/src/julia-3c9d75391c/bin/../lib/julia/libfftw3.so.3 (unknown line)
sprint_plan at ./fft/FFTW.jl:285 [inlined]
show at ./fft/FFTW.jl:292
unknown function (ip: 0x7fe604097016)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./REPL.jl:132
unknown function (ip: 0x7fe604096d46)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./REPL.jl:135
unknown function (ip: 0x7fe604096a66)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./multimedia.jl:143
unknown function (ip: 0x7fe6040968b2)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
print_response at ./REPL.jl:154
unknown function (ip: 0x7fe604096308)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
print_response at ./REPL.jl:139
unknown function (ip: 0x7fe604095d88)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
#22 at ./REPL.jl:652
unknown function (ip: 0x7fe60408a331)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
run_interface at ./LineEdit.jl:1579
unknown function (ip: 0x7fe80a93d0bf)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
run_frontend at ./REPL.jl:903
run_repl at ./REPL.jl:188
unknown function (ip: 0x7fe604081c52)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
_start at ./client.jl:360
unknown function (ip: 0x7fe80a9582e8)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
unknown function (ip: 0x4018ed)
unknown function (ip: 0x4013b6)
__libc_start_main at /build/glibc-Qz8a69/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x4013fc)
Allocations: 3205304 (Pool: 3204360; Big: 944); GC: 3
Segmentation fault (core dumped)

adamslc · December 12, 2016, 3:44am

I’m not sure what is going wrong, but I can confirm that this fails for me too on both 0.5 and master.

111 · December 12, 2016, 4:56pm

For me, everything seems OK on Windows.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> @fetch plan_fft(zeros(2))
FFTW forward plan for 2-element array of Complex{Float64}
(dft-direct-2 "n1fv_2_avx")

julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
  System: NT (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

yuyichao · December 12, 2016, 5:03pm

It’s expected and should only fail when you have more than one process.

It’s caused by the pointer in the plan. It doesn’t actually make much sense to serialize an FFTW plan in general. The way to work around this would be to add a NULL pointer check in the FFTW code. Hopefully it doesn’t add too much overhead.

StefanKarpinski · December 12, 2016, 5:05pm

It’s pretty hard to imagine a single null pointer check introducing much overhead compared to FFT computations.

stevengj · December 12, 2016, 6:31pm

I’m not sure where exactly you want the check, or what you want it to do if it detects a NULL pointer?

marius311 · December 12, 2016, 6:39pm

Is the data being pointed to such that it doesn’t make sense, or is impossible, to transfer it as well? I mostly agree that it’s not hugely consequential if you have to compute the plan once on every process, but if the plan is nested somewhere inside some data types, it kills your ability to use those types in parallel computations and makes things quite inconvenient. I’m not too familiar with this: is there some mechanism to define custom (de)serialization that might be used in this case?

andreasnoack · December 12, 2016, 7:33pm

Couldn’t you check that the pointer is NULL and throw a normal Julia error instead of calling FFTW and getting a segfault? We do these kinds of checks in SuiteSparse all the time.

yuyichao · December 12, 2016, 7:56pm

What I usually do is throw an UndefRefError in unsafe_convert.
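A minimal sketch of that pattern, using a made-up wrapper type rather than the real FFTW plan types (ToyHandle and its ptr field are hypothetical names for illustration only). It assumes the wrapper is passed to ccall as a Ptr{Cvoid} argument, in which case the conversion method below runs before the C library sees the pointer:

```julia
# Hypothetical wrapper around a C handle; not the actual FFTW plan type.
mutable struct ToyHandle
    ptr::Ptr{Cvoid}
end

# ccall converts its arguments through Base.cconvert and Base.unsafe_convert,
# so a check placed here runs before the pointer reaches the C library.
function Base.unsafe_convert(::Type{Ptr{Cvoid}}, h::ToyHandle)
    h.ptr == C_NULL && throw(UndefRefError())
    return h.ptr
end

# A handle that lost its pointer now raises a catchable Julia error
# instead of handing NULL to native code.
stale = ToyHandle(C_NULL)
caught = try
    Base.unsafe_convert(Ptr{Cvoid}, stale)
    nothing
catch err
    err
end
```

With this in place, a stale handle should surface as an ordinary exception that the REPL can print, rather than a crash inside the native library.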

yuyichao · December 12, 2016, 7:59pm

The plan is in general a machine-specific property and should probably be computed on each process independently.

stevengj · December 12, 2016, 9:46pm

It is really not practical to try to serialize the low-level FFTW plan.

What should be possible (in FFTW.jl in Julia) would be to:

1. Define a custom serialization for the FFTWPlan subtypes that just converts the plan::PlanPtr field into C_NULL when serializing.
2. In the functions like A_mul_B! that call unsafe_execute! on the plan, first check if plan == C_NULL and, if so, re-create the plan (and save it for future calls). This is possible because the FFTWPlan subtypes contain all of the information needed to re-create the plan. (Because planning overwrites the input/output arrays, temporary input/output arrays would have to be created. Alternatively, the serialized plan could be re-created with the FFTW.ESTIMATE flag, which has the advantage that the plan will be created relatively quickly and without overwriting the input array. The resulting plan will be slower, however, although this could be fixed by exporting the “wisdom” [cached planning info] from one process and importing it on the other process, assuming they are running the same architecture.)
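The two steps above can be sketched with a toy type on a recent Julia that ships the Serialization standard library. Everything here is an assumption made for illustration: ToyPlan, fake_native_plan, ensure_handle! and execute are hypothetical names, the "native handle" is a dummy address, and a naive DFT stands in for the real execute call. This is not the FFTW.jl implementation.

```julia
using Serialization

# Hypothetical stand-in for an FFTW plan: a native handle plus the
# parameters needed to rebuild it. None of these names are FFTW.jl API.
mutable struct ToyPlan
    handle::Ptr{Cvoid}
    n::Int
end

# Stand-in for the planner; the real one would call into the FFTW library.
fake_native_plan(n::Int) = Ptr{Cvoid}(UInt(0x1000 + n))

ToyPlan(n::Int) = ToyPlan(fake_native_plan(n), n)

# Step 1: write only the parameters and restore with a NULL handle.
function Serialization.serialize(s::AbstractSerializer, p::ToyPlan)
    Serialization.serialize_type(s, ToyPlan)
    serialize(s, p.n)
end
function Serialization.deserialize(s::AbstractSerializer, ::Type{ToyPlan})
    n = deserialize(s)
    return ToyPlan(C_NULL, n)
end

# Step 2: rebuild the handle on first use and keep it for later calls.
function ensure_handle!(p::ToyPlan)
    if p.handle == C_NULL
        p.handle = fake_native_plan(p.n)
    end
    return p
end

# Naive DFT in place of the native execute call.
function execute(p::ToyPlan, x::AbstractVector)
    ensure_handle!(p)
    n = p.n
    return [sum(x[k + 1] * cis(-2pi * j * k / n) for k in 0:n-1) for j in 0:n-1]
end

# Round trip through a buffer, as a transfer between processes would do.
io = IOBuffer()
serialize(io, ToyPlan(2))
seekstart(io)
received = deserialize(io)              # handle is NULL at this point
result = execute(received, [1.0, 1.0])  # handle is rebuilt on first use
```

The design choice is that the receiving side pays the planning cost only once, on first use, and the sending side never ships a process-local address.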

yuyichao · December 12, 2016, 10:14pm

The default one should be doing this already.
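A quick way to check this, sketched with a hypothetical struct (PtrHolder is not an FFTW type) on a recent Julia with the Serialization standard library: round-trip a value holding a raw pointer through the default serializer and inspect the field on the other side.

```julia
using Serialization

# Hypothetical struct with a raw pointer field, standing in for a plan object.
struct PtrHolder
    handle::Ptr{Cvoid}
    n::Int
end

# A dummy non-NULL address; it is never dereferenced.
original = PtrHolder(Ptr{Cvoid}(UInt(0x1234)), 2)

# Round trip through the default serializer, as a transfer between
# processes would do.
io = IOBuffer()
serialize(io, original)
seekstart(io)
received = deserialize(io)

# Compare the pointer field before and after the round trip.
handle_was_nulled = received.handle == C_NULL
```

If the pointer comes back as C_NULL, that matches the stack trace in the original post: the plan arrives on the master process without a valid handle, and show then passes it to fftw_sprint_plan.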
