FFT plan can't be sent between processes?


#1

I’m not fully sure if this is a bug, a case where just the error message ought to be nicer, or whether I’ve done something really bad! The problem arises when trying to transfer an FFT plan between processes, as you can see below. Any advice on how to proceed (file a bug report, workaround, etc.) is appreciated.

$ julia -p 1
               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-pc-linux-gnu

julia> @fetch plan_fft(zeros(2))
FFTW forward plan for 2-element array of Complex{Float64}
signal (11): Segmentation fault
while loading no file, in expression starting on line 0
fftw_sprint_plan at /home/marius/src/julia-3c9d75391c/bin/../lib/julia/libfftw3.so.3 (unknown line)
sprint_plan at ./fft/FFTW.jl:285 [inlined]
show at ./fft/FFTW.jl:292
unknown function (ip: 0x7fe604097016)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./REPL.jl:132
unknown function (ip: 0x7fe604096d46)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./REPL.jl:135
unknown function (ip: 0x7fe604096a66)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
display at ./multimedia.jl:143
unknown function (ip: 0x7fe6040968b2)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
print_response at ./REPL.jl:154
unknown function (ip: 0x7fe604096308)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
print_response at ./REPL.jl:139
unknown function (ip: 0x7fe604095d88)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
#22 at ./REPL.jl:652
unknown function (ip: 0x7fe60408a331)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
run_interface at ./LineEdit.jl:1579
unknown function (ip: 0x7fe80a93d0bf)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
run_frontend at ./REPL.jl:903
run_repl at ./REPL.jl:188
unknown function (ip: 0x7fe604081c52)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
_start at ./client.jl:360
unknown function (ip: 0x7fe80a9582e8)
jl_call_method_internal at /home/centos/buildbot/slave/package_tarball64/build/src/julia_internal.h:189 [inlined]
jl_apply_generic at /home/centos/buildbot/slave/package_tarball64/build/src/gf.c:1942
unknown function (ip: 0x4018ed)
unknown function (ip: 0x4013b6)
__libc_start_main at /build/glibc-Qz8a69/glibc-2.23/csu/../csu/libc-start.c:291
unknown function (ip: 0x4013fc)
Allocations: 3205304 (Pool: 3204360; Big: 944); GC: 3
Segmentation fault (core dumped)

#2

I’m not sure what is going wrong, but I can confirm that this fails for me too on both 0.5 and master.


#3

For me, everything seems OK on Windows.

               _
   _       _ _(_)_     |  A fresh approach to technical computing
  (_)     | (_) (_)    |  Documentation: http://docs.julialang.org
   _ _   _| |_  __ _   |  Type "?help" for help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 0.5.0 (2016-09-19 18:14 UTC)
 _/ |\__'_|_|_|\__'_|  |  Official http://julialang.org/ release
|__/                   |  x86_64-w64-mingw32

julia> @fetch plan_fft(zeros(2))
FFTW forward plan for 2-element array of Complex{Float64}
(dft-direct-2 "n1fv_2_avx")

julia> versioninfo()
Julia Version 0.5.0
Commit 3c9d753 (2016-09-19 18:14 UTC)
Platform Info:
System: NT (x86_64-w64-mingw32)
CPU: Intel® Core™ i7-4770 CPU @ 3.40GHz
WORD_SIZE: 64
BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
LAPACK: libopenblas64_
LIBM: libopenlibm
LLVM: libLLVM-3.7.1 (ORCJIT, haswell)


#4

It’s expected and should only fail when you have more than one process.

It’s caused by the pointer in the plan. It doesn’t actually make much sense to serialize an FFTW plan in general. The way to work around this would be to add a NULL pointer check in the FFTW code. Hopefully it doesn’t add too much overhead.


#5

It’s pretty hard to imagine a single null pointer check introducing much overhead compared to FFT computations.


#6

I’m not sure where exactly you want the check, or what you want it to do if it detects a NULL pointer?


#7

Is the data being pointed to something that it doesn’t make sense, or is impossible, to transfer as well? I mostly agree that it’s not hugely consequential if you have to compute the plan once on every process, but if the plan is nested somewhere inside other data types, it kills your ability to use those types in parallel computations and makes things quite inconvenient. I’m not too familiar with this: is there some mechanism for defining custom (de)serialization that could be used in this case?


#8

Couldn’t you check whether the pointer is NULL and throw a normal Julia error instead of calling into FFTW and getting a segfault? We do these kinds of checks in the SuiteSparse code all the time.


#9

What I usually do is throw an UndefRefError in unsafe_convert.
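As a rough illustration of that idea, here is a minimal sketch in current Julia (1.x) syntax rather than the 0.5 used in this thread. The `DemoHandle` type, its `handle` field, and the `checked` helper are hypothetical names made up for the example; this is not the FFTW wrapper's actual code. It relies only on the fact that `ccall` converts its arguments through `Base.unsafe_convert`, so a check placed there runs before any C code sees the pointer.

```julia
# Hypothetical wrapper around a C handle (illustrative only, not the FFTW type).
struct DemoHandle
    handle::Ptr{Cvoid}
end

# ccall converts arguments via Base.unsafe_convert, so a NULL check here turns
# a would-be segfault into an ordinary Julia exception.
function Base.unsafe_convert(::Type{Ptr{Cvoid}}, p::DemoHandle)
    p.handle == C_NULL && throw(UndefRefError())
    return p.handle
end

# Helper for the demonstration: report what the conversion does.
function checked(p::DemoHandle)
    try
        Base.unsafe_convert(Ptr{Cvoid}, p)
        return :ok
    catch err
        err isa UndefRefError || rethrow()
        return :undefref
    end
end

dead = DemoHandle(C_NULL)                    # e.g. a handle that arrived as NULL
live = DemoHandle(Ptr{Cvoid}(UInt(0x1000)))  # fake address, never dereferenced
```

With this in place, `checked(dead)` yields `:undefref` and `checked(live)` yields `:ok`; in real code the exception would surface at the `ccall` site instead of crashing the process.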


#10

The plan is in general a machine-specific property and should probably be computed on each process independently.


#11

It is really not practical to try to serialize the low-level FFTW plan.

What should be possible (in FFTW.jl in Julia) would be to:

  • Define a custom serialization for the FFTWPlan subtypes that just converts the plan::PlanPtr field into C_NULL when serializing.

  • In the functions like A_mul_B! that call unsafe_execute! on the plan, first check if plan == C_NULL and, if so, re-create the plan (and save it for future calls). This is possible because the FFTWPlan subtypes contain all of the information needed to re-create the plan. (Because planning overwrites the input/output arrays, temporary input/output arrays would have to be created. Alternatively, the serialized plan could be re-created with the FFTW.ESTIMATE flag, which has the advantage that the plan is created relatively quickly and without overwriting the input array. The resulting plan will be slower, however, although this could be fixed by exporting the “wisdom” [cached planning info] from one process and importing it on the other process, assuming they are running on the same architecture.)
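
The two steps above can be sketched roughly as follows, in current Julia (1.x) syntax and using only the `Serialization` standard library. Everything named here (`LazyPlan`, `make_handle`, `ensure_handle!`, `apply_plan`) is a hypothetical stand-in: `make_handle` takes the place of the real planner and just returns a fake non-null address, and `apply_plan` copies its input instead of running a transform. The use of `Serialization.serialize_type` follows the pattern other packages use for custom serialization, but treat it as an assumption rather than how FFTW.jl does or would do it.

```julia
using Serialization

# Hypothetical stand-in for an FFTW plan: `handle` plays the role of the plan
# pointer and `n` is the metadata needed to rebuild it.
mutable struct LazyPlan
    handle::Ptr{Cvoid}
    n::Int
    rebuilds::Int
end

# Placeholder for the real planner call; the address is fake and never dereferenced.
make_handle(n::Int) = Ptr{Cvoid}(UInt(0x1000))

LazyPlan(n::Int) = LazyPlan(make_handle(n), n, 0)

# Step 1: custom serialization that sends only the metadata, never the pointer.
function Serialization.serialize(s::AbstractSerializer, p::LazyPlan)
    Serialization.serialize_type(s, LazyPlan)
    serialize(s, p.n)
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{LazyPlan})
    n = deserialize(s)
    return LazyPlan(C_NULL, n, 0)   # the handle is NULL on the receiving side
end

# Step 2: re-create the handle lazily, and cache it for later calls.
function ensure_handle!(p::LazyPlan)
    if p.handle == C_NULL
        p.handle = make_handle(p.n)
        p.rebuilds += 1
    end
    return p.handle
end

function apply_plan(p::LazyPlan, x::AbstractVector)
    length(x) == p.n || throw(DimensionMismatch("plan is for length $(p.n)"))
    ensure_handle!(p)
    return copy(x)   # placeholder for actually executing the plan
end

# Round trip through a buffer, standing in for a transfer between processes.
p = LazyPlan(4)
io = IOBuffer()
serialize(io, p)
seekstart(io)
q = deserialize(io)
was_null = q.handle == C_NULL
y = apply_plan(q, ones(4))
apply_plan(q, ones(4))   # a second call reuses the cached handle
```

The point of the design is that the receiving process pays the planning cost once, on first use, and the error case (a NULL handle reaching C code) can no longer occur.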


#12

The default one should be doing this already.