How do I figure out where things went wrong in this parallel code?

question
debug
parallel

#1

I have the following code (simplified) that I run in a julia -p 4 REPL:

println("computing betweenness centrality - method1")
rbc1 = @spawn betweenness_centrality(g, ...)

println("computing betweenness centrality - method2")
rbc2 = @spawn betweenness_centrality(g, ...) # some other set of params

...

bc1 = fetch(rbc1)
bc2 = fetch(rbc2)

and in the middle of this, I get

julia> cdict = compute_centralities("myfile.csv");
computing betweenness centrality - method1
computing betweenness centrality - method2
fatal error on
julia> 

That’s it. No other error text. During the computation I see the expected CPU load, and I’m not running out of memory. The code runs fine serially (but it takes a long time, which is why I’d like to parallelize it).

How do I troubleshoot this?

Edited to add:

Julia Version 0.6.0-pre.beta.325
Commit 6e0a2f8c94 (2017-04-25 14:57 UTC)
Platform Info:
  OS: macOS (x86_64-apple-darwin16.5.0)
  CPU: Intel(R) Core(TM) i5-6267U CPU @ 2.90GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.9.1 (ORCJIT, skylake)

#2

Can you post a link to the code that is called?


#3

The betweenness centrality? It’s standard LightGraphs.


#4

Do you get the full output if you don’t run the code in the REPL?

One alternative is to attach gdb (or, even better, rr from Mozilla) to the started worker processes to see what happens.
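
To attach a debugger you first need the OS process id of each worker. Here is a minimal sketch of one way to get them from the master process. The helper name `worker_pids` is made up for illustration and is not part of Julia or LightGraphs. It only uses `workers`, `remotecall_fetch` and `getpid`.

```julia
using Distributed  # needed on Julia 0.7 and later; on 0.6 drop this line (these functions live in Base)

# Hypothetical helper: map each worker id to the OS process id it runs in.
function worker_pids()
    pids = Dict{Int,Int}()
    for w in workers()
        # Run getpid() on worker w and bring the result back to the caller.
        pids[w] = remotecall_fetch(getpid, w)
    end
    return pids
end

# Print one line per worker so the pid can be handed to a debugger.
for (w, pid) in worker_pids()
    println("worker $w: OS pid $pid  (attach with: gdb -p $pid)")
end
```

Run this before the `@spawn` calls, attach to each pid from a separate terminal, and let the process continue. If a worker crashes, the debugger should stop there and a backtrace can be taken. On macOS, `lldb -p <pid>` may be easier to set up than gdb.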