Error in a parallel loop in Julia 0.7 rc2

parallel

#1

Since Julia 0.7.0-rc2, I get an error in a parallel loop.
Unfortunately, I have not been able to reduce it to a minimal code snippet.

The error message (after a clean Julia restart) is different every time. Here is an example:

  Closest candidates are:
    +(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
  add_sum(::Type, ::Core.Compiler.Const) at ./reduce.jl:21
  _mapreduce(::typeof(identity), ::typeof(Base.add_sum), ::IndexLinear, ::Array{Any,1}) at ./reduce.jl:311
  _mapreduce_dim at ./reducedim.jl:305 [inlined]
  #mapreduce#542 at ./reducedim.jl:301 [inlined]
  mapreduce at ./reducedim.jl:301 [inlined]
  _sum at ./reducedim.jl:650 [inlined]
  _sum at ./reducedim.jl:649 [inlined]
  #sum#544 at ./reducedim.jl:645 [inlined]
  sum at ./reducedim.jl:645 [inlined]
  #DIVAndjog#190(::Array{Core.LineInfoNode,1}, ::Base.Iterators.Pairs{Symbol,Any,NTuple{5,Symbol},NamedTuple{(:moddim, :MEMTOFIT, :QCMETHOD, :RTIMESONESCALES, :velocity),Tuple{Array{Float64,1},Int64,Tuple{},Tuple{},Tuple{}}}}, ::Function, ::BitArray{3}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Array{Float64,1}, ::Array{Any,1}, ::Array{Any,1}, ::Int64) at /home/abarth/projects/Julia/divand.jl/src/DIVAndjog.jl:36

Or this:

  LoadError: ReadOnlyMemoryError()
   ... (the last 2 lines are repeated 1 more time)
   ... (the last 2 lines are repeated 1 more time)
   ... (the last 2 lines are repeated 8 more times)
   ... (the last 20 lines are repeated 1 more time)
   ... (the last 2 lines are repeated 13 more times)

Or just:

Speicherzugriffsfehler (German for "memory access error", i.e. a segmentation fault), without any Julia error message.

The parallel loop in question looks like this:

    @sync @parallel for iwin = 1:size(windowlist,1)
    [...]
    end

In my test I run the parallel loop with only 1 CPU. The error persists if I replace @parallel with @distributed, but it goes away when I make the loop serial (i.e. delete @sync @parallel).
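For anyone trying to reproduce this, a serial-versus-distributed comparison could be sketched roughly as follows. This is only a minimal sketch, not the DIVAnd code: `process_window`, `run_serial` and `run_distributed` are hypothetical stand-ins for the real loop body, and it assumes Julia 0.7 or later, where `@distributed` comes from the `Distributed` standard library. It uses the reducer form `@distributed (+)`, which waits for the result, so no `@sync` is needed.

```julia
using Distributed  # provides @distributed and @everywhere in Julia 0.7 and later

# Hypothetical stand-in for the work done on one window; not the DIVAnd code.
# @everywhere makes the definition available on every process.
@everywhere process_window(iwin) = iwin^2

# Serial version of the loop: the variant that did not fail.
function run_serial(nwin)
    total = 0
    for iwin = 1:nwin
        total += process_window(iwin)
    end
    return total
end

# Distributed version with a (+) reducer. With no worker processes added,
# the loop runs on the master process only, as in the 1-CPU test.
function run_distributed(nwin)
    return @distributed (+) for iwin = 1:nwin
        process_window(iwin)
    end
end

println(run_serial(8) == run_distributed(8))
```

If the two results differ, or the distributed call crashes while the serial one passes, that would point at the distributed code path rather than the loop body.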

The error is not 100% reproducible. Sometimes (maybe 1 out of 10) the code passes.

Here is a link to the error on Travis with the full message:

https://travis-ci.org/gher-ulg/DIVAnd.jl/jobs/412384891#L905

It uses the branch Alex-0.7 from https://github.com/gher-ulg/DIVAnd.jl . The error is triggered by the script test_product.jl called by runtest.jl.

The last version where the parallel loop worked was 0.7.0-rc1.3 (commit fe9a0752aa), and the first version where I saw this issue was 0.7.0-rc1.14 (commit ea5871a804). For what it is worth, here are the changes between those versions:


#2

I also get these kinds of errors:

GC error (probable corruption) :
Allocations: 64456802 (Pool: 64443516; Big: 13286); GC: 144
<?#0x7f863b07a580::(nil)>
0x7f864c249010: Queued root: 0x7f865d7488d0 :: 0x7f865a834c50 (bits: 3)
        of type Core.TypeMapEntry
0x7f864c249028: Queued root: 0x7f863b39d590 :: 0x7f865a835c10 (bits: 3)
        of type Core.MethodInstance
0x7f864c249040: Queued root: 0x7f863b39db90 :: 0x7f865a835c10 (bits: 3)
        of type Core.MethodInstance
0x7f864c249058: Queued root: 0x7f863b39e290 :: 0x7f865a835c10 (bits: 3)
        of type Core.MethodInstance
0x7f864c249070: Queued root: 0x7f863b39ea10 :: 0x7f865a835c10 (bits: 3)
        of type Core.MethodInstance
[...]

Maybe it is a memory error in Julia?


#3

This bug is fixed in Julia 0.7.0-rc3.