Since Julia version 0.7.0-rc2, I have an error in a parallel loop.
Unfortunately, I have not been able to narrow this error down to a minimal code snippet.
The error message (after a clean Julia restart) is different every time. Here is an example:
Closest candidates are:
+(::Any, ::Any, !Matched::Any, !Matched::Any...) at operators.jl:502
add_sum(::Type, ::Core.Compiler.Const) at ./reduce.jl:21
_mapreduce(::typeof(identity), ::typeof(Base.add_sum), ::IndexLinear, ::Array{Any,1}) at ./reduce.jl:311
_mapreduce_dim at ./reducedim.jl:305 [inlined]
#mapreduce#542 at ./reducedim.jl:301 [inlined]
mapreduce at ./reducedim.jl:301 [inlined]
_sum at ./reducedim.jl:650 [inlined]
_sum at ./reducedim.jl:649 [inlined]
#sum#544 at ./reducedim.jl:645 [inlined]
sum at ./reducedim.jl:645 [inlined]
#DIVAndjog#190(::Array{Core.LineInfoNode,1}, ::Base.Iterators.Pairs{Symbol,Any,NTuple{5,Symbol},NamedTuple{(:moddim, :MEMTOFIT, :QCMETHOD, :RTIMESONESCALES, :velocity),Tuple{Array{Float64,1},Int64,Tuple{},Tuple{},Tuple{}}}}, ::Function, ::BitArray{3}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Tuple{Array{Float64,1},Array{Float64,1},Array{Float64,1}}, ::Array{Float64,1}, ::Tuple{Array{Float64,3},Array{Float64,3},Array{Float64,3}}, ::Array{Float64,1}, ::Array{Any,1}, ::Array{Any,1}, ::Int64) at /home/abarth/projects/Julia/divand.jl/src/DIVAndjog.jl:36
Or this:
LoadError: ReadOnlyMemoryError()
... (the last 2 lines are repeated 1 more time)
... (the last 2 lines are repeated 1 more time)
... (the last 2 lines are repeated 8 more times)
... (the last 20 lines are repeated 1 more time)
... (the last 2 lines are repeated 13 more times)
Or just:
Speicherzugriffsfehler (German for "memory access error", i.e. a segmentation fault), without any Julia error message.
The parallel loop in question looks like this:
@sync @parallel for iwin = 1:size(windowlist,1)
[...]
end
In my test I run the parallel loop with only 1 CPU. The error persists if I replace @parallel by @distributed, but it goes away when I make the loop serial (i.e. delete @sync @parallel).
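For anyone who wants to compare the two variants side by side, here is a minimal sketch of the serial and the @distributed form of such a loop. It is not DIVAnd code: process_window and the contents of windowlist are hypothetical stand-ins, and the loop uses the reducer form `@distributed (+)` (rather than the `@sync @distributed` form above) only so that both variants return a value that can be compared.

```julia
using Distributed  # @distributed lives in the Distributed stdlib since Julia 0.7

# Hypothetical stand-in for the work done per window (not DIVAnd code).
# @everywhere defines it on all processes, including the case of a single process.
@everywhere process_window(w) = w^2

# Serial variant: plain loop, no macros.
function run_serial(windowlist)
    total = 0
    for iwin = 1:size(windowlist, 1)
        total += process_window(windowlist[iwin])
    end
    return total
end

# Distributed variant: same loop body, results combined with (+).
# With a reducer, @distributed waits for all iterations and returns the result.
function run_distributed(windowlist)
    total = @distributed (+) for iwin = 1:size(windowlist, 1)
        process_window(windowlist[iwin])
    end
    return total
end

windowlist = collect(1:8)  # hypothetical input; the real windowlist is not shown
@assert run_serial(windowlist) == run_distributed(windowlist)
```

With no worker processes added, the distributed variant runs on process 1, which matches the 1-CPU setup described above.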
The error is not 100% reproducible. Sometimes (maybe 1 out of 10) the code passes.
Here is a link to the error on Travis with the full message:
It uses the branch Alex-0.7 of gher-uliege/DIVAnd.jl on GitHub. The error is triggered by the script test_product.jl, which is called by runtest.jl.
The last version in which the parallel loop worked was 0.7.0-rc1.3 (Commit fe9a0752aa), and the first time I saw this issue was on 0.7.0-rc1.14 (Commit ea5871a804). For what it is worth, here are the changes between those versions: