Hi,
I narrowed down the error. Before, it was because there wasn’t enough memory. Now the reported error is always the same: it always points to the use of sum in my code above, where sum is applied to a distributed array. It usually occurs only after the code has been running for some time, though.
However, I don’t understand whether the EPERM warning causes the sum failure or the other way around. Distributed debugging is hard.
I would appreciate a hint, please. So far, I am amazed that this code runs on 250 processors thanks to DistributedArrays.
Best regards,
WARNING: Error trying to reuse client port number, falling back to plain socket : cannot obtain socket name: operation not permitted (EPERM)
Worker 30 terminated.
Worker 31 terminated.
...
[1] (::Base.Distributed.##99#100{TCPSocket,TCPSocket,Bool})() at ./event.jl:73
ERROR: LoadError: LoadError: ProcessExitedException()
(::DistributedArrays.##120#122{Base.#+,DistributedArrays.DArray{Float64,1,Array{Float64,1}},Array{Any,1}})() at ./task.jl:335
...and 31 more exception(s).
Stacktrace:
[1] sync_end() at ./task.jl:287
[2] macro expansion at ./task.jl:303 [inlined]
[3] reduce(::Function, ::DistributedArrays.DArray{Float64,1,Array{Float64,1}}) at /home/rveltz/.julia/v0.6/DistributedArrays/src/mapreduce.jl:40
[4] sum(::DistributedArrays.DArray{Float64,1,Array{Float64,1}}) at /home/rveltz/.julia/v0.6/DistributedArrays/src/mapreduce.jl:150
[5] macro expansion at /home/rveltz/prog/MF-dendrite/network-nds-parallel.jl:189 [inlined]
[6] macro expansion at ./util.jl:237 [inlined]
[7] #simule#92(::Float64, ::Bool, ::UnitRange{Int64}, ::Function, ::Array{Float64,1}, ::Int64, ::Bool) at /home/rveltz/prog/MF-dendrite/network-nds-parallel.jl:146
[8] (::#kw##simule)(::Array{Any,1}, ::#simule, ::Array{Float64,1}, ::Int64, ::Bool) at ./<missing>:0
while loading /home/rveltz/prog/MF-dendrite/network-nds-parallel.jl, in expression starting on line 237
while loading /home/rveltz/prog/MF-dendrite/avoid-bug.jl, in expression starting on line 3