Catching ERROR (unhandled task failure): EOFError: read end of file


#1

Hello,

I am asynchronously reading data in several processes; after the data are loaded and preprocessed, I copy them to the master process. Unfortunately, I sometimes get

ERROR (unhandled task failure): EOFError: read end of file
ERROR: LoadError: ProcessExitedException()

This completely crashes my calculation. Is there a way to catch this as an exception? Simply wrapping the call in a try/catch block does not seem to work.
Thanks for any suggestions.

Tomas
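For context, a rough sketch of the pattern (all names here are placeholders, not my real code):

```julia
addprocs(4)

# Stand-in for the real loading/preprocessing work.
@everywhere load_and_preprocess(idx) = rand(100)

futures = [@spawn load_and_preprocess(i) for i in 1:10]

# This is the try/catch that does not seem to help: when a worker
# dies, the EOFError is raised on an internal task, and the whole
# computation aborts with ProcessExitedException instead of
# landing in the catch block.
results = try
    map(fetch, futures)
catch e
    warn("caught worker failure: ", e)
    nothing
end
```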


#2

I am in the same sort of situation. Have you made any progress?


#3

@Tomas_Pevny, I got around my issue by wrapping my parallel section in a function.

Below is a minimal example of my original program structure.

# N, something, and hasConverged below are placeholders for the real code.
iteration = 1
max_iterations = 1000

weights = Array{Float64,2}(2, N)

while iteration <= max_iterations

    arr = SharedArray{Float64}(N)

    @sync @parallel for i=1:N

        # Compute value to place in the shared array.
        value = something(weights[end,:])

        arr[i] = value

    end # parallel sync

    weights = vcat(weights, sdata(arr)')  # push! is not defined for a Matrix; append the new row with vcat
    
    # Check for early breaks.
    if hasConverged(arr)
        break
    end

    iteration = iteration + 1
end # while

Below is a minimal example of the structural change that seems to allow my program to run without running into the EOFError.

function calculation(weight)

    arr = SharedArray{Float64}(N)
    
    @sync @parallel for i=1:N

        # Compute value to place in shared array.
        value = something(weight)

        arr[i] = value

    end # parallel sync

    return arr
end

iteration = 1
max_iterations = 1000

weights = Array{Float64,2}(2,N)

while iteration <= max_iterations

    arr = calculation(weights[end,:])

    weights = vcat(weights, sdata(arr)')  # append the new row; push! is not defined for a Matrix

    # Check for early breaks.
    if hasConverged(arr)
        break
    end

    iteration = iteration + 1
end # while

If anybody knows how this change in structure affects scoping/garbage collection, please let me know.
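One more note: with the parallel section behind a function boundary, wrapping the call site in try/catch may also become an option. This is an untested sketch of the idea, not something I have verified:

```julia
arr = try
    calculation(weights[end, :])
catch e
    isa(e, ProcessExitedException) || rethrow(e)
    # A worker died; in principle one could addprocs() a replacement
    # before retrying. The single retry here is only illustrative.
    warn("worker exited, retrying once")
    calculation(weights[end, :])
end
```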