Task serialization error: Running a Task on a remote process fails with "cannot serialize a running Task"

Hi all,

I have written a routine for image acquisition. It calls the camera's C API to take and save images. Running the routine on the local process works fine; I can send SIGINT to interrupt the loop and it quits cleanly. However, when I send it to a worker process, it fails with the error message "cannot serialize a running Task".

Here is the routine code:

function work(camera::Camera, remcam::RemoteCamera)
    Base.exit_on_sigint(false)
    # set acquisition mode
    set_acquisitionmode(camera, "Continuous")

    # attach shared arrays in a remote process
    img_array, imgTime_array = attach_remote_process()

    # begin acquisition
    start(camera)
    counter = 0
    
    try
        while true
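            # get the next image; a timeout leaves `img` as `nothing`
            img =
            try
                SpinnakerCameras.next_image(camera, 1)
            catch ex
                if (!isa(ex, SpinnakerCameras.CallError) ||
                    ex.code != SpinnakerCameras.SPINNAKER_ERR_TIMEOUT)
                    rethrow(ex)
                else
                    @warn "timed out waiting for an image"
                end
                nothing
            end

            # after a timeout there is no image to process or finalize
            img === nothing && continue

            if img.incomplete == 1 && img.status != 0
                @goto clear_img
            end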

            counter += 1
            img_data = @view img.data[:,:]
            ts = img.timestamp

            # lock shared arrays and write data to the shared arrays
            wrlock(img_array,1.0) do
                copyto!(img_array, img_data)
            end

            wrlock(imgTime_array,1.0) do
                imgTime_array[1] = ts
            end

            # clear image handle 
            @label clear_img
                finalize(img)

        end
    catch e
        if e isa InterruptException

            @info "Acquisition loop is terminated"

            # `img` is local to the loop body, so it may be undefined here
            try
                finalize(img)
            catch err
                if !(err isa UndefVarError)
                    rethrow(err)
                end
            end

            stop(camera)
            return nothing
        else
            rethrow(e)
        end
    end
end

The arguments of the function are composite types that contain a pointer to the camera device and other data containers.

This is how I run the routine on a remote process:

using Distributed
addprocs(1)
remote_do(work,2,camera,remcam)

And this is the returned error:

ERROR: cannot serialize a running Task
Stacktrace:
  [1] error(s::String)
    @ Base ./error.jl:33
  [2] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, t::Task)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:445
  [3] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:657
  [4] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:636
  [5] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:657
  [6] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:636
  [7] serialize_any(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, x::Any)
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:657
  [8] serialize
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:636 [inlined]
  [9] serialize(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, t::Tuple{SpinnakerCameras.Camera, SpinnakerCameras.RemoteCamera{UInt8}})
    @ Serialization /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Serialization/src/Serialization.jl:201
 [10] serialize_msg(s::Distributed.ClusterSerializer{Sockets.TCPSocket}, o::Distributed.RemoteDoMsg)
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/messages.jl:78
 [11] #invokelatest#2
    @ ./essentials.jl:708 [inlined]
 [12] invokelatest
    @ ./essentials.jl:706 [inlined]
 [13] send_msg_(w::Distributed.Worker, header::Distributed.MsgHeader, msg::Distributed.RemoteDoMsg, now::Bool)
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/messages.jl:174
 [14] send_msg
    @ /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/messages.jl:122 [inlined]
 [15] remote_do(::Function, ::Distributed.Worker, ::SpinnakerCameras.Camera, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:461
 [16] remote_do(::Function, ::Distributed.Worker, ::SpinnakerCameras.Camera, ::Vararg{Any, N} where N)
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:461
 [17] remote_do(::Function, ::Int64, ::SpinnakerCameras.Camera, ::Vararg{Any, N} where N; kwargs::Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:486
 [18] remote_do(::Function, ::Int64, ::SpinnakerCameras.Camera, ::Vararg{Any, N} where N)
    @ Distributed /buildworker/worker/package_linux64/build/usr/share/julia/stdlib/v1.6/Distributed/src/remotecall.jl:486
 [19] top-level scope
    @ none:1

Can anyone explain what the error means?

Best,

Sitthichat

Can you ssh into the same machine and run it inside a REPL there?

Sometimes that error message means the remote process crashed.
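
A quick way to check whether a worker is still alive is a trivial round trip. This is only a rough sketch: `pid` and `alive` are names made up for the example, and it starts a fresh local worker instead of reusing the one from the question.

```julia
using Distributed

# Start one local worker and keep its process id.
pid = only(addprocs(1))

# A live worker answers a trivial call with its own id;
# a crashed one makes this call throw instead.
alive = remotecall_fetch(myid, pid) == pid

# Shut the worker down again.
rmprocs(pid)
```

If the round trip succeeds, the worker itself is fine and the problem is more likely in what is being sent to it.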

I found out that it has to do with the data types passed to the function. Camera and RemoteCamera are composite types made up of arrays, pointers, and other composite types. When I pass only individual fields of the composite types, e.g. only the pointer to the camera device stored in the Camera type, the error goes away.
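
For anyone who lands here later, this is a minimal sketch of that failure mode using only the Serialization standard library. `FakeCamera`, `handle`, `watcher` and `gate` are hypothetical stand-ins, not the real SpinnakerCameras types, and it is an assumption that the real types hold a started Task somewhere in their fields (directly or through something like a Channel or Condition). The serializer walks every field of an argument, so one running Task anywhere inside is enough to trigger the error.

```julia
using Serialization

# Hypothetical stand-in for a device wrapper; NOT the real SpinnakerCameras.Camera.
struct FakeCamera
    handle::UInt64   # plain data, e.g. an address-like integer
    watcher::Task    # background task owned by the wrapper
end

gate = Channel{Nothing}(0)
watcher = @async take!(gate)                  # will block on the empty channel
while !istaskstarted(watcher); yield(); end   # make sure the task has actually started

cam = FakeCamera(0x1234, watcher)

# Serializing the whole struct visits every field, including the running Task.
whole_msg = try
    serialize(IOBuffer(), cam)
    "ok"
catch err
    sprint(showerror, err)
end

# Serializing only the plain field round-trips without trouble.
io = IOBuffer()
serialize(io, cam.handle)
seekstart(io)
handle_back = deserialize(io)

put!(gate, nothing)   # release the background task so it can finish
```

Here `whole_msg` holds the captured error text, while `handle_back` is the value that survived the round trip. This matches what I saw: sending individual fields works, sending the whole object does not.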
