How can `sleep(0.001)` error?

Hi all :slight_smile:

How can sleep(0.001) error?

┌ Error:
│ EOFError: read end of file
│ Stacktrace:
│  [1] wait
│    @ .\asyncevent.jl:159 [inlined]
│  [2] sleep
│    @ .\asyncevent.jl:265 [inlined]

This happened on Windows. Julia v10.0.0

Thanks all!

Please post a minimal working snippet of code that triggers this error for you. Please read: make it easier to help you

It is hard to give a minimal working snippet. I can only say that it almost never happens (my feeling is 1 in 10^15).
The surrounding code is a while-true loop with an inner try-catch and a polling ccall. All of this runs in a separate task.

It shouldn’t be hard to share the code you’re actually trying to run. You only showed a partial error message.

I can do that… But I have to remove company code. The following function gets spawned in a task.

function get_next_item()
    while _IS_RUNNING[]
        try

            return_buffer = managed_allocating_ccall(
                ccall_get_next_item
            )

            if isnothing(return_buffer)
                sleep(0.001)
                continue
            end

            distribute_to_channels(
                deserialize_item(return_buffer)...
            )
        catch e 
            @error(
                sprint(showerror, e, catch_backtrace())
            )
        end
    end
end

and

function managed_allocating_ccall(allocating_ccall::Function)::Union{IOBuffer, Nothing}
    (return_value_ptr, return_value_length) = allocating_ccall()
        
    if return_value_ptr == Ptr{UInt8}()
        return nothing
    end

    return_value_length_buffer = IOBuffer(return_value_length)
    return_value_length = read(return_value_length_buffer, Int32)
    return_value = unsafe_wrap(Vector{UInt8}, return_value_ptr, return_value_length)

    return_value = deepcopy(return_value)
    ccall_free(return_value_ptr)

    return IOBuffer(return_value)
end

and

function ccall_get_next_item()
    return_value_length = zeros(UInt8, 4)

    return_value_ptr = @my_semaphore_lock begin
        GC.@preserve return_value_length begin
            @ccall $C_ABI_POINTER_GET_NEXT_ITEM(
                pointer(return_value_length)::Ptr{UInt8},
            )::Ptr{UInt8}
        end
    end with Semaphore _CCALL_SEMAPHORE

    (return_value_ptr, return_value_length)
end

I got this error again in production. I have no idea how to debug or fix it. Can anybody help me?

Wrap sleep in a try-catch block and ignore EOFError:

try
    sleep(0.001)
catch ex
    ex isa EOFError || rethrow()
end
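
To avoid repeating the try-catch at every call site, the same idea could live in one small helper. This is only a sketch: `quiet_sleep` is a made-up name, not something from Base, and it assumes that an EOFError from the timer is harmless in your situation.

```julia
# Hypothetical helper (not part of Base): sleep, but treat an EOFError
# raised by the underlying timer wait as "the sleep is over".
function quiet_sleep(seconds::Real)
    try
        sleep(seconds)
    catch ex
        # Anything other than EOFError is unexpected and is passed on.
        ex isa EOFError || rethrow()
    end
    return nothing
end

quiet_sleep(0.001)
```

In the loop from the original post, `sleep(0.001)` would then become `quiet_sleep(0.001)`.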

Thank you for that idea. But why isn’t that in Base, or why does it happen at all? I mean, I don’t want to wrap the sleep method all the time.

This should get a better error message. The error is thrown here:

Which triggers in this case:

So this throws EOFError if the timer has already expired, i.e. the sleeping duration is too short.

It seems like for sleep the code should be modified so no error is thrown in that case. It doesn’t seem helpful to error if it “overslept”, in fact that seems expected in some cases.

If the issue was the duration being too short, we would expect the following code to throw an error as the timer t has already expired by the time we wait for it. But it does not.

julia> t = Timer(1); sleep(2); wait(t)

julia>

wait on a Timer is guaranteed not to return until the Timer is expired and the Timer has not yet told anyone that it has expired (or it's been told that it has expired through close). In other words, if t has already expired when the first wait call occurs, wait just notes that down atomically and returns immediately. Subsequent calls to wait on the same object throw, since the timer has already notified the program that it has expired:

julia> t = Timer(1); sleep(2);

julia> isopen(t)
false

julia> wait(t) # first time someone waits on this specific Timer

julia> wait(t) # second time someone waits on this specific Timer
ERROR: EOFError: read end of file
Stacktrace:
 [1] wait(t::Timer)
   @ Base ./asyncevent.jl:159
 [2] top-level scope
   @ REPL[16]:1

The error message is maximally unhelpful here, and I agree that this should throw on the first wait too. As I understand it, this behavior is part of the buggy behavior mentioned in the linked PR.

The sleep function is not really intended for a duration of 1 ms. If you need to sleep for less than 5 ms, try using sleep_ms() from the Timers.jl package. Or use Libc.systemsleep(), which is still better than sleep().
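
For a polling loop like the one in the original post, a blocking sleep might look roughly like the sketch below. `poll_until`, `interval`, and `max_tries` are made up for illustration; the only real API used is `Libc.systemsleep`. Keep in mind that it blocks the whole OS thread instead of yielding to other Julia tasks, so it probably fits best when the loop runs on its own thread.

```julia
# Hypothetical polling helper; `poll_until`, `interval`, and `max_tries`
# are illustrative names, not part of Base or any package.
function poll_until(fetch_item::Function; interval::Real = 0.001, max_tries::Integer = 1000)
    for _ in 1:max_tries
        item = fetch_item()
        isnothing(item) || return item
        # Blocks the current OS thread; no Timer object is involved.
        Libc.systemsleep(interval)
    end
    return nothing  # gave up after max_tries attempts
end

# Usage: the closure returns `nothing` twice, then a value.
calls = Ref(0)
result = poll_until() do
    calls[] += 1
    calls[] < 3 ? nothing : calls[]
end
```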
