Break function on time limit

You can call setparameters! on the solver object before you give it to JuMP.

That makes sense and works! Thanks

I still wonder whether this would be reasonable to have in general, for all function calls.

I am stuck on the same problem without a clear general solution.

Basically, given a function f(x), how can I wrap it in a function g(x) that waits, say, 1 minute and, if f(x) is still running, stops the execution and returns, for example, missing?

_done(limit) = time_ns() > limit
time_from_now(seconds) = round(Int, 10^9 * seconds + time_ns())
function f(x; limit)
    while true
        @show x
        _done(limit) && return missing
        sleep(1.0)
        x += 1
    end
    x
end
g(x; limit = time_from_now(10)) = f(x; limit = limit)
g(0) # will stop in 10s, return missing

I still don’t get it. How does the code you shared solve the issue? If the function is not called asynchronously, it blocks execution, so we can’t be watching it in a while loop, right?

Can you please show an example where you start an execution that takes, say, 10 minutes (without knowing in advance that it will take 10 minutes), and then kill it after 10 seconds?

Getting timeout (or cancellation in general) right is the main theme in Structured Concurrency https://github.com/JuliaLang/julia/issues/33248. But we don’t have an out-of-the-box solution to this right now. You have to pass around some kind of “cancellation token” for this to work. But it’d mean it’s more or less equivalent to Tamas_Papp’s code:

function f(x, should_stop)
    while true
        sleep(0)  # need some I/O
        should_stop[] && return missing
        x += 1
    end
    x
end

@sync begin
    should_stop = Threads.Atomic{Bool}(false) # token
    t = @async f(0, should_stop)
    sleep(0.5)
    should_stop[] = true
    fetch(t)
end

I could be wrong but I think what @juliohm is asking for is to be able to stop a black box function that won’t necessarily ever yield. So to be able to take any function f, whose internals we don’t have access to and wrap it in a function runtime_limiter like so:

result = runtime_limiter(f, timeout=60)

So in that example, if f is still running after 60 time units, then runtime_limiter returns missing.

In this case could you do something like

time_from_now(seconds) = round(Int, 10^9 * seconds + time_ns())

function runtime_limiter(f::Function, args...; timeout=60, kwargs...)
    # Splat the arguments so f receives them individually
    t = @async f(args...; kwargs...)
    end_time = time_from_now(timeout)
    result = missing
    while time_ns() <= end_time
        sleep(0.1)  # yield so the task gets a chance to run
        if istaskdone(t)
            result = fetch(t)
            break
        end
    end
    return result
end

Note that I haven’t tested the above. Don’t know if it would actually work.

Edit: In the case that the function did not complete before timeout, I don’t know how you would kill the running task.

That’s why I said there is no out-of-the-box solution.

Ok. I think I misunderstood your reply @tkf. So, to check that I understand you correctly, you’re saying that as far as you know there is no way (or at least no straightforward way) to kill a running task from within the same julia process as the task?

I hadn’t thought of the ability to kill a task as being related to structured concurrency. But upon reflection I see that it is. Again, to check my understanding, the problem is that in unstructured concurrency, all the tasks live in some kind of global scope with no way to trace how they relate to each other. So if you create a task that creates other secondary tasks, then when you want to kill the primary task, there is no automatic way to distinguish between secondary tasks that need to be cleaned up and other tasks that are completely unrelated.

Of course if you’re writing everything yourself you could have a channel that holds all the tasks for a given computation and pushes into it when new subtasks are created but that doesn’t work as soon as you have a black box function that creates its own new tasks.

Thanks for the reply and for helping to improve my understanding @tkf

@juliohm, if you’re willing to fire up a new julia process to run your long running function (and move the input data over to that new process) then there’s a nice solution over at StackOverflow:

https://stackoverflow.com/questions/52018024/how-to-kill-a-task-in-julia

The idea is to start your computation asynchronously on a remote worker and create an empty RemoteChannel for the result. Then back on your master process you have a loop that calls isready on the RemoteChannel until the timeout is reached. If the timeout is reached and the RemoteChannel is still empty you just use rmprocs to kill the remote worker.
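A minimal sketch of that idea, not the StackOverflow answer verbatim. The helper name run_on_worker and its keywords timeout and pollint are made up for illustration, and it assumes f is already available on the new worker (a Base function here; your own functions would need to be loaded there first). Whether rmprocs promptly terminates a worker that is stuck in non-yielding code is not something this sketch guarantees.

```julia
using Distributed

# Hypothetical helper (name and keywords are assumptions, not an existing API).
function run_on_worker(f, args...; timeout=60.0, pollint=0.1)
    pid = first(addprocs(1))                       # fresh worker dedicated to this call
    result = RemoteChannel(() -> Channel{Any}(1))  # empty channel owned by the calling process
    @spawnat pid put!(result, f(args...))          # worker stores its return value when done
    deadline = time() + timeout
    while !isready(result) && time() < deadline
        sleep(pollint)                             # poll the channel until the timeout
    end
    value = isready(result) ? take!(result) : missing
    rmprocs(pid)                                   # remove the worker whether or not it finished
    return value
end

run_on_worker(sum, 1:10; timeout=60)   # the worker answers in time
run_on_worker(sleep, 30; timeout=1)    # gives up after about 1s and returns missing
```

If f throws on the worker, nothing is ever put into the channel, so this version simply waits for the timeout and returns missing.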

Yeah, that’s what I wanted to mention.

It’s a requirement for implementing structured concurrency.

I think you are describing the so-called black box principle (i.e., after a function returns or throws, none of the tasks it spawned should still be running in the background). Task cancellation is another building block of structured concurrency, as you need a way to terminate other tasks both to enforce the black box principle and to terminate a function call rapidly.

Yeah, I agree that putting it in a process is a nice way to robustly cancel the computation (if you are OK with the overhead of a remote call).

Thanks very much @tkf! That exchange of posts clarified a lot for me.

I have the same problem. I have a call whose return value I am not interested in (only its terminal output interests me). I want it to run unimpeded for X seconds (i.e., without considerable overhead) and then, if it has not finished by the end of that time, be “stopped by force”, with the program flow continuing normally at the next line after the call. My only other constraint is that methods already called before are not compiled again (I am not sure whether solutions that spawn a new process guarantee that; I am very ignorant of parallel/distributed computing, especially in Julia).

I looked at JuliaObserver but failed to find anything that seems to solve my problem. Then I looked at GitHub and found three repositories. One of them is ancient, and the other two are both called Timeout.jl: one last updated in 2018 and written by @ararslan, and another last updated five months ago, written by @goropikari.

This most recent package seems to just create a task with a sleep(time_limit) and another task with the method you want to call; if the first task returns first it kills the second, and if the second returns first it kills the timer. Does this work for my purposes? Will the inner calls make use of previously compiled methods (i.e., methods already called with the same parameter types), or does this run in an entirely separate process and recompile everything it needs there?
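For illustration only, here is a rough sketch of racing a timer against a task inside one process. It is not the package’s code, and the name race_with_timer is hypothetical. Since tasks run in the same process as the caller, methods that are already compiled get reused. Note that this version does not kill the task on timeout; it only stops waiting for it, and a function that never yields would also keep the timer from firing.

```julia
# Hypothetical helper: whichever of the task or the timer reports first wins.
function race_with_timer(f, time_limit)
    done = Channel{Any}(2)                      # room for both messages, so neither put! blocks
    @async put!(done, (:finished, f()))         # the call we actually care about
    timer = Timer(time_limit) do _
        put!(done, (:timed_out, missing))       # fires only if the timer is not closed first
    end
    status, value = take!(done)                 # wait for the first message
    close(timer)                                # stop the timer if the task won
    return status, value
end

race_with_timer(() -> 1 + 1, 5.0)               # the task finishes first
race_with_timer(() -> (sleep(5); 1), 0.2)       # the timer fires first
```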

That is exactly what I need, actually, @Pbellive. The original idea was to use distributed processes as opposed to tasks or threads, given that my function is quite expensive.

However, I would like to ask how we could adapt that SO answer to the following scenario. I have a pmap call:

results = pmap(xs, on_error=e->missing) do x
  f(x)
end

It already takes care of the problematic iterations that throw errors. Now I want to add the time limit feature as we’ve been discussing. Is it possible to do it together with pmap, or do I need to reimplement the parallel map functionality by hand by calling remotecall on the pool?

I will give it a try, but if anyone has experience in this topic, please feel free to go ahead and share some code snippet with the pmap + time limit functionality. I need this to speed up an experiment for a paper.

I’d try something like

results = pmap(xs, on_error=e->missing) do x
    # put! is the documented way to write to a RemoteChannel
    put!(chan, (:started, myid()))
    try
        f(x)
    finally
        put!(chan, (:finished, myid()))
    end
end

where chan is a RemoteChannel. You can then do something like

function killloop(timeout, chan)
    timers = Dict{Int,Timer}()  # keyed by worker id (myid() returns an Int)
    for (event, id) in chan
        if event === :started
            timers[id] = Timer(timeout) do _
                try
                    rmprocs(id)
                finally
                    pop!(timers, id, nothing)
                end
            end
        elseif event === :finished
            t = pop!(timers, id, nothing)
            if t !== nothing
                close(t)
            end
        end
    end
end

to stop the process (run this in @async). Though I’m not sure how this interacts with pmap.

I haven’t updated it because I didn’t know people actually used it. :sweat_smile: Tbh I haven’t used it more than once or twice for anything substantial. It should be compatible with at least 0.6 and 1.0, but doesn’t have appropriate compatibility bounds set in Project.toml.

It takes the approach of running the function call in a remote process and forcibly killing the process if the process doesn’t yield after interrupts. That’s sort of like using a sledge hammer as a fly swatter, but I couldn’t think of anything better at the time, since an interrupted Task may not yield.

PRs of any kind for that package are of course welcome!

I will probably make use of it. If I need some change of behavior (which will probably be necessary, as the Julia code I will be interrupting calls into C), I will open a PR. I think it is a useful package for experiment scripts in experimental computer science (where you want the script to be written in Julia to benefit from JIT warming and better abstractions, but you may be calling third-party code).

Here’s what I did to solve a similar problem. Probably not helpful for breaking out of a JuMP solver, but some of the discussion seems to reflect my use case of breaking out of a more custom process with a loop, if it’s still going after time_limit:

    # Assumes `using Dates` and that `local_move` already holds the result of a first two_opt call
    TIME_LIMIT = Minute(2)
    start = Dates.now()
    time_elapsed = Minute(0)

    # Improve solution with local-moves until local optimum reached, or time-limit hit
    while !isnothing(local_move) && (time_elapsed < TIME_LIMIT)
        local_move = two_opt(tour, dist_matrix)
        if isnothing(local_move)
            println("Hit local optimum")
        else
            tour, dist_improved = local_move
            println("Improved tour distance by $dist_improved")
        end
        time_elapsed = Dates.now() - start
        if time_elapsed > TIME_LIMIT
            println("Stopping local moves due to time elapsed > $TIME_LIMIT")
        end
    end

I ended up using a similar mechanism. Trying to use Distributed only brought headaches. Nobody answered my question about killing workers in a guaranteed way. In the end it was far easier to create a new Exception type, propagate a deadline through a good part of the code, and at key points call a function that checks the deadline and throws if it was violated. I solved the problem with the solver time limit by setting the solver time limit to exactly the time remaining before the deadline when I call solve, and then checking for the deadline right after.
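A minimal sketch of that pattern, with all names (DeadlineExceeded, check_deadline, remaining_seconds, with_deadline, long_computation) made up for illustration rather than taken from my actual code. It only works for code you can edit, since the checks have to be placed by hand.

```julia
# Hypothetical exception type signalling that the deadline has passed.
struct DeadlineExceeded <: Exception
    deadline::Float64
end

# Throw if the wall-clock deadline (seconds, as returned by time()) has passed.
function check_deadline(deadline)
    time() > deadline && throw(DeadlineExceeded(deadline))
    return nothing
end

# Time left before the deadline, e.g. to hand to a solver's own time-limit parameter.
remaining_seconds(deadline) = max(0.0, deadline - time())

# Stand-in for the real computation: checks the deadline at a key point in its loop.
function long_computation(n; deadline)
    total = 0
    for i in 1:n
        check_deadline(deadline)
        total += i
    end
    return total
end

# Run f(deadline) and turn a violated deadline into `missing`.
function with_deadline(f; seconds)
    deadline = time() + seconds
    try
        return f(deadline)
    catch err
        err isa DeadlineExceeded || rethrow()
        return missing
    end
end

with_deadline(d -> long_computation(10; deadline=d); seconds=5.0)   # finishes well within 5s
with_deadline(d -> long_computation(10; deadline=d); seconds=-1.0)  # deadline already passed
```

The granularity of the time limit is set by how often check_deadline is called, so a long stretch of code without a check can overrun the deadline.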
