Is there a way to make testsets fail after N seconds?

Sometimes a @testset ends up in an infinite loop, which makes testing a pain.

Is there a way to add a timeout parameter to testsets so that any code taking longer than N seconds is treated as a failure?


This isn’t specific to test sets, but I think you can limit execution time by having two async tasks write to a channel. One task executes your function of interest or test, and the other just sleeps for the time limit and then writes to the channel:

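function timelimitcall(f, x, t)
    # Buffered channel: whichever task finishes first delivers its message.
    c = Channel{String}(1)
    # Task 1: run the function of interest and report its result.
    @async put!(c, string(f(x)))
    # Task 2: sleep for the time limit, then report a timeout.
    @async begin
        sleep(t)
        put!(c, "out of time")
    end
    # Return whichever message arrives first.
    return take!(c)
end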

julia> function slowf(x::Int)
           sleep(2)
           return "your int was $x"
       end
WARNING: Method definition slowf(Int64) in module Main at REPL[11]:2 overwritten at REPL[27]:2.
slowf (generic function with 1 method)

julia> @time timelimitcall(slowf, 3, 1)
  1.011541 seconds (28 allocations: 5.625 KB)
"out of time"

julia> @time timelimitcall(slowf, 3, 4)
  2.010573 seconds (35 allocations: 8.766 KB)
"your int was 3"

This doesn’t seem to work in cases where the potentially slow or looping function uses all available threads, which makes me think it’s not the right way to do this. Hopefully there’s something better!
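To connect this back to the original @testset question, here is a minimal sketch of the same two-task idea returning a status that a test can check. The helper name `run_with_timeout` and the status symbols are my own invention for illustration, not part of the `Test` standard library, and it assumes the function under test yields now and then (for example via `sleep` or I/O).

```julia
using Test

# Hypothetical helper (not part of Test): run `f()` in one task and a timer
# in another. Returns (:ok, value), (:error, exception) or (:timeout, nothing).
function run_with_timeout(f, secs::Real)
    # Capacity 2 so that neither task ever blocks on put!.
    c = Channel{Tuple{Symbol,Any}}(2)
    @async begin
        try
            put!(c, (:ok, f()))
        catch err
            put!(c, (:error, err))
        end
    end
    @async begin
        sleep(secs)
        put!(c, (:timeout, nothing))
    end
    return take!(c)  # whichever message arrives first
end

@testset "with a time limit" begin
    status, value = run_with_timeout(() -> (sleep(0.1); 42), 2)
    @test status == :ok   # a timeout would make this an ordinary failure
    @test value == 42
    @test first(run_with_timeout(() -> sleep(5), 0.5)) == :timeout
end
```

The timed-out task is not cancelled here; it just keeps running in the background, so this only reports the timeout rather than stopping the work. I would also expect the same limitation as above: a loop that never yields will keep the timer task from running at all.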


Here’s a shot at the case where the trial function doesn’t yield, but a signal works:

module M

const SIGINT=2
function limiter(c)
    ppid = take!(c)
    println("limiter ready, parent is $ppid")
    t = take!(c)
    it = 0
    while it < t && !isready(c)
        it += 1
        sleep(1)
    end
    if !isready(c)
        println("time's up!")
        run(`kill -$SIGINT $ppid`)
        println("limiter done (kill)")
    else
        t = take!(c)
        println("limiter done, got ",t)
    end
end

function trial(n,t::Int)
    a = rand(n,n)
    pid = getpid()
    workers = addprocs(1)
    @everywhere if !isdefined(:M)
        include("tlimit1.jl")
    end
    # channel must be owned by child
    c = RemoteChannel(()->Channel{Int}(0),workers[1])
    F = nothing
    @sync begin
        @async remote(limiter)(c)
        put!(c,Int(pid))
        put!(c,t)
        try
            F = schurfact(a)
            put!(c,0)
            println("F is ready")
        catch(JE)
            println("caught ",JE)
        end
    end
    # clean up, assuming sync is done correctly
    rmprocs(workers)
    F
end

end
julia> M.trial(1024,200)
	From worker 40:	limiter ready, parent is 7752
F is ready
	From worker 40:	limiter done, got 0
Base.LinAlg.Schur{Float64,Array{Float64,2}} with factors T and Z:
[511.572 -0.087236 … 0.160368 -0.150227; -0.0 8.63058 … 0.135984 -0.437195; … 
...
julia> M.trial(1024,1)
	From worker 41:	limiter ready, parent is 7752
caught InterruptException()
	From worker 41:	time's up!
	From worker 41:	limiter done (kill)

Proof of principle only; it needs elaboration for package use. Some lockups seem to need more than a SIGINT; perhaps one of the developers can recommend something stronger?
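One possible direction for "something stronger", sketched under assumptions rather than offered as a recommendation: turn the arrangement around and run the risky code in a child Julia process, which the parent can kill outright if it overruns. The helper name `run_isolated` is hypothetical, the code to run is passed as a string, and results would have to come back through the exit code or a file.

```julia
# Hypothetical helper: run `code` in a fresh Julia process and kill it if it
# has not exited after `secs` seconds. Returns :ok, :failed or :timeout.
function run_isolated(code::AbstractString, secs::Real)
    cmd = `$(Base.julia_cmd()) --startup-file=no -e $code`
    # wait=false returns immediately with a handle to the child process.
    p = run(pipeline(cmd; stdout=devnull, stderr=devnull); wait=false)
    # Poll until the child has exited or the time limit has passed.
    if timedwait(() -> process_exited(p), Float64(secs)) == :ok
        return success(p) ? :ok : :failed
    end
    kill(p, Base.SIGKILL)  # cannot be caught or ignored by the child
    wait(p)                # reap the child before returning
    return :timeout
end

# A busy loop that never yields is still stopped after about 3 seconds.
println(run_isolated("while true end", 3))
```

The cost is a full Julia startup per call, so this probably only makes sense around whole test files rather than individual tests.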