Benchmarking Fallible Code

Is there a way to indicate failure in benchmarked code using BenchmarkTools? Say we have the following function

using BenchmarkTools
using Random

function mayfail()
    if rand() < 0.1
        throw(ErrorException("┻━┻︵ \\(°□°)/ ︵ ┻━┻"))
    end
    # Some expensive operations ...
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
run(suite)

and I have no control over the early exit behavior of mayfail().
As I see it I have a few options:

  • Abort the whole benchmark suite. This is undesirable since most benchmarks will succeed and I don’t want to waste my time rerunning all benchmarks
  • Catch the error. This is undesirable since I can no longer tell which runs were genuinely fast and which only report high performance because they failed early.
  • Make mayfail() deterministic. NO, this is not what I’m asking.

Currently I’m using side effects to exfiltrate information about the failed benchmarks, but this is somewhat brittle and I need to manually match failed and successful trials. Also, I’m not sure what the performance impact is.

Ideally BenchmarkTools.Trial would indicate whether an invocation of mayfail() actually failed. How do I proceed here?

Goodness gracious, I’ve had this tab open for a month and always put off answering - let me remedy that.

BenchmarkTools doesn’t have a concept of a failed benchmark - any run, no matter what happens, is a valid execution and is thus recorded. If an evaluation is outside of some range (I think it’s 3 standard deviations from the mean?), it’s excluded as an outlier.

My suggestion would be to benchmark the code that’s called in those functions, or construct your inputs in the benchmark in such a way that you don’t hit the error path.
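
One way to follow the second suggestion is to search for a seed under which mayfail() succeeds and re-seed the global RNG before every evaluation. This is only a rough sketch, assuming mayfail() draws from the global RNG; find_good_seed is a made-up helper, not part of BenchmarkTools.

using BenchmarkTools, Random

# Hypothetical helper: search for a seed under which f() does not throw.
function find_good_seed(f; max_tries = 1_000)
    for seed in 1:max_tries
        Random.seed!(seed)
        try
            f()
            return seed   # f() succeeded starting from this seed
        catch
        end
    end
    error("no non-failing seed found in $max_tries tries")
end

good_seed = find_good_seed(mayfail)

# evals=1 makes the setup run before every call, so the error path is never hit.
suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail() setup=(Random.seed!($good_seed)) evals=1
run(suite)

Note that this measures only the non-failing path, which is exactly the trade-off the suggestion above implies.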


Thanks, I’ll exfiltrate the results in some custom data structure that mirrors the benchmark suite. We want to publish the benchmarks, so only testing parts isn’t really an option. On the plus side, the benchmarks are fairly long-running, so an extra Dict insert won’t mess up the results significantly.
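
A minimal sketch of what such a side channel might look like; the names failures and record_failures! are made up, and the counter is only touched on the failure branch, so successful samples do the same work as before.

using BenchmarkTools

# Side channel mirroring the suite layout ("mayfail" is the only key here).
failures = Dict("mayfail" => Ref(0))

# Wrapper that counts failed invocations instead of letting them abort the run.
function record_failures!(counter, f)
    try
        return f()
    catch
        counter[] += 1
        return nothing
    end
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable record_failures!($(failures["mayfail"]), mayfail)
results = run(suite)

# failures["mayfail"][] now holds the number of failed invocations; matching
# them to individual samples still has to be done by hand, as noted above.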

For future reference: I’ve written an extension to BenchmarkTools.jl that allows for recording of return values: GitHub - AlexanderNenninger/BenchmarkTools.jl at record-return-values. This was by far the easiest solution.

After implementing it, I thought of another option though. A non-deterministic function Foo can be viewed as a random variable, i.e. a deterministic mapping \text{Foo}: \Omega \to \mathbb{X} from some probability space \Omega into the co-domain \mathbb{X}.

But what is \Omega in our case?

We can simply choose \Omega to be the set of possible seeds for a random number generator. The procedure would then be (a sketch in code follows the list):

  1. Generate a fixed set of seeds S. Ideally there shouldn’t be any repeats in S and the RNG used should produce good random numbers across seeds.
  2. Record Foo(s) for s in S.
  3. Benchmark Foo(s) for s in S.
  4. Match benchmark results with already known return values.
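
A minimal sketch of this procedure, assuming mayfail() from the first post draws from the global RNG; foo, S, record, and returns are illustrative names.

using BenchmarkTools, Random

# Seeding the global RNG turns mayfail() into a deterministic function of the seed.
foo(seed) = (Random.seed!(seed); mayfail())

# 1. A fixed set of distinct seeds.
S = 1:50

# 2. Record foo(s), or the exception it throws, for every seed.
record(s) = try foo(s) catch err; err end
returns = Dict(s => record(s) for s in S)

# 3. Benchmark each seed; record() wraps foo in a try/catch so a throwing seed
#    does not abort the run, and step 2 already tells us which seeds throw.
suite = BenchmarkGroup()
for s in S
    suite[s] = @benchmarkable record($s)
end
results = run(suite)

# 4. Match trials with the known return values via the shared key s.
failing_seeds = [s for s in S if returns[s] isa Exception]

Catching inside record no longer loses information here, because step 2 already determines which seeds fail; the corresponding trials can then be reported or discarded accordingly.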