Benchmarking Fallible Code

Is there a way to indicate failure in benchmarked code using BenchmarkTools? Say we have the following function

using BenchmarkTools
using Random

function mayfail()
    if rand() < 0.1
        throw(ErrorException("┻━┻︵ \\(°□°)/ ︵ ┻━┻"))
    end
    # Some expensive operations ...
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail()
run(suite)

and I have no control over the early exit behavior of mayfail().
As I see it I have a few options:

  • Abort the whole benchmark suite. This is undesirable since most benchmarks will succeed and I don’t want to waste my time rerunning all benchmarks
  • Catch the error. This is undesirable since I can no longer tell which runs were genuinely fast and which only report high performance because they failed early.
  • Make mayfail() deterministic. NO, this is not what I’m asking.

Currently I’m using side effects to exfiltrate information about the failed benchmarks, but this is somewhat brittle and I need to manually match failed and successful trials. Also, I’m not sure what the performance impact is.

Ideally BenchmarkTools.Trial would indicate whether an invocation of mayfail() actually failed. How do I proceed here?

Goodness gracious, I’ve had this tab open for a month and always put off answering - let me remedy that.

BenchmarkTools doesn’t have a concept of a failed benchmark - any run, no matter what happens, is a valid execution and is thus recorded. If an evaluation is outside of some range (I think it’s 3 standard deviations from the mean?), it’s excluded as an outlier.

My suggestion would be to benchmark the code that’s called in those functions, or construct your inputs in the benchmark in such a way that you don’t hit the error path.
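
One way to follow the second suggestion is to search for a seed under which mayfail() succeeds and re-seed the global RNG before every evaluation. This is only a rough sketch, assuming mayfail() draws from the global RNG; find_good_seed is a made-up helper, not part of BenchmarkTools.

using BenchmarkTools, Random

# Hypothetical helper: search for a seed under which f() does not throw.
function find_good_seed(f; max_tries = 1_000)
    for seed in 1:max_tries
        Random.seed!(seed)
        try
            f()
            return seed   # f() succeeded starting from this seed
        catch
        end
    end
    error("no non-failing seed found in $max_tries tries")
end

good_seed = find_good_seed(mayfail)

# evals=1 makes the setup run before every call, so the error path is never hit.
suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable mayfail() setup=(Random.seed!($good_seed)) evals=1
run(suite)

Note that this measures only the non-failing path, which is exactly the trade-off the suggestion above implies.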


Thanks, I’ll exfiltrate the results in some custom data structure that mirrors the benchmark suite. We want to publish the benchmarks, so only testing parts isn’t really an option. On the plus side, the benchmarks are fairly long-running, so an extra Dict insert won’t mess up the results significantly.
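
A minimal sketch of what such a side channel might look like; the names failures and record_failures! are made up, and the counter is only touched on the failure branch, so successful samples do the same work as before.

using BenchmarkTools

# Side channel mirroring the suite layout ("mayfail" is the only key here).
failures = Dict("mayfail" => Ref(0))

# Wrapper that counts failed invocations instead of letting them abort the run.
function record_failures!(counter, f)
    try
        return f()
    catch
        counter[] += 1
        return nothing
    end
end

suite = BenchmarkGroup()
suite["mayfail"] = @benchmarkable record_failures!($(failures["mayfail"]), mayfail)
results = run(suite)

# failures["mayfail"][] now holds the number of failed invocations; matching
# them to individual samples still has to be done by hand, as noted above.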

For future reference: I’ve written an extension to BenchmarkTools.jl that allows for recording of return values: GitHub - AlexanderNenninger/BenchmarkTools.jl at record-return-values. This was by far the easiest solution.

After implementing it, I thought of another option though. A non-deterministic function Foo can be viewed as a random variable, i.e. a deterministic mapping \text{Foo}: \Omega \to \mathbb{X} from some probability space \Omega into the co-domain \mathbb{X}.

But what is \Omega in our case?

We can simply choose \Omega to be the set of possible seeds for a random number generator. The procedure would then be (a sketch in code follows the list):

  1. Generate a fixed set of seeds S. Ideally there shouldn’t be any repeats in S and the RNG used should produce good random numbers across seeds.
  2. Record Foo(s) for s in S.
  3. Benchmark Foo(s) for s in S.
  4. Match benchmark results with already known return values.
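
A minimal sketch of this procedure, assuming mayfail() from the first post draws from the global RNG; foo, S, record, and returns are illustrative names.

using BenchmarkTools, Random

# Seeding the global RNG turns mayfail() into a deterministic function of the seed.
foo(seed) = (Random.seed!(seed); mayfail())

# 1. A fixed set of distinct seeds.
S = 1:50

# 2. Record foo(s), or the exception it throws, for every seed.
record(s) = try foo(s) catch err; err end
returns = Dict(s => record(s) for s in S)

# 3. Benchmark each seed; record() wraps foo in a try/catch so a throwing seed
#    does not abort the run, and step 2 already tells us which seeds throw.
suite = BenchmarkGroup()
for s in S
    suite[s] = @benchmarkable record($s)
end
results = run(suite)

# 4. Match trials with the known return values via the shared key s.
failing_seeds = [s for s in S if returns[s] isa Exception]

Catching inside record no longer loses information here, because step 2 already determines which seeds fail; the corresponding trials can then be reported or discarded accordingly.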