How would one implement a Supposition.jl test for something like the following MWE?

@check function phased(x = Data.Floats())
cosd(x) == sind(x + 90)
end

which fails due to floating point stuff

x = 1.8014398509481896e16
(cosd(x), sind(x + 90))

which returns two different numbers.

Just like how @test_broken is an admission that a functionality is currently not behaving as desired,
I’d imagine maybe splitting into two tests?

#1. An expected-passing @check test on a range of Supposition.Data.Floats (using Supposition.Data.Satisfying?) - different range for each concrete floating point numeric type. #2. An admission of broken functionality, somewhat a @check_broken on the floating point range complement of #1.

It’d be interesting to run such tests on Julia Base, and packages like SpecialFunctions.jl which can build awareness for people to know the specific limitations of the numeric types they’re implementing. And for their use cases, they might consider doing things like what Milan did for SpeedyWeather.jl (in his case, mapping the important dynamics to a custom numeric type containing a finer collection of values at the important ranges for his modelling).

I forgot to say that it actually fails isapprox as well.

@check function phased(x = Data.Floats())
isapprox(cosd(x), sind(x + 90))
end

Found counterexample
Context: phased
Arguments:
x::Float64 = 1.8014398509481896e16

My original post before editing had isapprox in it because I tested with == first, then with isapprox, both of which failed, then I copy-pasted to here.

Though from the context of, I’m a package developer, I have implemented some math functions, and I’d like to verify them, so I’ll use Supposition.jl as the closest thing to formal verification that we have, and implement them as part of my tests. In that context, I’m wondering how to go about it.

And in the documentation, the error plots can be provided, with reference to the Supposition.jl testing results.

I would think this important for say, e.g. inclusion as uncertainty bounds in implemented numerical models.

This is an interesting question, thank you for bringing it up! I think the most relevant section of the docs here is Alignment of Documentation · Supposition.jl Documentation, since it talks about these kinds of issues a bit in the context of floating point values in particular. Documenting these kinds of behaviors is IMO the best course of action.

Unlike with the precision of sincosd, I don’t think there’s any guarantee about the precision of the basic sincosd relative to cosd and how it should behave when shifting the phase of the input by 90°. So whether this inexactness is considered “broken” or not is really not that easy to answer! What I would do is basically the same as what @jar1 suggested - use isapprox with an absolute tolerance and check that the output is always within that tolerance. For a single operation like sind on a Float64, this is typically 1 ULP, and the result is compared to sind on a BigFloat. For your example though, you’d still have to compare the Float64 results of sind & cosd with isapprox.

Note also that it’s going to be extremely difficult to guarantee a large number of trigonometric identities to hold, even within some chosen precision. See e.g. this floating point favorite of mine for an example where it’s hard.

All that being said - your intuition of using error plots as a guiding principle is spot on! In this example, I’d try to plot the error over the range you’re interested in and just take a look at how bad it really is. Since this is about trigonometry, I’d forego the *d versions and use sin/cos directly, producing a plot of all Float16 numbers in [0, 1].

Maybe the overall error is already acceptable, maybe the worst case is MUCH worse than the example Supposition.jl found. Remember, it tries to find the minimal input that produces the same test result, which may not necessarily be the minimal error that you’re actually interested in! You could try to find that with target!(abs(sind(x+90) - cosd(x))) (see this doc section), which looks like it could be simple enough to work out.

Speaking hypothetically here, suppose there exists a package, let’s call it ComputationalErrors.jl (or maybe even just include the following in Supposition.jl). It has a macro, say @uncertainty. Analogous to @btime which outputs a measure of performance, @uncertainty would output a measure of uncertainty, and would be used with similar syntax as @check. It could then be used in package authors’ documentations, being calculated autonomously on each doc build. I feel like that would be in the spirit of what Supposition.jl is pointing toward. What is everyone’s thoughts on this idea?

It could be used in Julia’s documentation for math functions, on SpecialFunctions.jl, on automating the difference of accuracy calculations for different architectures/parallelisations/etc. It can inform on what uncertainty bounds to use as inputs when using a package’s functions in one’s own models. Is that something that people would find useful? I would personally find this more acceptable than manually trying to find an isapprox tolerance for things that should theoretically ideally be isequal.

Edit: Stefan’s post on that summation behaviour is crazy!

It’s not a bad idea, but I don’t think it’s a good fit for Supposition.jl. You can probably achieve this exact thing by using IntervalArithmetic.jl and checking that the true upper & lower bounds returned are within your margin of error, without having to add anything to Supposition.jl.

I think in this case, you might want to either restrict the domain or check for relative precision with respect to x.

@check function phased(x = Data.Floats())
if !isfinite(x) || !isfinite(x + 90)
return true
end
isapprox(cosd(x), sind(x + 90); atol=eps(x)/2)
end

Here, eps(x)/2 represents the rounding error. For functions with larger variations, you might want to use Taylor expansions to approximate the error. For example:

# First order approximation
atol = C * abs(df(x) * eps(x)/2)
# Second order approximation (useful if f' can be zero)
atol = C * abs(df(x) * eps(x)/2 + ddf(x) * (big(eps(x))/2)^2 / 2)
# You can extend this further for higher orders

Where C is a constant that you choose to adjust the tolerance level. In the second-order case, I use BigFloat because eps(x)^2 can result in infinity (Inf) in floating-point arithmetic for large values of x.

Oh it didn’t cross my mind to use a Taylor expansion in defining a tolerance, that’s a good idea. Would work well with autodiff utilities as well.

And I guess that would make sense if I wanted a more flexible tolerance system for my property tests. But I’m thinking of the scenario: I pick up Julia for the first time, I choose a package which implements some mathematics, and I would use said package with the expectation that the inplementations would hold to mathematical properties. So a package author could get in the mindset of giving their users the awareness: “this is what my package provides, and these are its specific limitations based on how it holds/doesn’t hold mathematical relationships.” For this perspective, I’m not looking to pass tests. I’m looking to measure deviation from objective truth.

Regarding using IntervalArithmetic, that would produce a range of outputs based on a range of inputs. But my thinking is of evaluating an incorrectness of expected properties based on single inputs. So we’re back to the error graph.

(On that note, sorry @Sukera for the premature “Solution” labelling and for un-solutioning your response, it took me a moment to process my thoughts.)

I was going to suggest implementing something like that by hand (I wasn’t aware of this package), but I was unsure how to handle non-monotonic functions.

@mleseach are you talking about the Taylor expansion method for deciding a tolerance? I would guess that the Taylor expansion would be centered at the value the test is running on, so it should work for non-monotonic functions.

What I would understand it wouldn’t work on is functions that are not C1, i.e. continuous and differentiable.

Edit: I just noticed you were replying to Sukera who was talking about using IntervalArithmetic, nevermind.

I think in this case the problem lies not with Supposition.jl but rather floating point arithmetic tbh. The property you test, i.e. cosd(x) == sind(x + 90) indeed does not hold for all floats. However, this is due to finite precision, because for very large x you have x+90 == x (which Supposition.jl correctly identifies!). So I think you should rather try to fix your test to be more meaningful.
I suggest splitting it into two parts:

You test that cosd(x) == sind(x + 90) for some reasonable range, like -720 < x < 720

You test that it correctly wraps the input, i.e. cosd(x) = cosd(x % 360)
Together these tests essentially cover the same the ground as before but you avoid the trouble with floating point additions.

Perhaps you could combine these two cases like so:

@check function phased(x = Data.Floats())
x = x % 720 # or some other reduction to a reasonable interval
cosd(x) == sind(x + 90)
end

But I think leaving the cases separate makes is clearer and actually tests a bit more thorough.

Specifically the x value in the original post is an integer valued Float64 which is large enough that x + 90 cannot be represented exactly as a Float64.

I like the point and demonstration of how my test can be improved. Splitting into two parts is actually a good idea, thank you!

I think the point I’m trying to make in this thread is the fact that Julia’s cosd doesn’t mod the input, and therefore, what’s the appropriate caveat of usage informing users on the potential errors? How large can x be before we expect inequality? And beyond that, how large can x be before, say, acosd(cosd(x)) starts behaving in quantifiably undesirable ways?

That said, I am still starting to reach the same conclusion that Supposition.jl isn’t scoped for the presentation I am looking for. It is, however, very good for bringing awareness of the corner cases/corner domain regions, and for verifying that the presentation is correct.

Splitting into cases will be applied to, as I’ve listed above, the domain of inputs where a property exactly holds, the domain of inputs where a property approximately holds and present the error (plot or otherwise), and the domain of inputs where you get mangled garbage returned.

So applying your observation to the point I’m trying to convey and ask about, in order for a user to get accurate output, they would need to:

Know about floating point results and arithmetic specific to the function(s) they’re using.

Implement appropriate workarounds to achieve the accuracy they want. In the example of cosd as you demonstrated: the internal wrapping with Int to their input to the trig function.

The cosd example isn’t the best example because I’d imagine pretty much no one works with inputs with magnitudes that large.

But there are other cases where such gotchas have been asked and discussed.

We give the generalised caveat on floating point arithmetic, but it’d be nice to have a library of demonstrations and numerical quantifications on the errors of floating point arithmetic in the context of the many ubiquitous math functions that get called throughout the ecosystem.

Oh so this issue is attributed more to the floating point sum in the test as pointed out by others above, rather than the way cosd parses the input. Okay cool, thank you.

Thanks heaps! I’ve actually been following that very interesting discussion and thinking about its relevance to what I’m thinking about in this thread. Thank you for the cross reference