I have a function with a suggested PR to throw an error if the result would be NaN. I very much appreciate someone opening a PR, but I am looking for advice on the approach.
I am unsure whether it is better package design to throw an error, with some helpful pointers, or to let NaNs and Infs result.
The function in question is essentially:
function ratio(v::T) where {T<:AbstractArray}
    positive = sum(x for x in v if x > 0)
    negative = -sum(x for x in v if x < 0)
    return positive / negative
end
When the input has no values of opposite signs, or is all zeros, this function will return some variety of NaN/Inf.
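To make the edge cases concrete, here is a minimal sketch. `ratio_demo` is a hypothetical variant of the function above, not the package's implementation. It assumes Julia 1.6 or later so that `init` can be passed to `sum`, which makes an empty sum explicitly zero, and it negates each term rather than the total so that a floating-point denominator is never `-0.0`.

```julia
# Hypothetical variant for illustration only (assumes Julia >= 1.6 for `init`).
function ratio_demo(v::AbstractArray)
    z = zero(eltype(v))
    positive = sum((x for x in v if x > 0); init=z)
    negative = sum((-x for x in v if x < 0); init=z)  # negate terms, not the total
    return positive / negative
end

ratio_demo([3, -1, -2])  # both signs present: 3 / 3 = 1.0
ratio_demo([1, 2, 3])    # no negative values:  6 / 0 = Inf
ratio_demo([-1, -2])     # no positive values:  0 / 3 = 0.0
ratio_demo([0, 0])       # all zeros:           0 / 0 = NaN
```

Note that only the all-zero (or empty) input gives NaN; one-sided inputs give Inf or 0.0.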
The applied context here is in the ActuaryUtilities.jl package and is part of a broader set of risk metrics.
I don’t know that such a question has a singular generalizable answer; I think it depends on context.
There are some occasions where throwing an error on NaN values would prematurely terminate a calculation which yields a non-NaN result, similar to how imaginary numbers are sometimes needed for intermediate steps in calculations that begin and end in reals.
However, in your context I suspect NaN will simply confuse users with no benefit, so throwing an error likely offers more information at zero usability cost. Indeed, I believe NaN itself has a confusing name, because it arises from calculations which do result in numbers but just need L’Hôpital’s rule to solve properly (i.e. they’re numbers whose values are unknown to the program), so NaN is itself a misnomer.
However, I am not a subject matter expert on what types of calculations will be involved in your package.
As mentioned above, the context matters here - is there a meaning to a ratio that only has positive entries? If you can’t make that decision in the library, to me at least, this indicates an edge case that users should have to actively think about when using your library, lest they get wrong results.
As someone doing something vaguely adjacent at the minute I’d say errors are good - I’m reusing an old project which I wrote years ago to value loan books, and have been stumped for a couple of days by random NaNs poisoning my results when using the code with new data. Some of this I’ve worked out (e.g. the data erroneously had some zero interest rates which the code didn’t handle), but it definitely would have been easier if I’d handled this more explicitly.
So yes, in the example you give in your OP I’d vote for handling this explicitly. I’m not an actuary so there might be something I’m missing, but to me calculating moic when there are no contributions doesn’t make sense, and it’s likely that the user is either calling this function in some larger pipeline where they didn’t expect an individual corner case (similar to my zero-interest-rate example above) or is misunderstanding the function.
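A minimal sketch of what handling this explicitly could look like. The name `ratio_strict`, the choice of `ArgumentError`, and the message text are all assumptions for illustration, not what the PR or the package actually does. It also assumes Julia 1.6 or later for the `init` keyword of `sum`.

```julia
# Hypothetical strict variant: refuse to divide when there are no negative values.
function ratio_strict(v::AbstractArray)
    z = zero(eltype(v))
    positive = sum((x for x in v if x > 0); init=z)
    negative = sum((-x for x in v if x < 0); init=z)
    if iszero(negative)
        throw(ArgumentError("input has no negative values, so the ratio is undefined"))
    end
    return positive / negative
end

ratio_strict([10.0, -4.0])  # 2.5
# ratio_strict([1.0, 2.0])  # would throw an ArgumentError
```

The error message is the place for the "helpful pointers": it can say which input property caused the failure instead of leaving the caller to trace a NaN back through a pipeline.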
I would raise an error in this function only if the sum of negative numbers being zero would be considered an error. Otherwise I would return a NaN if the positives and/or the negatives sum to zero, or perhaps better, return both positive and negative and leave to the caller the responsibility for deciding what to do with them.
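The "return both" idea could be sketched as follows. `signed_sums` is a hypothetical helper name, and returning `nothing` for the undefined case is just one possible choice on the caller's side. It assumes Julia 1.6 or later for the `init` keyword of `sum`.

```julia
# Hypothetical helper: return the two sums and leave the division to the caller.
function signed_sums(v::AbstractArray)
    z = zero(eltype(v))
    positive = sum((x for x in v if x > 0); init=z)
    negative = sum((-x for x in v if x < 0); init=z)
    return (positive=positive, negative=negative)
end

# The caller decides what an undefined ratio means in its own context.
s = signed_sums([100.0, 50.0])
result = iszero(s.negative) ? nothing : s.positive / s.negative
```

This keeps the library free of policy decisions, at the cost of pushing a check onto every call site.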
Thanks for the feedback, all. What about the in-between solution of logging an @warn? I’m kind of leaning towards not erroring because an input that may be a valid vector for other functions would error in this one and potentially needlessly halt the program.
Warnings in backend packages are generally not great, because often one either cares or does not care.
If one cares, one needs to fix the warning, and it would be easier to fix if it were an error.
If one doesn’t care, then the warning is an annoyance, often flooding the screen if you call it in a loop, or triggering an alarm if the code is running on a monitored system.
So either way one often wants to make sure the warning doesn’t happen, so it’s better if it is an error.
This isn’t always the case, but it is often the case.
If you know a lot about your end user you can better make the call.
But for a backend mathematical library you often don’t.
It’s rather annoying to silence warnings case by case in Julia.
(Though I think we put code for how to do that, somewhere in the JuliaLogging website. If not we should)
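For reference, a minimal sketch of silencing warnings around a single call, using only the `Logging` standard library. `ratio_warn` is a hypothetical stand-in for a library function that warns instead of throwing, and it assumes Julia 1.6 or later for the `init` keyword of `sum`.

```julia
using Logging

# Hypothetical function that warns on a degenerate input instead of throwing.
function ratio_warn(v::AbstractArray)
    z = zero(eltype(v))
    positive = sum((x for x in v if x > 0); init=z)
    negative = sum((-x for x in v if x < 0); init=z)
    iszero(negative) && @warn "no negative values; the result will be Inf or NaN"
    return positive / negative
end

# Drop every log message emitted inside the block.
quiet = with_logger(NullLogger()) do
    ratio_warn([1.0, 2.0])
end

# Or keep errors visible and drop warnings and anything below them.
with_logger(ConsoleLogger(stderr, Logging.Error)) do
    ratio_warn([1.0, 2.0])
end
```

Both forms have to wrap each call site, which is the case-by-case annoyance described above.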
Because, obviously. The OP also specifies that Inf is one of the valid returns. Hiding the clear information in 0 or Inf behind NaN would be very odd.
If all of 0, Inf and NaN are invalid results, then it seems pretty clear that it should be an error, not lumping them all into a NaN, which would give a misleading impression of what happened.
I’ve looked for general guidelines on this topic in the past and never found any.
There are several values that could be used to signal an invalid function calculation:
NaN
Inf
missing
nothing
A key question is: should the calculations halt? Or can the program recover from an invalid value?
Does Inf vs NaN provide any information on how to recover? If it doesn’t, you might consolidate to NaN to signal a problem – so outside of the function the user only needs to test the return value with isnan().
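As a sketch of that consolidation, assuming a hypothetical `ratio_nan` (not part of any package) and Julia 1.6 or later for the `init` keyword of `sum`: any non-finite quotient is mapped to NaN, so the caller needs only one check.

```julia
# Hypothetical variant: collapse Inf and NaN into a single NaN signal.
function ratio_nan(v::AbstractArray)
    z = zero(eltype(v))
    positive = sum((x for x in v if x > 0); init=z)
    negative = sum((-x for x in v if x < 0); init=z)
    r = positive / negative
    return isfinite(r) ? r : oftype(r, NaN)
end

r = ratio_nan([1.0, 2.0])      # the quotient is Inf, reported as NaN
if isnan(r)
    # caller-side recovery goes here
end
```

The trade-off is that the caller can no longer tell a division by zero with a positive numerator apart from 0 / 0.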
These all have well accepted meanings. NaN and Inf are IEEE 754 floating point values.
julia> NaN isa Float64
true
julia> Inf isa Float64
true
julia> 0/0
NaN
julia> 1/0
Inf
missing represents missing data. nothing represents the absence of a value. missing is used for missing data values in a data analysis context; nothing is used in a general programming context, like when a function does not return anything, or when findfirst fails to find anything.
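A short illustration of how the two behave differently, using only Base functions:

```julia
# `missing` propagates through arithmetic, so it must be skipped explicitly.
a = missing + 1                        # missing
b = sum(skipmissing([1, missing, 3]))  # 4

# `nothing` does not propagate; it signals "no result" and must be checked.
idx = findfirst(==(5), [1, 2, 3])      # nothing, because 5 is not in the array
found = idx !== nothing                # false
```

Arithmetic on `nothing` (for example `nothing + 1`) throws a `MethodError`, so a function returning `nothing` forces the caller to handle the case, whereas `missing` flows silently through later calculations, much like NaN.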
On the overall question, the more I think about it, the more it seems right to return 0, Inf, NaN, and let the caller sort it out. Each of those is a perfectly valid result of a division, and might be of interest to the caller. There is nothing intrinsically wrong about those results, seen from inside a function that simply calculates ratios, and there may or may not be, seen from the outside.
Thank you all for the thoughtful feedback. I think I am leaning towards letting floating point math do its thing here and adding color to the docstring to clarify why this may occur. My main motivation is to avoid halting the program by throwing an error, and if I’d ask the user to handle an error… well then they may as well manage the input or decide not to call the function instead of catching the error.