I was wondering if there are existing solutions to the following problem.

In statistics, it is common to work with likelihoods (positive reals) or log-likelihood values.
Say we want to implement an algorithm for both kinds of values: one implementation takes the likelihood as input, the other the log-likelihood.

The most basic API that comes to my mind is something like this:
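A sketch of such an API (the function names and placeholder bodies are illustrative):

```julia
# Two separate entry points; nothing ties the argument to its representation.
alg(X) = prod(X)           # expects likelihood values
alg_log(Xlog) = sum(Xlog)  # expects log-likelihood values
```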

One problem with this approach is that it’s easy to use the wrong implementation, since nothing prevents the user from calling alg(Xlog). I thought it would be nice if this could be enforced by the type system, for example:
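A sketch with wrapper types (the type names and function bodies are illustrative):

```julia
struct Likelihood{T<:Real}
    value::T
end

struct LogLikelihood{T<:Real}
    value::T
end

# The same algorithm, dispatched on the element type; the bodies are
# hypothetical stand-ins (product of likelihoods vs. sum of logs).
alg(X::AbstractVector{<:Likelihood}) = prod(x.value for x in X)
alg(X::AbstractVector{<:LogLikelihood}) = exp(sum(x.value for x in X))
```

Calling alg on a vector of raw numbers is now a MethodError, so passing log-values to the likelihood implementation cannot happen silently.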

It’s now the responsibility of the function that produces the log-values to wrap them in the correct type. This also cleans up the API by making use of multiple dispatch.

Has anyone experimented with something like this? Do you have any other approaches?

I think this is a bit of over-engineering. When you use cos from any library, the first thing you do is check whether it takes degrees or radians, and that’s just a normal thing to do/know if you work in the domain.

I think this sounds like a good idea, and one which will work uniquely well in Julia. I don’t work in statistics, so I can’t comment directly on your particular domain, but I would suggest taking inspiration from the way Colors.jl works:

Most image processing toolkits treat an image as a collection of ints (perhaps in the range 0-255, perhaps in some other range), or a collection of floats (perhaps in the range 0-1), which leads to inevitable errors when users mix up their representations or ranges. Colors.jl, on the other hand, does two clever things with the type system:

It creates Gray, RGB, HSV, etc. wrapper types which clearly separate pixel data from other kinds of data

It uses FixedPointNumbers to ensure that a Gray pixel is always conceptually in the range 0-1, even if it uses an int between 0 and 255 under the hood.

That means that all pixel and image types can have a uniform interface and range, even though their underlying storage might be very different. I’ve found this really pleasant to work with, and there is little to no run-time cost for all of this niceness in Julia.
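A toy sketch of that second idea (not the actual FixedPointNumbers implementation): store a raw byte, but define the conceptual value as raw/255:

```julia
# Toy stand-in for FixedPointNumbers' N0f8: a byte interpreted as raw/255.
struct Gray8
    raw::UInt8
end

value(g::Gray8) = g.raw / 255  # always in [0, 1], whatever the storage
```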

It seems like you could do something quite similar, creating a LogLikelihood struct which wraps the value and base, and then implement various log-likelihood operations on that type.
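A sketch of what that could look like (the base field and the exp-style accessor are my assumptions, not an existing API):

```julia
struct LogLikelihood{T<:Real}
    value::T
    base::Float64
end

LogLikelihood(value) = LogLikelihood(value, float(ℯ))  # natural log by default

# Recover the raw likelihood; further log-likelihood operations
# (e.g. summing independent terms) could be defined on the type too.
likelihood(x::LogLikelihood) = x.base ^ x.value
```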

My experience is that what actually happens is I get it wrong all over the place, and I suspect I’m not alone. I think using separate types for different units is a great idea. But perhaps my experience isn’t universal.

If you are prepared to do more work/coding, then you can get better precision for your real number by separating the integer part (of your logarithm) from the fractional part.

```julia
# Integer Plus Fraction is used to perform calculations
# with base-10 logarithms
struct IpfType
    int::BigInt
    frac::Float64  # 0.0 <= frac < 1.0
end

# Both fractional parts are in [0, 1), so their sum is in [0, 2):
# at most one unit carries into the integer part.
function Base.:+(a::IpfType, b::IpfType)
    c = a.frac + b.frac
    if c < 1
        return IpfType(a.int + b.int, c)
    else
        return IpfType(a.int + b.int + 1, c - 1.0)
    end
end
```

Storing the integer part as a BigInt isn’t great: BigInts are multiple orders of magnitude slower than an Int128.
Probably just go type-parametric, or use Int or Int128.

For the fractional part, all that representational capacity above one and below zero is a bit of a waste.
You can instead store the fractional part as a UInt, as the representation of the value frac/typemax(UInt).
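A sketch of that representation. Here I interpret the raw UInt64 as frac = raw / 2^64 (slightly different from dividing by typemax(UInt)), because then unsigned wrap-around on addition is exactly the carry into the integer part:

```julia
struct Ipf
    int::Int128
    frac::UInt64  # represents raw / 2^64, a value in [0, 1)
end

tofloat(x::Ipf) = Float64(x.int) + x.frac / exp2(64)

# Wrapping addition of the fractions; the overflow flag is the carry.
function Base.:+(a::Ipf, b::Ipf)
    frac, carry = Base.Checked.add_with_overflow(a.frac, b.frac)
    Ipf(a.int + b.int + carry, frac)
end
```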

Yep, I also get it wrong all the time. A vector of eigenvalues can represent many things: poles of rational functions, zeros, poles and zeros in the continuous- or discrete-time domain. They all look the same, but you need to dispatch to different methods depending on the domain. Better to wrap them in types to avoid repeatedly shooting yourself in the foot with the same foot gun over and over.

This is possible to do in Julia with very little overhead: eg see this done for units in

That said, the approach has some cost to the user in terms of taking extra care with the implementation (lifting etc) and implementing the common algebraic operations. Whether that is worth it to you depends on your tolerance for bugs and other, complementary approaches to ensure correct code (eg testing, reviews).

Personally, I would just use log probabilities all the time, and rely on unit testing (which one would do anyway) to ensure correctness.
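For instance, the kind of unit test that catches a mixed-up representation (the function under test is hypothetical):

```julia
# Hypothetical log-space implementation under test
loglik(xs) = sum(log, xs)

# Check it against the direct product on a small, well-conditioned input
@assert exp(loglik([0.5, 0.25])) ≈ 0.125
```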