Types for log-domain values

Hi,

I was wondering if there are existing solutions to the following problem.

In statistics, it is common to work with likelihoods (positive reals) or log-likelihood values.
Let’s say that we want to implement an algorithm for both kinds of values: one implementation takes the likelihood as input, the other the log-likelihood.

The most basic API that comes to my mind is something like this:

alg(X::AbstractVector{Float64}) = ...
alglog(Xlog::AbstractVector{Float64}) = ...

One problem with this approach is that it’s easy to use the wrong implementation, since nothing prevents the user from calling alg(Xlog). I thought it would be nice if this could be enforced by the type system, for example:

struct LogValue{T}
    val::T
    base::Int
end
alg(X::AbstractVector{Float64}) = ...
alg(X::AbstractVector{LogValue{Float64}}) = ...

It’s now the responsibility of the function that produces the log-values to wrap them in the correct type. This also cleans up the API by making use of multiple dispatch.
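For concreteness, here is a runnable sketch of that dispatch idea (the alg bodies and the base-10 choice are placeholders of mine, not part of the proposal):

```julia
struct LogValue{T}
    val::T
    base::Int
end

# One method per representation; calling the wrong one is now a
# MethodError instead of silently wrong numbers.
alg(X::AbstractVector{Float64}) = prod(X)                   # likelihood version
alg(X::AbstractVector{<:LogValue}) = sum(x.val for x in X)  # log-likelihood version

X    = [0.5, 0.25]
Xlog = [LogValue(log10(x), 10) for x in X]
alg(X)      # 0.125
alg(Xlog)   # ≈ log10(0.125)
```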

Has anyone experimented with something like this? Do you have any other approaches?

Thanks,
Maxime

2 Likes

I think this is a bit of over-engineering. It’s like cos in any library: the first thing you do is check whether it takes degrees or radians, and that’s just a normal thing to do/know if you work in the domain.

1 Like

I think this sounds like a good idea, and one which will work uniquely well in Julia. I don’t work in statistics, so I can’t comment directly on your particular domain, but I would suggest taking inspiration from the way Colors.jl works:

Most image processing toolkits treat an image as a collection of ints (perhaps in the range 0-255, perhaps in some other range), or a collection of floats (perhaps in the range 0-1), which leads to inevitable errors when users mix up their representations or ranges. Colors.jl, on the other hand, does two clever things with the type system:

  1. It creates Gray, RGB, HSV, etc. wrapper types which clearly separate pixel data from other kinds of data
  2. It uses FixedPointNumbers to ensure that a Gray pixel is always conceptually in the range 0-1, even if it uses an int between 0 and 255 under the hood.

That means that all pixel and image types can have a uniform interface and range, even though their underlying storage might be very different. I’ve found this really pleasant to work with, and there is little to no run-time cost for all of this niceness in Julia.

It seems like you could do something quite similar, creating a LogLikelihood struct which wraps the value and base, and then implement various log-likelihood operations on that type.

1 Like

My experience is that what actually happens is I get it wrong all over the place, and I suspect I’m not alone. I think using separate types for different units is a great idea. But perhaps my experience isn’t universal.

1 Like

I would lock the base as e, because almost everything anyone works on is log base e.

Then you can do things like:

struct LogNum{T<:Number} <: Number
    logval::T
end

Base.convert(::Type{LogNum{T}}, x::T) where {T} = LogNum(log(x))

value(x::LogNum) = exp(x.logval)
Base.convert(::Type{T}, x::LogNum{T}) where {T} = value(x)
# plus a bunch more convert and promote methods

Base.:*(x::LogNum, y::LogNum) = LogNum(x.logval + y.logval)
Base.:^(x::LogNum, y::Real) = LogNum(y * x.logval)

It would be great. Now you can just take products of probabilities and they don’t underflow for ages.

If we had the ability to overload numbers like this earlier,
probability math would be much nicer.
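To make the underflow point concrete, here is a self-contained sketch (repeating a minimal LogNum so the snippet runs on its own):

```julia
# Minimal, self-contained version of the LogNum sketch above (base e).
struct LogNum{T<:Number} <: Number
    logval::T
end

Base.:*(x::LogNum, y::LogNum) = LogNum(x.logval + y.logval)
Base.:^(x::LogNum, n::Real) = LogNum(n * x.logval)

# A product of 2000 probabilities of 0.5 underflows to 0.0 in Float64,
# but in the log domain it is just a sum:
p = LogNum(log(0.5))
q = p^2000          # q.logval == 2000 * log(0.5) ≈ -1386.3
0.5^2000            # == 0.0 (underflow)
```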

4 Likes

If you are prepared to do more work/coding, you can get better precision for your real numbers by separating the integer part (of your logarithm) from the fractional part.

    import Base: +   # needed to extend + for a new type

    # Integer Plus Fraction is used to perform calculations
    # with base 10 logarithms
    struct IpfType
        int::BigInt
        frac::Float64    # 0.0 <= frac < 1.0
    end

    function +(a::IpfType, b::IpfType)::IpfType
        c = a.frac + b.frac
        if c < 1.0
            return IpfType(a.int + b.int, c)
        else
            # both fracs are in [0, 1), so 1.0 <= c < 2.0: carry into the integer part
            return IpfType(a.int + b.int + 1, c - 1.0)
        end
    end

I learned this trick from physically using log tables in high school.
http://www.sliderules.info/a-to-z/log-tables.htm
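As a quick check of the carry behaviour, here is a self-contained version (using exactly representable fractions so the floating-point arithmetic is exact):

```julia
import Base: +

# The IpfType from the post above, with the carry branch made exhaustive.
struct IpfType
    int::BigInt
    frac::Float64    # 0.0 <= frac < 1.0
end

function +(a::IpfType, b::IpfType)::IpfType
    c = a.frac + b.frac
    # both fracs are in [0, 1), so c < 2.0 and at most one carry is needed
    c < 1.0 ? IpfType(a.int + b.int, c) : IpfType(a.int + b.int + 1, c - 1.0)
end

a = IpfType(big(2), 0.5)    # the base-10 log value 2.5
b = IpfType(big(3), 0.75)   # the base-10 log value 3.75
c = a + b                   # fractions sum to 1.25, so the integer part carries
# c.int == 6, c.frac == 0.25
```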

Doesn’t the fly in the ointment come when adding?

1 Like

DPSANDERS: Doesn’t the fly in the ointment come when adding?

Convert to normal numbers, add them, then convert back to IpfType.

But really, if you need to add, a sliderule is the wrong type of tool to use.

1 Like

Yep, can’t have all the things.

But for probability additions it’s not that great anyway.

I found that you can add in the log domain, though.
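The standard way to add in the log domain is the log-sum-exp trick: log(x + y) = a + log1p(exp(b - a)) with a = log(x) ≥ b = log(y). A small sketch (the helper name logadd is mine, not from the thread):

```julia
# log(x + y) computed from a = log(x) and b = log(y),
# without ever leaving the log domain.
function logadd(a::Real, b::Real)
    a, b = max(a, b), min(a, b)   # anchor on the larger term for stability
    return a + log1p(exp(b - a))
end

logadd(log(2.0), log(3.0))   # ≈ log(5.0)
logadd(-1000.0, -1001.0)     # finite, even though exp(-1000) underflows to 0.0
```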

1 Like

Storing the integer part as a BigInt isn’t great; BigInts are multiple orders of magnitude slower than an Int128.
Probably better to go type-parametric, or use Int or Int128.

For the fractional part, it is a bit of a waste to spend all that representational capacity above 1 and below 0.
You can instead store the fractional part as a UInt, as the representation of the value frac/typemax(UInt).
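A sketch of that UInt idea, interpreting the bits as frac / 2^64 (a slight variant of frac/typemax(UInt), chosen so that unsigned overflow is exactly the carry); the type and helper names are mine:

```julia
struct IpfU
    int::Int128
    frac::UInt64   # interpreted as frac / 2^64, i.e. a value in [0, 1)
end

function add(a::IpfU, b::IpfU)
    # wrap-around addition is addition mod 1; the overflow flag is the carry
    s, carried = Base.add_with_overflow(a.frac, b.frac)
    IpfU(a.int + b.int + carried, s)
end

half = IpfU(Int128(0), UInt64(1) << 63)   # the fraction 0.5
r = add(half, half)                       # 0.5 + 0.5 carries: r.int == 1, r.frac == 0
```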

Yep, I also get it wrong all the time. A vector of eigenvalues can represent many things: poles of rational functions, zeros, poles and zeros in the continuous- or discrete-time domain. They all look the same, but you need to dispatch to different methods depending on the domain. Better to wrap them in types to avoid repeatedly shooting yourself in the foot with the same foot gun over and over.

2 Likes

This is possible to do in Julia with very little overhead: eg see this done for units in

That said, the approach has some cost to the user in terms of taking extra care with the implementation (lifting etc) and implementing the common algebraic operations. Whether that is worth it to you depends on your tolerance for bugs and other, complementary approaches to ensure correct code (eg testing, reviews).

Personally, I would just use log probabilities all the time, and rely on unit testing (which one would do anyway) to ensure correctness.

3 Likes