# Sum of logs

In some statistical applications (and probably many others) we sometimes have to evaluate `sum(log.(x))` for some large vector `x`. The obvious improvement is `sum(log, x)`, which avoids allocating an intermediate array, but this still computes `log` for every element, which can be expensive.

Playing around with this a bit, I came up with this:

```julia
const logtwo = log(2.0)

function sumlog(x::AbstractArray{T}) where {T}
    sig = one(T)
    ex = zero(Int64)
    for xj in x
        if sig > one(T)
            sig *= 0.5 * significand(xj)
            ex += exponent(xj) + 1   # make up the missing factor of 2 here
        else
            sig *= significand(xj)
            ex += exponent(xj)
        end
    end
    sig + logtwo * ex
end
```

This comes from two ideas:

1. `x == ldexp(significand(x), exponent(x))` (see the quick check below), and
2. Keeping the product balanced around one, to avoid underflow and overflow.
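
As a quick sanity check of idea 1 (`significand` and `exponent` are Base functions; `6.75 == 1.6875 * 2^2`):

```julia
julia> x = 6.75; significand(x), exponent(x)
(1.6875, 2)

julia> ldexp(significand(x), exponent(x)) == x   # i.e. 1.6875 * 2^2 == 6.75
true
```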

It seems to do ok:

```julia
julia> using BenchmarkTools

julia> x = rand(100);

julia> @btime sumlog($x)
221.247 ns (0 allocations: 0 bytes)
-106.42

julia> @btime sum(log, $x)
616.896 ns (0 allocations: 0 bytes)
-107.42
```

But I often see this off-by-one error (note the `-106.42` vs `-107.42` above), and I'm not sure where it comes from.

Also, Iām guessing the `sig > one(T)` in the loop might be slowing things down, so maybe this could be faster still? We do need to have something to offset the significand, since itās always in `[1,2)` and will otherwise overflow with a long enough vector.

This isnāt a critical thing, but speeding this up could be really nice.

4 Likes

You can make it faster with `if sig > floatmax(T)/2`, which will branch-predict much better (and re-normalize by a bigger power).

3 Likes

I think you mean `log(sig) + logtwo * ex`.

2 Likes

Oh, thatās good! And we even have a lower bound on the number of iterations before we really need to check. So we can let it spin for a while with no branching, then check once and make a bigger correction. Yeah, I like that.

Of course, I thought that might be something obvious.

Thanks to you both! I'll make some adjustments and post the result here soon.

1 Like

Ok now weāre talkinā

```julia
julia> function sumlog(x::AbstractArray{T}) where {T}
           sig = one(T)
           ex = zero(exponent(one(T)))
           bound = floatmax(T) / 2
           for xj in x
               sig *= significand(xj)
               ex += exponent(xj)
               if sig > bound
                   (a, b) = (significand(sig), exponent(sig))
                   sig = a
                   ex += b
               end
           end
           log(sig) + logtwo * ex
       end
sumlog (generic function with 1 method)

julia> x = 10 * rand(100);

julia> @btime sumlog($x)
127.330 ns (0 allocations: 0 bytes)
129.854

julia> @btime sum(log, $x)
680.462 ns (0 allocations: 0 bytes)
129.854

julia> sumlog(x) - sum(log ∘ big, x)
2.83327e-15
```

Oh, but 100 iterations isn't ever enough to overflow, so we should check for larger arrays:

```julia
julia> x = 10 .* rand(100000000);

julia> @time sumlog(x)
0.128695 seconds (1 allocation: 16 bytes)
1.30265e8

julia> @time sum(log, x)
0.697396 seconds (1 allocation: 16 bytes)
1.30265e8
```
14 Likes

This looks great. Are there downsides to making this the default? Minimally just `sum(::typeof(log), x::Array{<:Union{Float32, Float64}})`, although perhaps a specialized method deeper in the `mapreduce` chain could cover more cases too.
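
For concreteness, the minimal hook would be a one-liner like this (a sketch only, assuming the `sumlog` above; as discussed below, defining it anywhere but Base itself would be type piracy):

```julia
# Sketch only: route sum-of-logs through the significand/exponent trick.
Base.sum(::typeof(log), x::Array{<:Union{Float32, Float64}}) = sumlog(x)
```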

2 Likes

I was just wondering this too. Here's an accuracy comparison:

```julia
julia> x = 10 .* rand(1000000);

julia> exact = sum(log ∘ big, x)
1.302404677482756428200856004490688849646850626114695987885127324259071038329214e+06

julia> Float64(sumlog(x) - exact)
-1.078137309892319e-10

julia> Float64(sum(log, x) - exact)
1.2501691266463773e-10
```

One potential concern is that some hardware might have a very fast `log` implementation but be very slow on branches. As I type that, GPUs come to mind, though I haven't done enough GPU work to know whether a new method for `sum(::typeof(log), ::AbstractArray)` would end up being used unintentionally in GPU kernels.

Yes, I presume this won't work on a GPU, although perhaps similar tricks might. I would worry a little about introducing dispatch ambiguities, e.g. against `sum(f, A::CuArray)`, but perhaps there's a nice place in the `mapreduce` chain to attach this cleanly. Maybe `sum(log, A; dims)` deserves some thought too.

Seems accurate to me on quick examples, but it might be worth trying to construct awful cases. For speed as well, since you'll branch more or less often, and IIRC the speed of `log` also varies with the size of the number.

2 Likes

Maybe it could go here?

1 Like

`log` is mostly constant speed. It's technically slower for numbers close to 1, but only slightly.

Ok. Strangely then, numbers near 1 seem slightly faster for me, but not by very much:

```julia
julia> begin
       x = rand(1000)./100;      @btime sum(log, $x); @btime sumlog($x);
       x = 1 .+ rand(1000)./100; @btime sum(log, $x); @btime sumlog($x);
       x = 100 .+ rand(1000);    @btime sum(log, $x); @btime sumlog($x);
       end;
6.650 μs (0 allocations: 0 bytes)  # <<1
2.042 μs (0 allocations: 0 bytes)
4.863 μs (0 allocations: 0 bytes)  # about 1
2.042 μs (0 allocations: 0 bytes)
6.650 μs (0 allocations: 0 bytes)  # >>1
2.042 μs (0 allocations: 0 bytes)
```

I think these still aren't taking the branch often, as it's much slower with `ifelse`.

A function could go in a package, but defining `Base.sum(::typeof(log), x::Array{Float64})` there would be type piracy. Since `log` itself is defined in Base, it doesn't seem crazy for something like this to be added there too.

Agreed, but (correct me if I'm wrong, Tyler) I read @tbeason as suggesting `sumlog` could itself go into LogExpFunctions.jl. That seems like a good first step. If it makes it into Base as a specialized `sum` method, that's even better. `LogExpFunctions.sumlog` could be deprecated at that time, or just redefined to call `sum(log, x)`.

2 Likes

Yes thatās what I meant.

1 Like

I think with some more cleanup, this is a good candidate to go in Base. If we make it `logabsprod` it can be a big help to speed up `LinearAlgebra.logabsdet`:

```julia
julia> using LinearAlgebra

julia> using BenchmarkTools

julia> x = LowerTriangular(randn(1000,1000));

julia> @btime logabsdet($x)
8.687 μs (0 allocations: 0 bytes)
(-631.836, -1.0)

julia> d = diag(x);

julia> @btime logabsprod($d)
1.202 μs (0 allocations: 0 bytes)
(-631.836, -1.0)
```
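
For reference, a `logabsprod` along these lines could look like the sketch below (not necessarily exactly what I ran; it assumes nonzero finite entries, since `exponent` throws on zero):

```julia
# Sketch of logabsprod: returns (log|prod(x)|, sign(prod(x))), matching the
# return convention of LinearAlgebra.logabsdet. Assumes nonzero finite inputs.
function logabsprod(x::AbstractArray{T}) where {T<:AbstractFloat}
    sig = one(T)                 # running product of significands (carries the sign)
    ex = 0                       # running sum of exponents
    bound = floatmax(T) / 2
    for xj in x
        sig *= significand(xj)   # significand keeps the sign of xj
        ex += exponent(xj)
        if abs(sig) > bound
            ex += exponent(sig)
            sig = significand(sig)
        end
    end
    return (log(abs(sig)) + log(T(2)) * ex, sign(sig))
end
```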

Currently we require an array, and I don't know of a way to lazily get a diagonal in Base. But I think it's clear there are ways to get things connected properly.

EDIT: I had the sign bit reversed, fixed now

1 Like

I suspect that in most performance-sensitive uses of `logabsdet` there will be a matrix factorization (or other operation) that will dominate the cost. Otherwise, why construct a matrix at all?

Good point, @stevengj. The benefit will mostly be for triangular matrices (shown above) and small static matrices, like this:

```julia
julia> using StaticArrays

julia> x = @SArray randn(3,3);

julia> @btime lu($x); # called by logabsdet
37.268 ns (0 allocations: 0 bytes)

julia> @btime logabsdet($x);
58.769 ns (0 allocations: 0 bytes)
```

But I would think it's always at least as fast as the current implementation, even if the relative improvement is small in many cases.

EDIT: For small collections, `log(prod(x))` will still be faster. So maybe large-ish triangular matrices are the main benefit.
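
For the record, the reason `log(prod(x))` only works for small collections is that the product overflows long before the sum of logs does:

```julia
julia> x = fill(1e300, 5);

julia> log(prod(x))   # the product overflows first
Inf

julia> sum(log, x)    # the sum of logs is fine
3453.877639491069
```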

`diagind` together with view?

```julia
julia> x = LowerTriangular(randn(1000,1000));

julia> d = diag(x);

julia> @btime logabsprod($d);
1.202 μs (0 allocations: 0 bytes)

julia> @btime logabsprod(view($x, diagind($x)));
5.718 μs (0 allocations: 0 bytes)
```

Could be due to locality, not sure yet.

Yeah, I guess that's expected: each diagonal element lives very far from the previous one, so unless your cache lines are very long you'll miss every time. If you include the time to extract `d = diag(x)`, I'd guess the lazy alternative would still be faster.

```julia
julia> diagind(x)
1:1001:1000000
```
1 Like