# Sum of logs

In some statistical applications (and probably many others) we sometimes have to evaluate `sum(log.(x))` for some large vector `x`. The obvious improvement to this is `sum(log, x)`, but this still requires computing `log` for each element of an array, which can be expensive.

Playing around with this a bit, I came up with this:

``````const logtwo = log(2.0)

function sumlog(x::AbstractArray{T}) where {T}
sig = one(T)
ex = zero(Int64)
for xj in x
if sig > one(T)
sig *= 0.5 * significand(xj)
ex += exponent(xj) + 1       # make up the missing factor of 2 here
else
sig *= significand(xj)
ex += exponent(xj)
end
end
sig + logtwo * ex
end
``````

This comes from two ideas:

1. `x == ldexp(significand(x), exponent(x))`, and
2. Keeping the product balanced around one, to avoid underflow and overflow.

It seems to do ok:

``````julia> x = rand(100);

julia> @btime sumlog(\$x)
221.247 ns (0 allocations: 0 bytes)
-106.42

julia> @btime sum(log, \$x)
616.896 ns (0 allocations: 0 bytes)
-107.42
``````

But I often see this off-by-one error, and I’m not sure where it comes from.

Also, I’m guessing the `sig > one(T)` in the loop might be slowing things down, so maybe this could be faster still? We do need to have something to offset the significand, since it’s always in `[1,2)` and will otherwise overflow with a long enough vector.

This isn’t a critical thing, but speeding this up could be really nice.

4 Likes

You can make it faster with `if sig > floatmax(T)/2` which will branch predict much better (and re-normalize by a bigger power).

3 Likes

I think you mean `log(sig) + logtwo * ex`.

2 Likes

Oh, that’s good! And we even have a lower bound on the number of iterations before we really need to check. So we can let it spin for a while with no branching, then check once and make a bigger correction. Yeah, I like that. Of course, I thought that might be something obvious.

Thanks to you both! I’ll make some adjustments and post the result here soon.

1 Like

Ok now we’re talkin’

``````julia> function sumlog(x::AbstractArray{T}) where {T}
sig = one(T)
ex = zero(exponent(one(T)))
bound = floatmax(T) / 2
for xj in x
sig *= significand(xj)
ex += exponent(xj)
if sig > bound
(a, b) = (significand(sig), exponent(sig))
sig = a
ex += b
end
end
log(sig) + logtwo * ex
end
sumlog

julia> x = 10 * rand(100);

julia> @btime sumlog(\$x)
127.330 ns (0 allocations: 0 bytes)
129.854

julia> @btime sum(log, \$x)
680.462 ns (0 allocations: 0 bytes)
129.854

julia> sumlog(x) - sum(log ∘ big, x)
2.83327e-15
``````

Oh, but 100 iterations isn’t ever enough to overflow, so we should check for larger arrays:

``````julia> x = 10 .* rand(100000000);

julia> @time sumlog(x)
0.128695 seconds (1 allocation: 16 bytes)
1.30265e8

julia> @time sum(log, x)
0.697396 seconds (1 allocation: 16 bytes)
1.30265e8
``````
14 Likes

This looks great. Are there downsides to making this the default? Minimally just `sum(::typeof(log), x::Array{Union{Float32, Float64}})`, although perhaps some `mapreduce(f, log, x)` could too.

2 Likes

I was just wondering this too. Here’s an accuracy comparison:

``````julia> x = 10 .* rand(1000000);

julia> exact = sum(log ∘ big, x)
1.302404677482756428200856004490688849646850626114695987885127324259071038329214e+06

julia> Float64(sumlog(x) - exact)
-1.078137309892319e-10

julia> Float64(sum(log, x) - exact)
1.2501691266463773e-10
``````

One potential concern is if some hardware might have very fast `log` implementation, but could be very slow for branches. As I type that, GPUs come to mind, though I haven’t done enough GPU work to know whether a new method for `sum(::typeof(log), ::AbstractArray)` would end up being unintentionally used in GPU kernels.

Yes, I presume this won’t work with GPU, although perhaps similar tricks might. I would worry a little bit about introducing dispatch ambiguities, e.g. against `sum(f, A::CuArray)`, but perhaps there’s a nice place in the `mapreduce` chain to attach this cleanly. Maybe `sum(log, A; dims)` deserves some thought too.

Seems accurate to me on quick examples. But possibly thinking up awful cases would be worthwhile. For speed as well, since you’ll branch more or less often, and IIRC the speed `log` also varies with the size of the number.

2 Likes

Maybe it could go here?

1 Like

log is mostly constant asked. it’s technically slower for numbers close to 1, but only slightly.

Ok. Strangely then, numbers near 1 seem slightly faster for me, but not by very much:

``````julia> begin
x = rand(1000)./100;      @btime sum(log, \$x); @btime sumlog(\$x);
x = 1 .+ rand(1000)./100; @btime sum(log, \$x); @btime sumlog(\$x);
x = 100 .+ rand(1000);    @btime sum(log, \$x); @btime sumlog(\$x);
end;
6.650 μs (0 allocations: 0 bytes)  # <<1
2.042 μs (0 allocations: 0 bytes)
4.863 μs (0 allocations: 0 bytes)  # about 1
2.042 μs (0 allocations: 0 bytes)
6.650 μs (0 allocations: 0 bytes)  # >>1
2.042 μs (0 allocations: 0 bytes)
``````

I think these still aren’t taking the branch often, as it’s much slower with `ifelse`.

A function could go in a package, but defining `Base.sum(::typeof(log), x::Array{Float64})` would be piracy. Since `log` itself is defined in Base, it doesn’t seem crazy for something like this to be added too.

Agreed, but (correct me if I’m wrong Tyler) I read @tbeason as suggesting `logsum` could itself go into LogExpFunctions.jl. That seems like a good first step. If it makes it into Base as a specialized `sum` method that’s even better. `LogExpFunctions.logsum` could be deprecated at that time, or just redefined to call `log(sum, x)`.

2 Likes

Yes that’s what I meant.

1 Like
3 Likes

I think with some more cleanup, this is a good candidate to go in Base. If we make it `logabsprod` it can be a big help to speed up `LinearAlgebra.logabsdet`:

``````julia> x = LowerTriangular(randn(1000,1000));

julia> using LinearAlgebra

julia> using BenchmarkTools

julia> @btime logabsdet(\$x)
8.687 μs (0 allocations: 0 bytes)
(-631.836, -1.0)

julia> d = diag(x);

julia> @btime logabsprod(\$d)
1.202 μs (0 allocations: 0 bytes)
(-631.836, -1.0)
``````

Currently we require an array, and I don’t know of a way to lazily get a diagonal in Base. But I think it’s clear there are ways to get things connected properly.

EDIT: I had the sign bit reversed, fixed now

1 Like

I suspect that in most performance-sensitive uses of `logabsdet` there will be a matrix factorization (or other operation) that will dominate the cost. Otherwise, why construct a matrix at all?

Good point @stevengj . The benefit will mostly be for triangular matrices (shown above) and small static matrices, like this:

``````julia> x = @SArray randn(3,3);

julia> @btime lu(\$x); # called by logabsdet
37.268 ns (0 allocations: 0 bytes)

julia> @btime logabsdet(\$x);
58.769 ns (0 allocations: 0 bytes)
``````

But I would think it’s always at least as fast as the current implementation, even if the relative improvement is small in many cases.

EDIT: For small collections, `log(prod(x))` will still be faster. So maybe large-ish triangular matrices are the main benefit.

`diagind` together with view?

``````julia> x = LowerTriangular(randn(1000,1000));

julia> d = diag(x);

julia> @btime logabsprod(\$d);
1.202 μs (0 allocations: 0 bytes)

julia> @btime logabsprod(view(\$x, diagind(\$x)));
5.718 μs (0 allocations: 0 bytes)
``````

Could be due to locality, not sure yet

Yeah I guess that’s expected, each diagonal element lives very far away from the previous, so unless your cache lines are very long I guess you’ll miss every time. If you include the timing of extracting `d = diag(x)` I’d guess the lazy alternative would still be faster.

``````julia> diagind(x)
1:1001:1000000
``````
1 Like