I have an n-dimensional vector \delta and I would like to compute the following vector expression.
\left[ \log \left( \sum_{i=1}^n \exp(\delta_i) \right) - \log \left( \sum_{i \neq j}
\exp(\delta_i) \right) \right]_{j=1}^n
What’s the best way to do this in a compact way (hopefully no loops).
I have access to the function logsumexp
from StatsFuns so ideally I could use that but not necessary. Thanks!
logsumexp
takes an iterator as its argument, so this works
f(d) = logsumexp(d) .- ( logsumexp( (d[i] for i in eachindex(d) if i!=j) ) for j in eachindex(d) )
2 Likes
With some algebra, the expression you are trying to evaluate is equal to
x_j = \log\left(1+\frac{\exp(\delta_j-c)}{\sum_{i\ne j} \exp(\delta_i-c)}\right)
for any choice c. I’ll recommend the conventional logsumexp
choice c=\max_i \delta_i. This version can take advantage of the special function log1p
to be more accurate for small arguments.
Note that we can equivalently (although risking some numerical cancellation in the implementation) write \sum_{i\ne j} \exp(\delta_i-c) = -\exp(\delta_j-c)+\sum_{i=1}^n \exp(\delta_i-c), where the right-side version allows the precomputation of the sum. Weigh that tradeoff of efficiency versus precision. Below, I have opted for the efficient version that risks that cancellation error in some edge cases:
expscale = let c=maximum(deltas); z->exp(z-c); end
sumexpdeltas = sum(expscale, deltas)
x = [log(sumexpdeltas) - log(sumexpdeltas - expscale(d)) for d in deltas] # initial version
x = [(ed = expscale(d); log1p(ed/(sumexpdeltas-ed))) for d in deltas] # with log1p trasform
EDIT: the cancellation error of the sum transformation I used is most-significant when the sum is dominated by a single term (one \delta_i is much bigger than all others). If this is a possibility, you’ll want to use the leave-one-out version of the summation instead. The log1p
version avoids a numerical error when \delta_j is much smaller than \max_i \delta_i.
EDIT 2: For the purposes of my previous statement and Float64
precision, a difference of 20 or 30 would be the threshold to consider a number “much bigger”. That isn’t necessarily a lot, so I suspect you’ll want to use both leave-one-out summation and the transformed log1p
equation to be safe.
EDIT 3:
Just a teensy bit more algebra yields
x_j = \log\left(1+\frac{1}{\sum_{i\ne j} \exp(\delta_i-\delta_j)}\right)
Given my previous numerical concerns, this is the version I’ll recommend you implement.
2 Likes