Another solution consists in adding an extremely small value to the argument of the `log`, so that it never goes to `-Inf`. The values of `x` for which this makes a difference are so small that I would expect the overall result to remain unchanged (unless it is itself incredibly small).

```julia
function loop2(X)
    acc = zero(eltype(X))
    e = nextfloat(acc)
    for x in X
        acc += x*(1+log(x+e))
    end
    acc
end
```
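To see why this works at the boundary, consider what happens for `x == 0`: the `log` becomes a large but finite negative number, and multiplying it by zero contributes nothing to the accumulator.

```julia
# e is the smallest positive Float64 (a subnormal value)
e = nextfloat(0.0)            # 5.0e-324

log(0.0)                      # -Inf  ->  0 * (1 - Inf) == NaN
log(0.0 + e)                  # ≈ -744.44, finite

# the zero entry now contributes exactly zero to the sum:
0.0 * (1 + log(0.0 + e))      # -0.0, leaves the accumulator unchanged
```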

Let us benchmark this (see below for the complete copy-pastable code):

```julia
using BenchmarkTools

# array size
N = 1_000_000

# fraction of zero values in the array
frac = 0.1

# generate a random array with approximately the required fraction of zeros
X = rand(N);
Nz = Int(round(frac*N))
idx = rand(1:N, Nz);
X[idx] .= 0.;

# No handling of 0 -> NaN
@btime loop0($X)

# Version with a test
@btime loop1($X)

# The version above
@btime loop2($X)
```

which gives, for a large fraction of zeros (10%)

```
julia> @btime loop0($X)
  9.627 ms (0 allocations: 0 bytes)
NaN

julia> @btime loop1($X)
  10.464 ms (0 allocations: 0 bytes)
225700.31035719207

julia> @btime loop2($X)
  16.853 ms (0 allocations: 0 bytes)
225700.31035719207
```

and for a smaller fraction of zeros (1%):

```
julia> @btime loop0($X)
  9.514 ms (0 allocations: 0 bytes)
NaN

julia> @btime loop1($X)
  10.643 ms (0 allocations: 0 bytes)
247655.2276365137

julia> @btime loop2($X)
  10.916 ms (0 allocations: 0 bytes)
247655.2276365137
```

My interpretation:

- performing tests has a cost, which is greatly reduced by the branch-prediction features of the CPU (one could even expect the benchmarked times to be slightly underestimated because of this; it might be a good idea to confirm this benchmark on real data in a real context);
- however, taking the short-circuit branch saves the costly computation of a logarithm, while adding a small value to `x` implies that the `log` is always computed, even when its value is useless;
- handling subnormal floating-point values comes with an associated cost, which might be why the benchmarked time increases with the fraction of zero values. Increasing the “epsilon” value (*e.g.* replacing `nextfloat(0.)` with something like `1e-100`) helps improve performance, at the cost of a potential change in your results.
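To illustrate that last point, the larger-epsilon variant could look like the sketch below (`loop3` is just a name I'm introducing here, and `1e-100` is an arbitrary choice that you should validate against your accuracy requirements):

```julia
function loop3(X)
    acc = zero(eltype(X))
    # A "large" epsilon: far above the subnormal range, yet small enough
    # that log(x + e) ≈ log(x) for any x that matters in practice.
    e = 1e-100
    @simd for x in X
        acc += x*(1+log(x+e))
    end
    acc
end
```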

In any case, I would think that you have much to gain from vectorizing this loop, which (1) is not trivial because of the `log`, and (2) becomes even harder if you keep the version with tests.

If this is a very hot loop and you don’t have too many zero values in your input, I would suggest using a SIMD-friendly math library such as SLEEF to vectorize your loop. There are various Julia projects that can help you do this, which you can find on GitHub (maybe @Elrod would like to add more details). If you go down this road, having a loop body without a test will probably help.
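As an illustration of what such a vectorized version might look like, here is a sketch using LoopVectorization.jl (assuming the package is installed and that its `@turbo` macro can vectorize loops containing `log` through SLEEF-style implementations; check its README for the exact current API):

```julia
using LoopVectorization  # assumed available; provides the @turbo macro

function loop_simd(X)
    acc = zero(eltype(X))
    # @turbo replaces the scalar log with a SIMD-friendly implementation,
    # processing several elements per iteration; the branch-free body
    # (epsilon trick instead of a test) is what makes this possible.
    @turbo for i in eachindex(X)
        acc += X[i]*(1 + log(X[i] + 1e-100))
    end
    acc
end
```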

## Complete benchmarking code

```julia
using BenchmarkTools

function loop0(X)
    acc = zero(eltype(X))
    @simd for x in X
        acc += x*(1+log(x))
    end
    acc
end

function loop1(X)
    acc = zero(eltype(X))
    @simd for x in X
        if x > 0.
            acc += x*(1+log(x))
        end
    end
    acc
end

function loop2(X)
    acc = zero(eltype(X))
    e = nextfloat(acc)
    @simd for x in X
        acc += x*(1+log(x+e))
    end
    acc
end

frac = 0.01
N = 1_000_000
Nz = Int(round(frac*N))
X = rand(N);
idx = rand(1:N, Nz);
X[idx] .= 0.;

@btime loop0($X)
@btime loop1($X)
@btime loop2($X)
```