My goal is to make the implementation of my `qlog`

function specialized for `MultiVector`

types.

In the Grassmann.jl package, I have the generic `Grassmann.qlog`

implementation, which has a lot of unnecessary allocations made due to the generality of the algorithm.

For the `expm1,cosh,sinh`

methods I was able to create an improved `MultiVector`

version of each algorithm to reduce the number of allocations needed. However, for `qlog`

I am running into strange errors on my test examples, where the result diverges on the 129th iteration.

To evaluate the function definition for `qlog_fast`

the following imports are needed

```
using Grassmann; basis"3"
import Grassmann: insert_expr,mvec,geomaddmulti!,indexbasis,binomsum_set,binomial_set
```

In the following experiment, the 83rd iteration causes the slow and fast version to diverge

```
julia> for k β 125:133
println("$k, slow: ",Grassmann.qlog(1+v12,k))
println("$k, fast: ",Grassmann.qlog_fast(1+v12,k))
end
125, slow: -1.4653360823126444e16 + 4.493825380508893e16vββ
125, fast: -1.4653360823126444e16 + 4.493825380508893e16vββ
126, slow: -1.4653360823126444e16 + 4.493825380508893e16vββ
126, fast: -1.4653360823126444e16 + 4.493825380508893e16vββ
127, slow: -8.844033711796466e16 - 2.884872248974928e16vββ
127, fast: -8.844033711796466e16 - 2.884872248974928e16vββ
128, slow: -8.844033711796466e16 - 2.884872248974928e16vββ
128, fast: -8.844033711796466e16 - 2.884872248974928e16vββ
129, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
129, fast: 5.680961621833102e16 - 1.7409867582604496e17vββ
130, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
130, fast: 5.680961621833102e16 - 1.7409867582604496e17vββ
131, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
131, fast: 3.428056483688667e17 + 1.1189735632449072e17vββ
132, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
132, fast: 3.428056483688667e17 + 1.1189735632449072e17vββ
133, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
133, fast: -2.2045371266043258e17 + 6.7515671735379e17vββ
```

The slower version gives the correct results at the end, which is defined like this

```
function qlog(w::T,x::Int=10000) where T<:TensorAlgebra{V} where V
w2,f = w^2,frobenius(w)
prod = w*w2
S,term = w,prod/3
norms = MVector(f,frobenius(term),f)
k::Int = 5
@inbounds while (norms[2]<norms[1] || norms[2]>1) && k β€ x
S += term
ns = frobenius(S)
@inbounds ns β norms[3] && break
prod *= w2
term = prod/k
@inbounds norms .= (norms[2],frobenius(term),ns)
k += 2
end
return 2S
end # http://www.netlib.org/cephes/qlibdoc.html#qlog
```

However, in the specialized version for `MultiVector`

inputs, the result diverges on 129th iteration

```
@eval function qlog_fast(b::MultiVector{T,V,E},x::Int=10000) where {T,V,E}
$(insert_expr((:N,:t,:out,:bs,:bn),:mvec,:T,Float64)...)
f = frobenius(b)
w2::MultiVector{T,V,E} = b^2
B = value(w2)
S = zeros(mvec(N,t))
prod = zeros(mvec(N,t))
term = zeros(mvec(N,t))
S .= value(b)
out .= value(b*w2)
term .= out/3
norms = MVector(f,norm(term),f)
k::Int = 5
@inbounds while (norms[2]<norms[1] || norms[2]>1) && k β€ x
S += term
ns = norm(S)
@inbounds ns β norms[3] && break
prod .= out
out .= 0
# prod *= w2
for g β 1:N+1
Y = indexbasis(N,g-1)
@inbounds for i β 1:bn[g]
@inbounds val = B[bs[g]+i]
valβ 0 && for G β 1:N+1
@inbounds R = bs[G]
X = indexbasis(N,G-1)
@inbounds for j β 1:bn[G]
@inbounds geomaddmulti!(V,out,X[j],Y[i],*(prod[R+j],val))
end
end
end
end
term .= out/k
@inbounds norms .= (norms[2],norm(term),ns)
k += 2
end
S *= 2
return MultiVector{t,V}(S)
end
```

After checking over the algorithm many times, I am unable to find what causes the issue.

The following test shows that slow version of `qlog`

gives the correct result for `log`

when compared with regular `Complex`

value, note that this generalizes to full `MultiVector`

of any dimension and so is slower than a native implementation of `Complex`

values, since it handles a bigger number type.

```
julia> log(1+v12)
0.3465735909819527 + 0.7853981672307662vββ
julia> log(1+im)
0.34657359027997264 + 0.7853981633974483im
```

The `qlog_fast`

version is supposed to help address the issue of performance for `MultiVector`

input.

The two algorithms should be performing the EXACT same sequence of calculation, the `qlog_fast`

version is simply a *refactored* version of the original, with explicit allocations to minimize time.

Is it possible that I am overwriting some array values in a way that causes an issue after 129 iterations?

The fast version of the algorithm should be able to fully work correctly, reproducing the exact same calculation steps as the slow version and converging to the same result, as with `expm1,cosh,sinh`

.