My goal is to make the implementation of my qlog
function specialized for MultiVector
types.
In the Grassmann.jl package, I have the generic Grassmann.qlog
implementation, which has a lot of unnecessary allocations made due to the generality of the algorithm.
For the expm1,cosh,sinh
methods I was able to create an improved MultiVector
version of each algorithm to reduce the number of allocations needed. However, for qlog
I am running into strange errors on my test examples, where the result diverges on the 129th iteration.
To evaluate the function definition for qlog_fast
the following imports are needed
using Grassmann; basis"3"
import Grassmann: insert_expr,mvec,geomaddmulti!,indexbasis,binomsum_set,binomial_set
In the following experiment, the 83rd iteration causes the slow and fast version to diverge
julia> for k β 125:133
println("$k, slow: ",Grassmann.qlog(1+v12,k))
println("$k, fast: ",Grassmann.qlog_fast(1+v12,k))
end
125, slow: -1.4653360823126444e16 + 4.493825380508893e16vββ
125, fast: -1.4653360823126444e16 + 4.493825380508893e16vββ
126, slow: -1.4653360823126444e16 + 4.493825380508893e16vββ
126, fast: -1.4653360823126444e16 + 4.493825380508893e16vββ
127, slow: -8.844033711796466e16 - 2.884872248974928e16vββ
127, fast: -8.844033711796466e16 - 2.884872248974928e16vββ
128, slow: -8.844033711796466e16 - 2.884872248974928e16vββ
128, fast: -8.844033711796466e16 - 2.884872248974928e16vββ
129, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
129, fast: 5.680961621833102e16 - 1.7409867582604496e17vββ
130, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
130, fast: 5.680961621833102e16 - 1.7409867582604496e17vββ
131, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
131, fast: 3.428056483688667e17 + 1.1189735632449072e17vββ
132, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
132, fast: 3.428056483688667e17 + 1.1189735632449072e17vββ
133, slow: -2.3369029045426035e17 - 1.7409867582604496e17vββ
133, fast: -2.2045371266043258e17 + 6.7515671735379e17vββ
The slower version gives the correct results at the end, which is defined like this
function qlog(w::T,x::Int=10000) where T<:TensorAlgebra{V} where V
w2,f = w^2,frobenius(w)
prod = w*w2
S,term = w,prod/3
norms = MVector(f,frobenius(term),f)
k::Int = 5
@inbounds while (norms[2]<norms[1] || norms[2]>1) && k β€ x
S += term
ns = frobenius(S)
@inbounds ns β norms[3] && break
prod *= w2
term = prod/k
@inbounds norms .= (norms[2],frobenius(term),ns)
k += 2
end
return 2S
end # http://www.netlib.org/cephes/qlibdoc.html#qlog
However, in the specialized version for MultiVector
inputs, the result diverges on 129th iteration
@eval function qlog_fast(b::MultiVector{T,V,E},x::Int=10000) where {T,V,E}
$(insert_expr((:N,:t,:out,:bs,:bn),:mvec,:T,Float64)...)
f = frobenius(b)
w2::MultiVector{T,V,E} = b^2
B = value(w2)
S = zeros(mvec(N,t))
prod = zeros(mvec(N,t))
term = zeros(mvec(N,t))
S .= value(b)
out .= value(b*w2)
term .= out/3
norms = MVector(f,norm(term),f)
k::Int = 5
@inbounds while (norms[2]<norms[1] || norms[2]>1) && k β€ x
S += term
ns = norm(S)
@inbounds ns β norms[3] && break
prod .= out
out .= 0
# prod *= w2
for g β 1:N+1
Y = indexbasis(N,g-1)
@inbounds for i β 1:bn[g]
@inbounds val = B[bs[g]+i]
valβ 0 && for G β 1:N+1
@inbounds R = bs[G]
X = indexbasis(N,G-1)
@inbounds for j β 1:bn[G]
@inbounds geomaddmulti!(V,out,X[j],Y[i],*(prod[R+j],val))
end
end
end
end
term .= out/k
@inbounds norms .= (norms[2],norm(term),ns)
k += 2
end
S *= 2
return MultiVector{t,V}(S)
end
After checking over the algorithm many times, I am unable to find what causes the issue.
The following test shows that slow version of qlog
gives the correct result for log
when compared with regular Complex
value, note that this generalizes to full MultiVector
of any dimension and so is slower than a native implementation of Complex
values, since it handles a bigger number type.
julia> log(1+v12)
0.3465735909819527 + 0.7853981672307662vββ
julia> log(1+im)
0.34657359027997264 + 0.7853981633974483im
The qlog_fast
version is supposed to help address the issue of performance for MultiVector
input.
The two algorithms should be performing the EXACT same sequence of calculation, the qlog_fast
version is simply a refactored version of the original, with explicit allocations to minimize time.
Is it possible that I am overwriting some array values in a way that causes an issue after 129 iterations?
The fast version of the algorithm should be able to fully work correctly, reproducing the exact same calculation steps as the slow version and converging to the same result, as with expm1,cosh,sinh
.