A_mul_B! deviates from C = A*B with complex numbers

#1

TL;DR:
LinAlg.A_mul_B!(result, A, α*v) deviates from result = α*A*v.

Hey everybody!

So I am new to the game here. I was drawn to implement the code of my project in Julia because it had all the features I was looking for built in: Sparse Matrices, native Complex Numbers, fast runtime.

Part of my project includes solving iteratively the ODE of the Schrödinger equation:

\partial_t {\psi} = -i H {\psi}

In my project a state vector \psi is represented by a 2^N-component vector; H is represented by a Sparse 2^N \times 2^N Matrix.

I opted for the 4-step (classical) Runge-Kutta solver to preserve the -Norm of the vector.

While debugging I was puzzled by the following (mis-)behaviour of the Low-Level BLAS function LinAlg.A_mul_B!(), which I wanted to employ because of the very high iterations in the iterative solving.

Consider the following code snippet:

N = 2^16

v = randn(N)
v[:] = v/norm(v)

H = sprandn(N, N, 0.002)

result1 = Array{Complex{Float64}}(v)
result2 = Array{Complex{Float64}}(v)

Δt = 0.0005

result1[:] = -im*Δt*H*v
LinAlg.A_mul_B!(result2, H, -im*Δt*v)

println("Do the two results agree?\n$(prod( result1 .== result2 ))") Δv = norm( result1 - result2 ) println("What is the norm of the difference vector Δv?\n$(Δv)")

println("Δv/Δt = $(Δv/Δt)")  which yields after running the following output: $ julia snippet.jl

Do the two results agree?
false
What is the norm of the difference vector Δv?
1.390578911436828e-18
Δv/Δt = 2.781157822873656e-15


\mathcal{O}(10^{-15}) might not seem much of a difference, yet I am puzzled there is one at all!
The difference of those two results is accumulating over the four step Runge-Kutta iteration and ends up being \mathcal{O}(2\% \Delta t). This is rather unacceptable for one simple iteration step…

The previous statement can be observed with, for example, this code:

function rungekutta!(input::Array{Complex{Float64}}, H::SparseMatrixCSC{Float64,Int64}, deltaT::Float64, k1::Array{Complex{Float64}}, k2::Array{Complex{Float64}}, k3::Array{Complex{Float64}}, k4::Array{Complex{Float64}})
LinAlg.A_mul_B!(k1, H, -im*deltaT*input )
LinAlg.A_mul_B!(k2, H, -im*deltaT*(input + LinAlg.scale!( 1./2., k1) ) )
LinAlg.A_mul_B!(k3, H, -im*deltaT*(input + LinAlg.scale!( 1./2., k2) ) )
LinAlg.A_mul_B!(k4, H, -im*deltaT*(input + k3 )  )
input[:] += (LinAlg.scale!(2.0, k2) + LinAlg.scale!(2.0, k3) + k4)/6.
return nothing
end

function rungekutta2!(input::Array{Complex{Float64}}, H::SparseMatrixCSC{Float64,Int64}, deltaT::Float64, k1::Array{Complex{Float64}}, k2::Array{Complex{Float64}}, k3::Array{Complex{Float64}}, k4::Array{Complex{Float64}})
k1[:] = -im*deltaT*(H*(input))
k2[:] = -im*deltaT*(H*(input+k1/2.0))
k3[:] = -im*deltaT*(H*(input+k2/2.0))
k4[:] = -im*deltaT*(H*(input+k3))
input[:] += (k1+2.*k2+2.*k3+k4)/6.0
return nothing
end

N = 2^16

v = randn(N)
v[:] = v/norm(v)

H = sprandn(N, N, 0.002)

k1 = zeros(Complex{Float64}, N)
k2 = zeros(Complex{Float64}, N)
k3 = zeros(Complex{Float64}, N)
k4 = zeros(Complex{Float64}, N)

result1 = Array{Complex{Float64}}(v)
result2 = Array{Complex{Float64}}(v)

Δt = 0.0005

result1[:] = -im*Δt*H*v
LinAlg.A_mul_B!(result2, H, -im*Δt*v)

rungekutta!(result1, H, Δt, k1, k2, k3, k4)
rungekutta2!(result2, H, Δt, k1, k2, k3, k4)

println("Do the two results agree?\n$(prod( result1 .== result2 ))") Δv = norm( result1 - result2 ) println("What is the norm of the difference vector Δv?\n$(Δv)")

println("Δv/Δt = $(Δv/Δt)")  which yields after running the following output: $ julia snippet2.jl

Do the two results agree?
false
What is the norm of the difference vector Δv?
1.096038473328683e-5
Δv/Δt = 0.021920769466573658


I would be very happy to find out why these two operations doe not agree with each other and even more happy, if someone finds a mistake in my code!!

Cheers
Jan

#2

The order of operations matters in floating point world.

julia> result3 = H * (-im*Δt*v);

julia> norm(result3 - result2)
0.0


Edit: as an extreme example:

julia> 1e16 + 1.0 - 1e16
0.0

julia> 1e16 - 1e16 + 1.0
1.0


and see PSA: floating-point arithmetic.

By the way, you may want to consider using OrdinaryDiffEq.jl (part of DifferentialEquations.jl) if you’re solving ODEs, for a vast array of integrators that are highly optimized and tested for accuracy.

#3

RK4 is not a conservative (geometric or symplectic) integrator. It’s also an integrator with large error terms and low order. It’s really not a good choice if you’re looking for accuracy or conservation properties.

That’s missing k1.

#4

I second the comments here. Floating point results depending on the order of operations is simply a fact of life (one that is not that important once you learn to stop worrying about it) Depending on your accuracy/timing requirements, you can get away with just calling DiffEq with an explicit solver. If unitarity is important to you, try Crank-Nicolson, or an exponential integrator (using https://github.com/marcusps/ExpmV.jl for instance). There’s mostly no point in recoding a diff eq solver (except for learning purposes): using an existing one gets you things like timestep adaptivity for free. I don’t think there is a good solution for time-dependent Schrödinger (i dt psi = H(t) psi) in DiffEq yet, though.

#5

Well there’s tooling at least in the sense of Runge-Kutta methods, but yes for time-dependent Schrondinger we want Krylov exponential integrators with String splitting:

A GSoC potential has already gotten the Krylov exponential integrators up and running, along with a draft for reformulating the operator spec in a way that’s time-dependent splitting compatible:

So while this project would be based around semilinear exponential integrators, it will build all of the infrastructure to make these methods easy to implement, and then we just need to update the file of Strang splitting methods for that new interface (unreleased: we did this as an interface test)

tl;dr: I would guess we will have all of the tools to make methods specifically for u' = A(t)u well-optimized by the end of the summer. If you want them just bump this thread sometime midsummer to remind me.

#6

That’s great, and I will be interested in testing it. Cases of interest would be i psi’ = A(t) psi / eps, and i psi’ = A(t) psi + eps B(t) psi (adiabatic and linear response respectively); I’m not sure if efficient (= robust to eps->0) methods are known for these.

#7

Yeah, I don’t know of any methods specifically for this too, but we’ll have a good setup to develop one .