# Code becomes really slow when I combine StaticArrays and DoubleFloats

Hi:

I have some code to frequently generate haar random matrices and I want it to run fast. I use the suggestion in https://discourse.julialang.org/t/code-speed-decreases-when-loop-number-increases/78920 that I can use `StaticArrays` to speed my code up. It works. But recently I realize that the usual `Float64` variable can’t give me enough accuracy so I try to use the package `DoubleFloats` which can provide high precision variables. But the problem is that when I use these two packages together, the code becomes extremely slow. So I think I must do something wrong.

``````using LinearAlgebra
using StatsBase
using Statistics
using Random
using StaticArrays
using Future
using DoubleFloats
using Dates

function rand_haar2(::Val{n}) where n
M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n))
q = qr(M).Q
L = cispi.(2 .* @SVector(rand(Double64,n)))
return q*diagm(L)
end

function main(; ps=2, theta = 1.05, Nk = 800, Nr = 10000)
S_curve = zeros(Nk)
S_var = zeros(Nk)

Npool = Int(floor(10^ps))
pool = zeros(Double64,Npool)
pool_sample = zeros(Double64,Npool)
spool = zeros(Double64,Npool)

Kup= @SMatrix[cos(theta) 0; 0 sin(theta)]
Kdown = @SMatrix[sin(theta) 0; 0 cos(theta)]
P2up = kron(@SMatrix[1. 0.;0. 1.], @SMatrix[1 0; 0 0])
P2down = kron(@SMatrix[1 0;0 1], @SMatrix[0 0;0 1])

for i in 1:Nr
pool .= 0.5
pool_sample .= 0

for k in 1:Nk
poolcount = 0
while poolcount < Npool
z1 = pool[rand(1:Npool)]
rho1 = diagm(@SVector[z1,1-z1])

z2 = pool[rand(1:Npool)]
rho2 = diagm(@SVector[z2,1-z2])

u1 = rand_haar2(Val{2}())
u2 = rand_haar2(Val{2}())

K1up = u1*Kup*u1'
K1down = u1*Kdown*u1'

K2up = u2*Kup*u2'
K2down = u2*Kdown*u2'

rho1p = K1up*rho1*K1up'
rho2p = K2up*rho2*K2up'

p1 = real(tr(rho1p+rho1p'))/2
p2 = real(tr(rho2p+rho2p'))/2

if rand()<p1
rho1p = (rho1p+rho1p')/(2*p1)
else
rho1p = K1down*rho1*K1down'/((1-p1))
end

if rand()<p2
rho2p = (rho2p+rho2p')/(2*p2)
else
rho2p = K2down*rho2*K2down'/((1-p2))
end

rho = kron(rho1p,rho2p)

U = rand_haar2(Val{4}())
rho_p = P2up*U*rho*U'*P2up'
p = real(tr(rho_p+rho_p'))/2

if rand()<p
temp =(rho_p+rho_p')/2

rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(p)
else
temp = P2down*U*rho*U'*P2down'
rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(1-p)
end
rho_f = (rho_f+rho_f')/2
t = abs(tr(rho_f*rho_f))
z = (1-t)/(1+abs(sqrt(2*t-1)))
if !iszero(abs(z))
poolcount = poolcount+1
pool_sample[poolcount] = abs(z)
end
end

pool .= pool_sample
pool_sample .= 0

spool .=-pool.*log.(pool).-(1.0 .- pool).*log.(1.0 .-pool)

S_curve[k] += mean(spool)

S_var[k] += (std(spool)/sqrt(Npool))^2

end

end

return S_curve/Nr, sqrt.(S_var)/Nr
end

@time s,t = main(ps = 6,theta=1.04,Nr = 1)
``````

When I use `DoubleFloats` without `StaticArrays`, the speed is slow but not too bad. But when I put these two together, the speed suddenly becomes really slow.

Anyone can help?

Thanks

Figuring out what is happening here would be much easier if you profile to figure out what is actually slow.

One quick thing you could do to make this more accurate is to replace `log.(1.0 .-pool)` with `log1p.(-pool)`. This will be much more accurate close to 0.

2 Likes

Hi Oscar：

Thanks for this suggestion. I will try to do a careful check. One thing I know is that if I replace the function rand_haar2 by a version without double64 objects provided in DoubleFloats, then the code will run much faster.

How big is `n`? The two things that are probably slow are the QR if it’s using a fallback algorithm, and `cispi` which is slower than it should be.

I use n to be either 2 or 4. I just find that generating unitary matrices alone is not so slow. Basically I ask the code generate random matrices 10^5 times and it just needs 1s. But when I ask my code to run 10^5 times (which generates random matrices and then use them to do the further calculation), it needs 30s. So I am still trying to find which part is the main reason (due to my very limited background, I am using a stupid way to check the speed by using @time).

You will get much better results when timing fast functions with `@btime`

Hi Oscar:

I just find the slow part of my code. They are the lines

``````u1 = rand_haar2(Val{2}())
u2 = rand_haar2(Val{2}())
``````

where I generate two 2x2 random matrices by calling `rand_haar2`. And the line

``````U = rand_haar2(Val{4}())
``````

where I generate a 4x4 random matrix. The last one is the most time consuming part. So it seems that the generation of random matrices is indeed the reason why my code is slow (But if I just call `rand_haar2` in a single for loop, the speed is fast, don’t know why)

How about changing this to something like this, because you don’t want to construct dynamic matrices and convert them to static ones at run time:

``````M = @SMatrix randn(ComplexDF64,n,n)
``````

Hi Neven:

I did try it privately but it didn’t change the time much. Thanks.

I just realized that creating random DoubleFloats seems to be done by creating a random BigFloat and converting it to DoubleFloat. Perhaps you could try creating converting random Float64 to DoubleFloat instead, in case that makes sense for you.

Hi Neven:

Can you explain it with more details? Sorry I have little knowledge about numerical calculation.

Hi guys:

I just checked my previous code with `DoubleFloats` only, it seems that the speed is also slow. So maybe my memory that the speed is okay is wrong. I guess the problem is that using `double64` object is in general slow (it will be good if there is a way to optimize it).

The random generation of DoubleFloats seems to be quite slow. If you don’t depend on random numbers with Double64 precision, you could try to create them with ordinary floats and make the Double64-matrix from them like this:

``````M = SMatrix{n, n, ComplexDF64}(randn(ComplexF64, n, n))
``````

Similarly with the Vector.

1 Like

@jisutich please evaluate these two replacements for your `rand_haar2`:

``````using
LinearAlgebra,
StatsBase,
Statistics,
Random,
StaticArrays,
Future,
DoubleFloats,
Dates

static_array_dims_product(::Type{Dims}) where {Dims <: Tuple} =
prod(Tuple(Dims.types))

random_static_array(rand::F, ::Type{Out}, ::Type{Source}, ::Type{Dims}) where
{F <: Function, Out <: Number, Source <: Number, Dims <: Tuple} =
SArray{Dims}(ntuple(
i -> Out(rand(Source)),
Val{static_array_dims_product(Dims)}()))

random_static_array(rand::F, ::Type{Out}, ::Type{Out}, ::Type{Dims}) where
{F <: Function, Out <: Number, Dims <: Tuple} =
SArray{Dims}(ntuple(
i -> rand(Out),
Val{static_array_dims_product(Dims)}()))

function rand_haar2(
::Val{n},
::Type{Src1},
::Type{Src2}) where {n, Src1 <: Number, Src2 <: Number}
M = random_static_array(Base.randn, ComplexDF64, Src1, Tuple{n, n})
q = qr(M).Q
L = map(
x -> cispi(2 * ComplexDF64(x)),
random_static_array(Base.rand, Double64, Src2, Tuple{n}))
q * diagm(L)
end

rand_haar2_fast(::Val{n}) where {n} =
rand_haar2(Val{n}(), ComplexF64, Float64)

rand_haar2_slower(::Val{n}) where {n} =
rand_haar2(Val{n}(), ComplexDF64, Double64)
``````

Furthermore, I think that both DoubleFloats and (possibly) StaticArrays have some performance bugs. Will make a PR to DoubleFloats now.

2 Likes

Hi Neven:

Thanks for the replacements. I just checked the two functions they are faster than my previous version. Even using the slower one `rand_haar2_slower`, the speed is 3x my previous code with `double64`. I appreciate it.

1 Like

Hi Neven:

I have one more question. In my previous version of `rand_haar2`, I first need to use `randn` to generate random number with normal distribution as

``````M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n))
``````

Then I need to use `rand` as

``````L = cispi.(2 .* @SVector(rand(Double64,n)))
``````

In this way, the function gives me Haar random unitary matrix. But I just find that in your function `random_static_array`, only `rand` is used. Then it gives me different distribution of random unitary matrices. Is there a way to fix this problem? Thanks

I edited the code in the post above, see if it’s better now.

Hi Neven:

Now it gives me the correct distribution, but the speed is much slower. I guess the problem is that `randn` is much slower than `rand`?

This is the PR (currently there are 5 commits) I made for DoubleFloats.jl: Performance fixes for rand by nsajko · Pull Request #156 · JuliaMath/DoubleFloats.jl · GitHub

Didn’t try measuring performance, but I think that applying the PR to DoubleFloats.jl should make your code faster.

If you want to try this out, you should be able to try it out like this:

1. clone my git repo `git clone --depth 1 -b perf https://github.com/nsajko/DoubleFloats.jl.git` (`perf` is the branch that the PR commits are on)
2. in the Julia REPL, enter `]` to get into the package mode or whatever it’s called
3. issue the command `dev PATH_WHERE_YOU_CLONED/DoubleFloats.jl` (replace PATH_WHERE_YOU_CLONED appropriately)
1 Like