Code becomes really slow when I combine StaticArrays and DoubleFloats

Hi:

I have some code that frequently generates Haar-random matrices, and I want it to run fast. Following the suggestion in https://discourse.julialang.org/t/code-speed-decreases-when-loop-number-increases/78920, I used StaticArrays to speed the code up, and that worked. But recently I realized that ordinary Float64 variables don't give me enough accuracy, so I tried the DoubleFloats package, which provides higher-precision numbers. The problem is that when I use the two packages together, the code becomes extremely slow, so I think I must be doing something wrong.

using LinearAlgebra
using StatsBase
using Statistics
using Random
using StaticArrays
using Future
using DoubleFloats
using Dates

# Generate an n×n Haar-random unitary as a static matrix:
# QR of a complex Gaussian matrix, then randomize the phases with cispi.
function rand_haar2(::Val{n}) where n
    M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n)) 
    q = qr(M).Q
    L = cispi.(2 .* @SVector(rand(Double64,n)))
    return q*diagm(L)
end

function main(; ps=2, theta = 1.05, Nk = 800, Nr = 10000)
    S_curve = zeros(Nk)
    S_var = zeros(Nk)


    Npool = Int(floor(10^ps))
    pool = zeros(Double64,Npool)
    pool_sample = zeros(Double64,Npool)
    spool = zeros(Double64,Npool)

    Kup= @SMatrix[cos(theta) 0; 0 sin(theta)]
    Kdown = @SMatrix[sin(theta) 0; 0 cos(theta)]
    P2up = kron(@SMatrix[1. 0.;0. 1.], @SMatrix[1 0; 0 0])
    P2down = kron(@SMatrix[1 0;0 1], @SMatrix[0 0;0 1])
    
   
    for i in 1:Nr
        pool .= 0.5
        pool_sample .= 0
       

        for k in 1:Nk
            poolcount = 0
            while poolcount < Npool
                z1 = pool[rand(1:Npool)]
                rho1 = diagm(@SVector[z1,1-z1])

                z2 = pool[rand(1:Npool)]
                rho2 = diagm(@SVector[z2,1-z2])

                u1 = rand_haar2(Val{2}())
                u2 = rand_haar2(Val{2}())

                K1up = u1*Kup*u1'
                K1down = u1*Kdown*u1'
                
                K2up = u2*Kup*u2'
                K2down = u2*Kdown*u2'

                rho1p = K1up*rho1*K1up'
                rho2p = K2up*rho2*K2up'

                p1 = real(tr(rho1p+rho1p'))/2
                p2 = real(tr(rho2p+rho2p'))/2
                
                if rand()<p1
                    rho1p = (rho1p+rho1p')/(2*p1)
                else 
                    rho1p = K1down*rho1*K1down'/((1-p1)) 
                end

                if rand()<p2
                    rho2p = (rho2p+rho2p')/(2*p2)
                else
                    rho2p = K2down*rho2*K2down'/((1-p2))
                end
             
                rho = kron(rho1p,rho2p)

                U = rand_haar2(Val{4}())
                rho_p = P2up*U*rho*U'*P2up'
                p = real(tr(rho_p+rho_p'))/2
             
                if rand()<p
                    temp =(rho_p+rho_p')/2
                    
                    rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(p)
                else
                    temp = P2down*U*rho*U'*P2down'
                    rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(1-p)
                end
                rho_f = (rho_f+rho_f')/2
                t = abs(tr(rho_f*rho_f))
                z = (1-t)/(1+abs(sqrt(2*t-1)))
                if !iszero(abs(z))
                    poolcount = poolcount+1
                    pool_sample[poolcount] = abs(z)
                end
            end

            pool .= pool_sample
            pool_sample .= 0

            spool .= -pool .* log.(pool) .- (1.0 .- pool) .* log.(1.0 .- pool)
            
            S_curve[k] += mean(spool)
            
            S_var[k] += (std(spool)/sqrt(Npool))^2
                 
        end

    end
    
    return S_curve/Nr, sqrt.(S_var)/Nr
end

@time s,t = main(ps = 6,theta=1.04,Nr = 1)

When I use DoubleFloats without StaticArrays, the code is slow but not too bad. But when I put the two together, it suddenly becomes really slow.

Can anyone help?

Thanks

Figuring out what is happening here would be much easier if you profiled the code to see what is actually slow.

One quick thing you could do to improve accuracy is to replace log.(1.0 .- pool) with log1p.(-pool), which is much more accurate when pool is close to 0.
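For example, the spool line would become something like this (just the accuracy change, not a performance fix):

spool .= -pool .* log.(pool) .- (1.0 .- pool) .* log1p.(-pool)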


Hi Oscar:

Thanks for the suggestion. I will try to do a careful check. One thing I do know is that if I replace rand_haar2 with a version that doesn't use the Double64 objects from DoubleFloats, the code runs much faster.

How big is n? The two things that are probably slow are the QR, if it's using a fallback algorithm, and cispi, which is slower than it should be.

n is either 2 or 4. I found that generating the unitary matrices alone is not that slow: if I just generate random matrices 10^5 times, it takes about 1 s. But when I run the full calculation 10^5 times (generating the random matrices and then using them in the rest of the computation), it takes 30 s. So I am still trying to find which part is the main culprit (with my very limited background, I am checking the speed in a crude way with @time).

You will get much better results when timing fast functions with @btime (from BenchmarkTools.jl).
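For example, something like this should isolate the matrix generation (a sketch, assuming BenchmarkTools.jl is installed):

using BenchmarkTools

@btime rand_haar2(Val{2}())   # the 2x2 case used for u1 and u2
@btime rand_haar2(Val{4}())   # the 4x4 case used for U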

Hi Oscar:

I just found the slow part of my code. It is these lines:

u1 = rand_haar2(Val{2}())
u2 = rand_haar2(Val{2}())

where I generate two 2x2 random matrices by calling rand_haar2, and the line

U = rand_haar2(Val{4}())

where I generate a 4x4 random matrix, which is the most time-consuming part. So it seems that generating the random matrices is indeed the reason my code is slow (although if I just call rand_haar2 in a standalone for loop, it is fast; I don't know why).

How about changing this to something like the following? You don't want to construct a dynamic matrix and convert it to a static one at run time:

M = @SMatrix randn(ComplexDF64,n,n)

Hi Neven:

I did try that myself, but it didn't change the timing much. Thanks.

I just realized that creating a random DoubleFloat seems to be done by creating a random BigFloat and converting it to a DoubleFloat. Perhaps you could try creating random Float64 values and converting them to Double64 instead, in case that makes sense for you.
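Something along these lines, as a minimal sketch of the idea (assuming Float64 resolution is acceptable for the random draws themselves):

x = Double64(rand())                  # uniform Double64 built from a Float64
z = ComplexDF64(randn(ComplexF64))    # normal ComplexDF64 built from a ComplexF64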

Hi Neven:

Can you explain that in more detail? Sorry, I have little knowledge of numerical computation.

Hi guys:

I just checked my previous code that uses DoubleFloats without StaticArrays, and it seems that it is also slow. So maybe my memory that its speed was okay is wrong. I guess the problem is that using Double64 objects is slow in general (it would be nice if there were a way to optimize it).

Random generation of DoubleFloats seems to be quite slow. If you don't need the random numbers themselves to have Double64 precision, you could generate them as ordinary floats and build the Double64 matrix from them like this:

M = SMatrix{n, n, ComplexDF64}(randn(ComplexF64, n, n))

Similarly for the vector, for example:
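Here is a sketch of the analogous change for the phase vector in rand_haar2 (draw Float64 values and convert on construction):

L = cispi.(2 .* SVector{n, Double64}(rand(n)))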


@jisutich please evaluate these two replacements for your rand_haar2:

using LinearAlgebra,
  StatsBase,
  Statistics,
  Random,
  StaticArrays,
  Future,
  DoubleFloats,
  Dates

static_array_dims_product(::Type{Dims}) where {Dims <: Tuple} =
  prod(Tuple(Dims.types))

random_static_array(rand::F, ::Type{Out}, ::Type{Source}, ::Type{Dims}) where
{F <: Function, Out <: Number, Source <: Number, Dims <: Tuple} =
  SArray{Dims}(ntuple(
    i -> Out(rand(Source)),
    Val{static_array_dims_product(Dims)}()))

random_static_array(rand::F, ::Type{Out}, ::Type{Out}, ::Type{Dims}) where
{F <: Function, Out <: Number, Dims <: Tuple} =
  SArray{Dims}(ntuple(
    i -> rand(Out),
    Val{static_array_dims_product(Dims)}()))

function rand_haar2(
  ::Val{n},
  ::Type{Src1},
  ::Type{Src2}) where {n, Src1 <: Number, Src2 <: Number}
    M = random_static_array(Base.randn, ComplexDF64, Src1, Tuple{n, n})
    q = qr(M).Q
    L = map(
      x -> cispi(2 * ComplexDF64(x)),
      random_static_array(Base.rand, Double64, Src2, Tuple{n}))
    q * diagm(L)
end

rand_haar2_fast(::Val{n}) where {n} =
  rand_haar2(Val{n}(), ComplexF64, Float64)

rand_haar2_slower(::Val{n}) where {n} =
  rand_haar2(Val{n}(), ComplexDF64, Double64)
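
For reference, a small usage sketch with the definitions above (the two variants differ only in the element types used for the random draws):

u = rand_haar2_fast(Val{2}())    # draws Float64/ComplexF64 and converts to Double64
U = rand_haar2_slower(Val{4}())  # draws Double64/ComplexDF64 directly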

Furthermore, I think that both DoubleFloats and (possibly) StaticArrays have some performance bugs. Will make a PR to DoubleFloats now.


Hi Neven:

Thanks for the replacements. I just checked the two functions, and they are both faster than my previous version. Even the slower one, rand_haar2_slower, runs at 3x the speed of my previous code with Double64. I appreciate it.


Hi Neven:

I have one more question. In my previous version of rand_haar2, I first used randn to generate normally distributed random numbers:

M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n)) 

Then I used rand for the random phases:

L = cispi.(2 .* @SVector(rand(Double64,n)))

This way the function gives me a Haar-random unitary matrix. But I just noticed that in your function random_static_array, only rand was used, which gives a different distribution of random unitary matrices. Is there a way to fix this? Thanks

I edited the code in the post above; see if it's better now.

Hi Neven:

Now it gives me the correct distribution, but it is much slower. I guess the problem is that randn is much slower than rand?
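A quick way to check that hypothesis (a sketch; the exact timings will depend on your machine and package versions):

using BenchmarkTools, DoubleFloats

@btime rand(Double64)
@btime randn(Double64)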

This is the PR (currently there are 5 commits) I made for DoubleFloats.jl: Performance fixes for rand by nsajko · Pull Request #156 · JuliaMath/DoubleFloats.jl · GitHub

Didn’t try measuring performance, but I think that applying the PR to DoubleFloats.jl should make your code faster.

If you want to try this out, you should be able to do so like this (a non-interactive alternative is sketched after the steps):

  1. clone my git repo git clone --depth 1 -b perf https://github.com/nsajko/DoubleFloats.jl.git (perf is the branch that the PR commits are on)
  2. in the Julia REPL, press ] to enter the Pkg (package manager) mode
  3. issue the command dev PATH_WHERE_YOU_CLONED/DoubleFloats.jl (replace PATH_WHERE_YOU_CLONED appropriately)
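
Equivalently, without the interactive Pkg prompt (the path placeholder from step 3 is kept as-is):

import Pkg
Pkg.develop(path = "PATH_WHERE_YOU_CLONED/DoubleFloats.jl")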