Code becomes really slow when I combine StaticArrays and DoubleFloats

jisutich · September 22, 2022, 5:33pm

Hi:

I have some code to frequently generate haar random matrices and I want it to run fast. I use the suggestion in https://discourse.julialang.org/t/code-speed-decreases-when-loop-number-increases/78920 that I can use StaticArrays to speed my code up. It works. But recently I realize that the usual Float64 variable can’t give me enough accuracy so I try to use the package DoubleFloats which can provide high precision variables. But the problem is that when I use these two packages together, the code becomes extremely slow. So I think I must do something wrong.

using LinearAlgebra
using StatsBase
using Statistics
using Random
using StaticArrays
using Future
using DoubleFloats
using Dates

function rand_haar2(::Val{n}) where n
    M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n)) 
    q = qr(M).Q
    L = cispi.(2 .* @SVector(rand(Double64,n)))
    return q*diagm(L)
end

function main(; ps=2, theta = 1.05, Nk = 800, Nr = 10000)
    S_curve = zeros(Nk)
    S_var = zeros(Nk)


    Npool = Int(floor(10^ps))
    pool = zeros(Double64,Npool)
    pool_sample = zeros(Double64,Npool)
    spool = zeros(Double64,Npool)

    Kup= @SMatrix[cos(theta) 0; 0 sin(theta)]
    Kdown = @SMatrix[sin(theta) 0; 0 cos(theta)]
    P2up = kron(@SMatrix[1. 0.;0. 1.], @SMatrix[1 0; 0 0])
    P2down = kron(@SMatrix[1 0;0 1], @SMatrix[0 0;0 1])
    
   
    for i in 1:Nr
        pool .= 0.5
        pool_sample .= 0
       

        for k in 1:Nk
            poolcount = 0
            while poolcount < Npool
                z1 = pool[rand(1:Npool)]
                rho1 = diagm(@SVector[z1,1-z1])

                z2 = pool[rand(1:Npool)]
                rho2 = diagm(@SVector[z2,1-z2])

                u1 = rand_haar2(Val{2}())
                u2 = rand_haar2(Val{2}())

                K1up = u1*Kup*u1'
                K1down = u1*Kdown*u1'
                
                K2up = u2*Kup*u2'
                K2down = u2*Kdown*u2'

                rho1p = K1up*rho1*K1up'
                rho2p = K2up*rho2*K2up'

                p1 = real(tr(rho1p+rho1p'))/2
                p2 = real(tr(rho2p+rho2p'))/2
                
                if rand()<p1
                    rho1p = (rho1p+rho1p')/(2*p1)
                else 
                    rho1p = K1down*rho1*K1down'/((1-p1)) 
                end

                if rand()<p2
                    rho2p = (rho2p+rho2p')/(2*p2)
                else
                    rho2p = K2down*rho2*K2down'/((1-p2))
                end
             
                rho = kron(rho1p,rho2p)

                U = rand_haar2(Val{4}())
                rho_p = P2up*U*rho*U'*P2up'
                p = real(tr(rho_p+rho_p'))/2
             
                if rand()<p
                    temp =(rho_p+rho_p')/2
                    
                    rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(p)
                else
                    temp = P2down*U*rho*U'*P2down'
                    rho_f = @SMatrix[temp[1,1]+temp[2,2] temp[1,3]+temp[2,4]; temp[3,1]+temp[4,2] temp[3,3]+temp[4,4]]/(1-p)
                end
                rho_f = (rho_f+rho_f')/2
                t = abs(tr(rho_f*rho_f))
                z = (1-t)/(1+abs(sqrt(2*t-1)))
                if !iszero(abs(z))
                    poolcount = poolcount+1
                    pool_sample[poolcount] = abs(z)
                end
            end

            pool .= pool_sample
            pool_sample .= 0

            spool .=-pool.*log.(pool).-(1.0 .- pool).*log.(1.0 .-pool)
            
            S_curve[k] += mean(spool)
            
            S_var[k] += (std(spool)/sqrt(Npool))^2
                 
        end

    end
    
    return S_curve/Nr, sqrt.(S_var)/Nr
end

@time s,t = main(ps = 6,theta=1.04,Nr = 1)

When I use DoubleFloats without StaticArrays, the speed is slow but not too bad. But when I put these two together, the speed suddenly becomes really slow.

Anyone can help?

Thanks

Oscar_Smith · September 22, 2022, 5:56pm

Figuring out what is happening here would be much easier if you profile to figure out what is actually slow.

One quick thing you could do to make this more accurate is to replace log.(1.0 .-pool) with log1p.(-pool). This will be much more accurate close to 0.

jisutich · September 22, 2022, 6:19pm

Hi Oscar：

Thanks for this suggestion. I will try to do a careful check. One thing I know is that if I replace the function rand_haar2 by a version without double64 objects provided in DoubleFloats, then the code will run much faster.

Oscar_Smith · September 22, 2022, 6:55pm

How big is n? The two things that are probably slow are the QR if it’s using a fallback algorithm, and cispi which is slower than it should be.

jisutich · September 22, 2022, 7:01pm

I use n to be either 2 or 4. I just find that generating unitary matrices alone is not so slow. Basically I ask the code generate random matrices 10^5 times and it just needs 1s. But when I ask my code to run 10^5 times (which generates random matrices and then use them to do the further calculation), it needs 30s. So I am still trying to find which part is the main reason (due to my very limited background, I am using a stupid way to check the speed by using @time).

Oscar_Smith · September 22, 2022, 7:07pm

You will get much better results when timing fast functions with @btime

jisutich · September 22, 2022, 7:29pm

Hi Oscar:

I just find the slow part of my code. They are the lines

u1 = rand_haar2(Val{2}())
u2 = rand_haar2(Val{2}())

where I generate two 2x2 random matrices by calling rand_haar2. And the line

U = rand_haar2(Val{4}())

where I generate a 4x4 random matrix. The last one is the most time consuming part. So it seems that the generation of random matrices is indeed the reason why my code is slow (But if I just call rand_haar2 in a single for loop, the speed is fast, don’t know why)

nsajko · September 22, 2022, 7:30pm

How about changing this to something like this, because you don’t want to construct dynamic matrices and convert them to static ones at run time:

M = @SMatrix randn(ComplexDF64,n,n)

jisutich · September 22, 2022, 7:31pm

Hi Neven:

I did try it privately but it didn’t change the time much. Thanks.

nsajko · September 22, 2022, 7:33pm

I just realized that creating random DoubleFloats seems to be done by creating a random BigFloat and converting it to DoubleFloat. Perhaps you could try creating converting random Float64 to DoubleFloat instead, in case that makes sense for you.

jisutich · September 22, 2022, 7:37pm

Hi Neven:

Can you explain it with more details? Sorry I have little knowledge about numerical calculation.

jisutich · September 22, 2022, 7:57pm

Hi guys:

I just checked my previous code with DoubleFloats only, it seems that the speed is also slow. So maybe my memory that the speed is okay is wrong. I guess the problem is that using double64 object is in general slow (it will be good if there is a way to optimize it).

sgaure · September 22, 2022, 8:36pm

The random generation of DoubleFloats seems to be quite slow. If you don’t depend on random numbers with Double64 precision, you could try to create them with ordinary floats and make the Double64-matrix from them like this:

M = SMatrix{n, n, ComplexDF64}(randn(ComplexF64, n, n))

Similarly with the Vector.

nsajko · September 22, 2022, 8:58pm

@jisutich please evaluate these two replacements for your rand_haar2:

using
  LinearAlgebra,
  StatsBase,
  Statistics,
  Random,
  StaticArrays,
  Future,
  DoubleFloats,
  Dates

static_array_dims_product(::Type{Dims}) where {Dims <: Tuple} =
  prod(Tuple(Dims.types))

random_static_array(rand::F, ::Type{Out}, ::Type{Source}, ::Type{Dims}) where
{F <: Function, Out <: Number, Source <: Number, Dims <: Tuple} =
  SArray{Dims}(ntuple(
    i -> Out(rand(Source)),
    Val{static_array_dims_product(Dims)}()))

random_static_array(rand::F, ::Type{Out}, ::Type{Out}, ::Type{Dims}) where
{F <: Function, Out <: Number, Dims <: Tuple} =
  SArray{Dims}(ntuple(
    i -> rand(Out),
    Val{static_array_dims_product(Dims)}()))

function rand_haar2(
  ::Val{n},
  ::Type{Src1},
  ::Type{Src2}) where {n, Src1 <: Number, Src2 <: Number}
    M = random_static_array(Base.randn, ComplexDF64, Src1, Tuple{n, n})
    q = qr(M).Q
    L = map(
      x -> cispi(2 * ComplexDF64(x)),
      random_static_array(Base.rand, Double64, Src2, Tuple{n}))
    q * diagm(L)
end

rand_haar2_fast(::Val{n}) where {n} =
  rand_haar2(Val{n}(), ComplexF64, Float64)

rand_haar2_slower(::Val{n}) where {n} =
  rand_haar2(Val{n}(), ComplexDF64, Double64)

Furthermore, I think that both DoubleFloats and (possibly) StaticArrays have some performance bugs. Will make a PR to DoubleFloats now.

jisutich · September 22, 2022, 9:09pm

Hi Neven:

Thanks for the replacements. I just checked the two functions they are faster than my previous version. Even using the slower one rand_haar2_slower, the speed is 3x my previous code with double64. I appreciate it.

jisutich · September 23, 2022, 3:00am

Hi Neven:

I have one more question. In my previous version of rand_haar2, I first need to use randn to generate random number with normal distribution as

M = SMatrix{n,n,ComplexDF64}(randn(ComplexDF64,n,n))

Then I need to use rand as

L = cispi.(2 .* @SVector(rand(Double64,n)))

In this way, the function gives me Haar random unitary matrix. But I just find that in your function random_static_array, only rand is used. Then it gives me different distribution of random unitary matrices. Is there a way to fix this problem? Thanks

nsajko · September 23, 2022, 3:10am

I edited the code in the post above, see if it’s better now.

jisutich · September 23, 2022, 3:32am

Hi Neven:

Now it gives me the correct distribution, but the speed is much slower. I guess the problem is that randn is much slower than rand?

nsajko · September 23, 2022, 4:19am

This is the PR (currently there are 5 commits) I made for DoubleFloats.jl: Performance fixes for rand by nsajko · Pull Request #156 · JuliaMath/DoubleFloats.jl · GitHub

Didn’t try measuring performance, but I think that applying the PR to DoubleFloats.jl should make your code faster.

If you want to try this out, you should be able to try it out like this:

clone my git repo git clone --depth 1 -b perf https://github.com/nsajko/DoubleFloats.jl.git (perf is the branch that the PR commits are on)
in the Julia REPL, enter ] to get into the package mode or whatever it’s called
issue the command dev PATH_WHERE_YOU_CLONED/DoubleFloats.jl (replace PATH_WHERE_YOU_CLONED appropriately)

Topic		Replies	Views
Code speed decreases when loop number increases New to Julia performance , for-loop	18	741	April 3, 2022
Static matrix with size 400 New to Julia staticarrays	7	462	May 6, 2024
Loop over array of static arrays Performance question , staticarrays	12	156	October 26, 2024
Slow performance for `rand(ComplexF64, ...)` Performance	5	357	January 8, 2021
Repeated contractions - Vector-of-StaticArrays vs HybridArrays.jl, discrepancy in performance between Float64 vs Float32 Performance question , staticarrays	9	224	August 4, 2024

Code becomes really slow when I combine StaticArrays and DoubleFloats

Related topics