Most efficient way of calculating a multidimensional complex function many times

I have a very long function depends on X,Y,T. X and Y are vectors with length between 400-10000. I have to calculate the function in a for loop many times. Are my approachs efficient? If not, how can I improve them? Thank you for your help.

First approach:

using Printf
using BenchmarkTools
using Profile
using LoopVectorization

F(X, Y, T) = @.  15*(T^2/2+1)*exp(1*im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)*(exp(T)*β*sin(X*π)^2*sin(Y*π)^2+225*(T^2/2+1)^2*(X-1)^2*sin(X)^2*(Y-1)^2*sin(Y)^2)+(-30*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)-30*(T^2/2+1)*exp(im*T)*cos(X)*(1-Y)*sin(Y)-30*(T^2/2+1)*exp(im*T)*(1-X)*sin.(X)*cos(Y))*γ+(-15*(T^2/2+1)*exp(im*T)*sin(X)*(1-Y)*sin(Y)+15*(T^2/2+1)*exp(im*T)*(1-X)*cos(X)*(1-Y)*sin(Y)-15*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*sin(Y)+
15*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*cos(Y))*δ+im*(15*im*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)+15*T*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y))


const δ, γ, β = 1.0, 1.0, 1.0

function test(N,Δt)
# time  parameters
#Δt = 0.00001
data = rand(N,2)
x = data[:,1]; y = data[:,2]
NN = Int(round(1/Δt))

    for i in 1:NN
        f = F(x,y,i*Δt)
        #some code that uses f
    end

end

Second approach:

function test2(N,Δt)
# time  parameters
#Δt = 0.00001
data = rand(N,2)
x = data[:,1]; y = data[:,2]
f = Array{Complex{Float64}}(undef,N)
NN = Int(round(1/Δt))

    @inbounds for i in 1:NN
        @inbounds for j in 1:N
            f[j] = F(x[j],y[j],i*Δt)
        end
        #some code that uses f
    end

end

and timings:

@benchmark test(1600,0.0001)
BenchmarkTools.Trial: 
  memory estimate:  250.45 MiB
  allocs estimate:  70004
  --------------
  minimum time:     4.349 s (0.16% GC)
  median time:      4.372 s (0.15% GC)
  mean time:        4.372 s (0.15% GC)
  maximum time:     4.395 s (0.13% GC)
  --------------
  samples:          2
  evals/sample:     1
@benchmark test2(1600,0.0001)
BenchmarkTools.Trial: 
  memory estimate:  75.41 KiB
  allocs estimate:  6
  --------------
  minimum time:     3.863 s (0.00% GC)
  median time:      3.868 s (0.00% GC)
  mean time:        3.868 s (0.00% GC)
  maximum time:     3.872 s (0.00% GC)
  --------------
  samples:          2
  evals/sample:     1

The biggest thing here is that your F is incredibly expensive and redundant for no reason. If you factor out common terms, that should give a massive speedup.

1 Like

Related to what Oscar suggested, you can compute lots of quantities only once instead of over and over again, also in a more efficient way:

    sinX, cosX = sincos(X) # it's equivalent to computing sin(X), cos(X), but faster
    sinY, cosY = sincos(Y) # same as above
    cisT = cis(T) # it's equivalent to exp(im*T) but faster

Also, I’d make F operate on scalar values, and call it F.(x,y,i*Δt)

5 Likes

The quickest thing you can try is using CommonSubexpressions and then F_cse(X, Y, T) = @cse 15*(T^2/2+1)*cis(T)*(1-X)*sin(X)*..., which will pull out repeated functions. Also, you can replace exp(im*T) with cis(T) [as @giordano says!] Doing this in test2 makes it about 5x faster.

But this still seems quite wasteful. At different values of i, you re-do a lot of the same exp(X) etc. calculations. Storing and re-using those might help a lot.

3 Likes

Thank you for your suggestions, I will try and hope improvements.
@Oscar_Smith , F is a source term arising in a partial differential equation. So it is not redundant and must be calculated. There is also a 4D version of F as F(x,y,z,t) in 3D case. Therefore, some speed up in calculations of the function F would be better.

What I meant by redundant is that there are a ton of repeated calculations. Here is a version with most of them removed (designed to work on scalars)

function F(X, Y, T)
	sinX, cosX = sincos(X)
    sinY, cosY = sincos(Y)
    rx=(1-X)*sinX
    ry=(1-Y)*sinY
	ans = (T^2/2+1)*(
			+rX*rY*
				(exp(T)*β*(sinpi(X)*sinpi(Y))^2
				+225*((T^2/2+1)*rX*rY)^2
				-1)
			-2*γ*(rX*rY+rX*cosY+cosX*rY)
			+δ*(rY*(rx-sinX)+rX*(ry-sinY))
			) + T*im*rX*rY
	return 15*cis(T)*ans
end
8 Likes

@Oscar_Smith Thank you for your help, this version is faster.

1 Like

You should definitely test it a bit to make sure it gives the same answers. I think I got the refactor right, but I very well might not have.

1 Like