# Most efficient way of calculating a multidimensional complex function many times

I have a very long function depends on X,Y,T. X and Y are vectors with length between 400-10000. I have to calculate the function in a for loop many times. Are my approachs efficient? If not, how can I improve them? Thank you for your help.

First approach:

``````using Printf
using BenchmarkTools
using Profile
using LoopVectorization

F(X, Y, T) = @.  15*(T^2/2+1)*exp(1*im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)*(exp(T)*β*sin(X*π)^2*sin(Y*π)^2+225*(T^2/2+1)^2*(X-1)^2*sin(X)^2*(Y-1)^2*sin(Y)^2)+(-30*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)-30*(T^2/2+1)*exp(im*T)*cos(X)*(1-Y)*sin(Y)-30*(T^2/2+1)*exp(im*T)*(1-X)*sin.(X)*cos(Y))*γ+(-15*(T^2/2+1)*exp(im*T)*sin(X)*(1-Y)*sin(Y)+15*(T^2/2+1)*exp(im*T)*(1-X)*cos(X)*(1-Y)*sin(Y)-15*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*sin(Y)+
15*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*cos(Y))*δ+im*(15*im*(T^2/2+1)*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y)+15*T*exp(im*T)*(1-X)*sin(X)*(1-Y)*sin(Y))

const δ, γ, β = 1.0, 1.0, 1.0

function test(N,Δt)
# time  parameters
#Δt = 0.00001
data = rand(N,2)
x = data[:,1]; y = data[:,2]
NN = Int(round(1/Δt))

for i in 1:NN
f = F(x,y,i*Δt)
#some code that uses f
end

end
``````

Second approach:

``````function test2(N,Δt)
# time  parameters
#Δt = 0.00001
data = rand(N,2)
x = data[:,1]; y = data[:,2]
f = Array{Complex{Float64}}(undef,N)
NN = Int(round(1/Δt))

@inbounds for i in 1:NN
@inbounds for j in 1:N
f[j] = F(x[j],y[j],i*Δt)
end
#some code that uses f
end

end
``````

and timings:

``````@benchmark test(1600,0.0001)
BenchmarkTools.Trial:
memory estimate:  250.45 MiB
allocs estimate:  70004
--------------
minimum time:     4.349 s (0.16% GC)
median time:      4.372 s (0.15% GC)
mean time:        4.372 s (0.15% GC)
maximum time:     4.395 s (0.13% GC)
--------------
samples:          2
evals/sample:     1
``````
``````@benchmark test2(1600,0.0001)
BenchmarkTools.Trial:
memory estimate:  75.41 KiB
allocs estimate:  6
--------------
minimum time:     3.863 s (0.00% GC)
median time:      3.868 s (0.00% GC)
mean time:        3.868 s (0.00% GC)
maximum time:     3.872 s (0.00% GC)
--------------
samples:          2
evals/sample:     1
``````

The biggest thing here is that your `F` is incredibly expensive and redundant for no reason. If you factor out common terms, that should give a massive speedup.

1 Like

Related to what Oscar suggested, you can compute lots of quantities only once instead of over and over again, also in a more efficient way:

``````    sinX, cosX = sincos(X) # it's equivalent to computing sin(X), cos(X), but faster
sinY, cosY = sincos(Y) # same as above
cisT = cis(T) # it's equivalent to exp(im*T) but faster
``````

Also, I’d make `F` operate on scalar values, and call it `F.(x,y,i*Δt)`

5 Likes

The quickest thing you can try is `using CommonSubexpressions` and then `F_cse(X, Y, T) = @cse 15*(T^2/2+1)*cis(T)*(1-X)*sin(X)*...`, which will pull out repeated functions. Also, you can replace `exp(im*T)` with `cis(T)` [as @giordano says!] Doing this in `test2` makes it about 5x faster.

But this still seems quite wasteful. At different values of `i`, you re-do a lot of the same `exp(X)` etc. calculations. Storing and re-using those might help a lot.

3 Likes

Thank you for your suggestions, I will try and hope improvements.
@Oscar_Smith , F is a source term arising in a partial differential equation. So it is not redundant and must be calculated. There is also a 4D version of F as F(x,y,z,t) in 3D case. Therefore, some speed up in calculations of the function F would be better.

What I meant by redundant is that there are a ton of repeated calculations. Here is a version with most of them removed (designed to work on scalars)

``````function F(X, Y, T)
sinX, cosX = sincos(X)
sinY, cosY = sincos(Y)
rx=(1-X)*sinX
ry=(1-Y)*sinY
ans = (T^2/2+1)*(
+rX*rY*
(exp(T)*β*(sinpi(X)*sinpi(Y))^2
+225*((T^2/2+1)*rX*rY)^2
-1)
-2*γ*(rX*rY+rX*cosY+cosX*rY)
+δ*(rY*(rx-sinX)+rX*(ry-sinY))
) + T*im*rX*rY
return 15*cis(T)*ans
end
``````
8 Likes

@Oscar_Smith Thank you for your help, this version is faster.

1 Like

You should definitely test it a bit to make sure it gives the same answers. I think I got the refactor right, but I very well might not have.

1 Like