Hi,
I have a question about inlining functions that I hope somebody here can help me with. I define an inlined potential function as follows:
@inline function V(x)
    1.0/x^5
end;
and also a user-defined type that will hold it
mutable struct mine
    x   :: Float64
    Pot :: Function
    mine() = new()
end
from which I define
W = mine();
W.x = 1.0;
W.Pot = V; # here I assign the potential function
Np = 64;
r = rand(Np,3);
With that, I want to evaluate the potential energy with this function:
using LinearAlgebra   # provides norm
function Epot(r,glob::mine)
    Ep  = 0.0
    ri  = zero(r[1,:])    # buffer for the coordinates of particle i
    rij = zero(r[1,:])    # buffer for the separation vector
    N   = size(r,1)
    for i in 1:N-1
        for id in 1:3
            ri[id] = r[i,id]
        end
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr  = norm(rij)
            Ep += glob.Pot(rr)
            # Ep += V(rr)
        end
    end
    Ep
end;
To time it I use BenchmarkTools and get
@btime Epot(r,W)
237.614 μs (6052 allocations: 94.94 KiB)
2.0618823575716226e6
Now, I can call V(rr) directly instead of glob.Pot(rr), as in the commented-out line near the bottom of the function, and get much better results, with far fewer allocations:
@btime Epot(r,W)
166.685 μs (5 allocations: 464 bytes)
2.0618823575716226e6
Since V(x) is already declared @inline, my question is: why is this property (inlining) not propagated through my type mine? Is there a way to keep the function stored in mine and still make the compiler understand that the call to glob.Pot inside Epot must be inlined? I ask because I see that all the allocations come from that call, and I get a performance penalty that I would like to avoid if possible.
On another (performance) note, please notice that in my function I copy the coordinates r[i,:] of the i-th particle with a for loop. My first approach was to simply replace lines like
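To make the question concrete, here is a minimal sketch of the kind of variant I have in mind. It assumes the extra allocations come from the abstractly typed field Pot :: Function (so the call cannot be resolved at compile time) rather than from a lack of inlining. The names MineTyped and epot_typed are hypothetical and not part of my code above.

```julia
using LinearAlgebra  # provides norm

# Same potential as above
@inline V(x) = 1.0 / x^5

# Hypothetical variant of `mine`: the concrete type of the stored function
# becomes a type parameter F, so the compiler knows what glob.Pot is.
struct MineTyped{F}
    x   :: Float64
    Pot :: F
end

function epot_typed(r, glob::MineTyped)
    Ep  = 0.0
    rij = zeros(eltype(r), size(r, 2))   # reusable separation-vector buffer
    N   = size(r, 1)
    for i in 1:N-1
        for j in i+1:N
            for d in eachindex(rij)
                rij[d] = r[i, d] - r[j, d]
            end
            Ep += glob.Pot(norm(rij))    # call through a concretely typed field
        end
    end
    Ep
end

W2 = MineTyped(1.0, V)   # F is inferred as typeof(V)
r  = rand(64, 3)
epot_typed(r, W2)
```

This sketch makes the struct immutable and sets Pot at construction time, which differs from my mutable mine with an empty constructor. I do not know whether that trade-off is necessary.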
for id in 1:3
    ri[id] = r[i,id]
end
with
ri .= r[i,:]
but that also increases the number of allocations, in the latter case from 5 to 68:
@btime Epot(r,W)
170.148 μs (68 allocations: 7.34 KiB)
2.0618823575716226e6
and it gets even worse if I do the same with the rij variable in the j loop.
Why is this happening? Is there a way to use this simple yet nice vector notation without increasing the number of allocations so much?
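For illustration, a minimal sketch of the notation I would like to keep. It assumes the extra allocations come from the slice r[i,:] creating a temporary copy before the broadcast assignment, which would be consistent with one extra allocation per i iteration. The helper name copyrow! is hypothetical.

```julia
# Hypothetical helper: copy row i of r into the preallocated buffer ri.
# @views turns the slice r[i, :] into a view, so no temporary array is
# built for the right-hand side of the broadcast assignment.
function copyrow!(ri, r, i)
    @views ri .= r[i, :]
    return ri
end

r  = rand(64, 3)
ri = zeros(3)
copyrow!(ri, r, 2)   # ri now holds the coordinates of particle 2
```

If that assumption holds, the same pattern could presumably be applied to rij in the j loop, for example @views rij .= ri .- r[j, :].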
Best regards and thanks,
Ferran.