I have a question about inlining functions that hopefully somebody here can help me solve. I define an inlined potential function as follows
@inline function V(x) 1.0/x^5 end;
and also a user-defined type that will hold it
mutable struct mine
    x :: Float64
    Pot :: Function
    mine() = new()
end
from where I define
W = mine();
W.x = 1.0;
W.Pot = V; # here I assign the potential function
Np = 64;
r = rand(Np,3);
Now with that I want to evaluate the potential energy with this function
using LinearAlgebra  # provides norm

function Epot(r,glob::mine)
    Ep = 0.0
    ri = zero(r[1,:])
    rij = zero(r[1,:])
    N = size(r,1)
    for i in 1:N-1
        for id in 1:3
            ri[id] = r[i,id]
        end
        for j in i+1:N
            for jd in 1:3
                rij[jd] = ri[jd] - r[j,jd]
            end
            rr = norm(rij)
            Ep += glob.Pot(rr)
            # Ep += V(rr)
        end
    end
    Ep
end;
for that I use BenchmarkTools and get
@btime Epot(r,W)
  237.614 μs (6052 allocations: 94.94 KiB)
2.0618823575716226e6
Now I can call V(rr) directly instead of glob.Pot(rr), as in the commented line at the bottom of the function, and get much better results, with far fewer allocations
@btime Epot(r,W)
  166.685 μs (5 allocations: 464 bytes)
2.0618823575716226e6
Since V(x) is already defined to be @inline, my question is: why is this property (inlining) not propagated through my type mine? Is there a way to keep the function stored in mine and still make the compiler understand that the call to glob.Pot inside Epot must be inlined? I ask because I see that all the allocations come from that call, and I get a performance penalty that I would like to avoid if possible.
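For reference, here is a minimal sketch of one variant that keeps the function inside the type. It assumes the extra allocations come from the field being declared with the abstract type Function, so the compiler cannot know which function glob.Pot is and cannot inline it. The names MineTyped, Epot_typed and W2 are hypothetical and not part of my original code; the idea is only to make the concrete function type a type parameter.

```julia
@inline V(x) = 1.0 / x^5

# Hypothetical variant of `mine`: the concrete type of the stored function
# becomes the type parameter F, so the field is no longer abstractly typed.
struct MineTyped{F}
    x::Float64
    Pot::F
end

function Epot_typed(r, glob::MineTyped)
    Ep = 0.0
    N = size(r, 1)
    for i in 1:N-1, j in i+1:N
        # accumulate the squared distance component by component,
        # so no temporary arrays are needed
        d2 = 0.0
        for d in 1:3
            d2 += (r[i, d] - r[j, d])^2
        end
        Ep += glob.Pot(sqrt(d2))
    end
    return Ep
end

W2 = MineTyped(1.0, V)   # F is inferred as typeof(V)
r = rand(64, 3)
Epot_typed(r, W2)
```

With this layout, typeof(W2) carries typeof(V), so Epot_typed can be specialized for that particular potential. Whether the call is actually inlined is still up to the compiler.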
On another (performance) note, please notice that in my function I copy the coordinates r[i,:] of the ith particle with a for loop. My first approach was to simply replace lines like
for id in 1:3
    ri[id] = r[i,id]
end
with
ri .= r[i,:]
but that also increases the number of allocations, in the latter case (calling V directly), from 5 to
@btime Epot(r,W)
  170.148 μs (68 allocations: 7.34 KiB)
2.0618823575716226e6
and even worse if I do the same with the rij variable in the j loop.
Why is this happening? Is there a way to use this simple yet nice vector notation without increasing the number of allocations much?
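To make the second question concrete: a slice such as r[i,:] on the right-hand side creates a new array before the broadcast assignment copies it into ri. A possible sketch that keeps the vector notation is to wrap the slice in a view with the @views macro from Base. The helper names copy_row! and diff_rows! are hypothetical, and I have not measured how many allocations this saves in practice.

```julia
# Hypothetical helpers: same vector notation, but the slices are views
# into r instead of freshly allocated copies.
function copy_row!(ri, r, i)
    @views ri .= r[i, :]
    return ri
end

function diff_rows!(rij, ri, r, j)
    @views rij .= ri .- r[j, :]
    return rij
end

r = rand(64, 3)
ri = zeros(3)
rij = zeros(3)

copy_row!(ri, r, 2)        # ri now holds the coordinates of particle 2
diff_rows!(rij, ri, r, 5)  # rij now holds r[2,:] - r[5,:]
```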
Best regards and thanks,