I was searching around for Julia comparisons to C++ and found this write-up from 2019.
I follow most of it but one point I was curious about is near the end where they implemented @simd
, @fastmath
, and @inbounds
.
I would have expected to see @inbounds
decorating the hot inner loop (I think that is line for Pj in particles
unless I missed it). Instead they only have @simd
decorating it and then wrap the actual loop calculations in a begin
block which is decorated with @fastmath @inbounds
.
It has never occurred to me to wrap such a large chunk of code in a begin
block (I only find myself using them in pluto notebooks) so it might not be unusual at all but just new to me.
Why do that instead of putting it on the for
loop? In fact, why not do @simd @fastmath @inbounds for Pj in particles
? Is there a difference and if so what is it?
The entire code snippet in question, copy/pasted from the above link, is shown below for reference.
"""
Implementation of @simd, @fastmath, and @inbounds
"""
function P2P_FINAL(particles, g_dgdr, U, J)
for (i, Pi) in enumerate(particles)
@simd for Pj in particles
@fastmath @inbounds begin
dX1 = Pi.X1 - Pj.X1
dX2 = Pi.X2 - Pj.X2
dX3 = Pi.X3 - Pj.X3
r = sqrt(dX1*dX1 + dX2*dX2 + dX3*dX3)
if r != 0
# g_σ and ∂gσ∂r
gsgm, dgsgmdr = g_dgdr(r / Pj.sigma)
# K × Γp
crss1 = -const4 / r^3 * ( dX2*Pj.Gamma3 - dX3*Pj.Gamma2 )
crss2 = -const4 / r^3 * ( dX3*Pj.Gamma1 - dX1*Pj.Gamma3 )
crss3 = -const4 / r^3 * ( dX1*Pj.Gamma2 - dX2*Pj.Gamma1 )
# U = ∑g_σ(x-xp) * K(x-xp) × Γp
U[1, i] += gsgm * crss1
U[2, i] += gsgm * crss2
U[3, i] += gsgm * crss3
# ∂u∂xj(x) = ∑[ ∂gσ∂xj(x−xp) * K(x−xp)×Γp + gσ(x−xp) * ∂K∂xj(x−xp)×Γp ]
# ∂u∂xj(x) += ∑p[(Δxj∂gσ∂r/(σr) − 3Δxjgσ/r^2) K(Δx)×Γp
aux = dgsgmdr/(Pj.sigma*r)* - 3*gsgm /r^2
# j=1
J[1, 1, i] += aux * crss1 * dX1
J[2, 1, i] += aux * crss2 * dX1
J[3, 1, i] += aux * crss3 * dX1
# j=2
J[1, 2, i] += aux * crss1 * dX2
J[2, 2, i] += aux * crss2 * dX2
J[3, 2, i] += aux * crss3 * dX2
# j=3
J[1, 3, i] += aux * crss1 * dX3
J[2, 3, i] += aux * crss2 * dX3
J[3, 3, i] += aux * crss3 * dX3
# Adds the Kronecker delta term
aux = -const4 * gsgm / r^3
# j=1
J[2, 1, i] -= aux * Pj.Gamma3
J[3, 1, i] += aux * Pj.Gamma2
# j=2
J[1, 2, i] += aux * Pj.Gamma3
J[3, 2, i] -= aux * Pj.Gamma1
# j=3
J[1, 3, i] -= aux * Pj.Gamma2
J[2, 3, i] += aux * Pj.Gamma1
end
end
end
end
end