# Question on Julia Blog Post regarding Performance

I was searching around for Julia comparisons to C++ and found this write-up from 2019.

I follow most of it, but one point I'm curious about is near the end, where they add `@simd`, `@fastmath`, and `@inbounds`.

I would have expected to see `@inbounds` decorating the hot inner loop (I think that's the `for Pj in particles` line, unless I missed it). Instead, they only decorate the loop with `@simd` and wrap the actual loop calculations in a `begin` block, which is decorated with `@fastmath @inbounds`.

It had never occurred to me to wrap such a large chunk of code in a `begin` block (I only find myself using them in Pluto notebooks), so maybe it isn't unusual at all, just new to me.

Why do that instead of putting the macros on the `for` loop? In fact, why not write `@simd @fastmath @inbounds for Pj in particles`? Is there a difference, and if so, what is it?

The entire code snippet in question, copy/pasted from the above link, is shown below for reference.

``````
"""
Implementation of @simd, @fastmath, and @inbounds
"""
function P2P_FINAL(particles, g_dgdr, U, J)

    for (i, Pi) in enumerate(particles)
        @simd for Pj in particles

            @fastmath @inbounds begin
                dX1 = Pi.X1 - Pj.X1
                dX2 = Pi.X2 - Pj.X2
                dX3 = Pi.X3 - Pj.X3
                r = sqrt(dX1*dX1 + dX2*dX2 + dX3*dX3)

                if r != 0

                    # g_σ and ∂gσ∂r
                    gsgm, dgsgmdr = g_dgdr(r / Pj.sigma)

                    # K × Γp
                    crss1 = -const4 / r^3 * ( dX2*Pj.Gamma3 - dX3*Pj.Gamma2 )
                    crss2 = -const4 / r^3 * ( dX3*Pj.Gamma1 - dX1*Pj.Gamma3 )
                    crss3 = -const4 / r^3 * ( dX1*Pj.Gamma2 - dX2*Pj.Gamma1 )

                    # U = ∑g_σ(x-xp) * K(x-xp) × Γp
                    U[1, i] += gsgm * crss1
                    U[2, i] += gsgm * crss2
                    U[3, i] += gsgm * crss3

                    # ∂u∂xj(x) = ∑[ ∂gσ∂xj(x−xp) * K(x−xp)×Γp + gσ(x−xp) * ∂K∂xj(x−xp)×Γp ]
                    # ∂u∂xj(x) += ∑p[(Δxj∂gσ∂r/(σr) − 3Δxjgσ/r^2) K(Δx)×Γp
                    aux = dgsgmdr/(Pj.sigma*r)* - 3*gsgm /r^2
                    # j=1
                    J[1, 1, i] += aux * crss1 * dX1
                    J[2, 1, i] += aux * crss2 * dX1
                    J[3, 1, i] += aux * crss3 * dX1
                    # j=2
                    J[1, 2, i] += aux * crss1 * dX2
                    J[2, 2, i] += aux * crss2 * dX2
                    J[3, 2, i] += aux * crss3 * dX2
                    # j=3
                    J[1, 3, i] += aux * crss1 * dX3
                    J[2, 3, i] += aux * crss2 * dX3
                    J[3, 3, i] += aux * crss3 * dX3

                    # Adds the Kronecker delta term
                    aux = -const4 * gsgm / r^3
                    # j=1
                    J[2, 1, i] -= aux * Pj.Gamma3
                    J[3, 1, i] += aux * Pj.Gamma2
                    # j=2
                    J[1, 2, i] += aux * Pj.Gamma3
                    J[3, 2, i] -= aux * Pj.Gamma1
                    # j=3
                    J[1, 3, i] -= aux * Pj.Gamma2
                    J[2, 3, i] += aux * Pj.Gamma1

                end
            end

        end
    end

end
``````

No, it wouldn't make a difference. Also, I doubt the `@simd` annotation is doing anything here, given the access pattern into `J` and the branch on `r`.
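
To illustrate, here's a minimal sketch with a made-up toy kernel (not the blog's code) showing both placements — annotating a `begin` block inside the loop vs. stacking the macros on the `for` loop itself:

```julia
# Style 1: @simd on the loop, @fastmath @inbounds on a begin block inside,
# as in the blog post.
function sumsq_inner_begin(xs)
    s = 0.0
    @simd for i in eachindex(xs)
        @fastmath @inbounds begin
            s += xs[i] * xs[i]
        end
    end
    return s
end

# Style 2: all three macros stacked on the for loop. Note @simd goes
# innermost so that it sees the bare for loop (it complains if handed
# another macro call instead).
function sumsq_on_for(xs)
    s = 0.0
    @inbounds @fastmath @simd for i in eachindex(xs)
        s += xs[i] * xs[i]
    end
    return s
end

xs = collect(1.0:1000.0)
sumsq_inner_begin(xs) ≈ sumsq_on_for(xs)
```

The `begin` block is essentially a stylistic choice here: the annotations cover the same expressions either way, with the one caveat above about `@simd` needing to sit directly on the `for`.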