Hi all, I am new to this forum. Nice to meet you.

Today I encountered a strange performance problem.

I benchmarked the code below with BenchmarkTools.

```
using Random          # provides rand!
using BenchmarkTools  # provides @benchmark

NG = 2                  # each array is padded by NG cells on every side
N = NX = NY = NZ = 256

# Inputs filled with random values, outputs zero-initialized
U = rand!(zeros(N + 2NG, N + 2NG, N + 2NG));
V = rand!(zeros(N + 2NG, N + 2NG, N + 2NG));
W = rand!(zeros(N + 2NG, N + 2NG, N + 2NG));
Jx = zeros(N + 2NG, N + 2NG, N + 2NG);
Jy = zeros(N + 2NG, N + 2NG, N + 2NG);
Jz = zeros(N + 2NG, N + 2NG, N + 2NG);

# Version 1: intermediate values are stored in temporary variables
function plain!(U, V, W, Jx, Jy, Jz, NX, NY, NZ, NG)
    for k in NG:NZ+1+NG, j in NG:NY+1+NG, i in NG:NX+1+NG
        upp = -(U[i-1, j, k] + U[i, j, k]) / 2
        vpp = -(V[i, j, k] + V[i+1, j, k]) / 2
        wpp = -(W[i, j, k] + W[i+1, j, k]) / 2
        ux1 = (U[i-1, j, k] + U[i, j, k]) / 2
        uy1 = (U[i, j, k] + U[i, j+1, k]) / 2
        uz1 = (U[i, j, k] + U[i, j, k+1]) / 2
        Jx[i, j, k] = upp * ux1
        Jy[i, j, k] = vpp * uy1
        Jz[i, j, k] = wpp * uz1
    end
end

@benchmark plain!($U, $V, $W, $Jx, $Jy, $Jz, $NX, $NY, $NZ, $NG)

# Version 2: the same expressions written inline, without temporaries
function no_tmp!(U, V, W, Jx, Jy, Jz, NX, NY, NZ, NG)
    for k in NG:NZ+1+NG, j in NG:NY+1+NG, i in NG:NX+1+NG
        Jx[i, j, k] = -(U[i-1, j, k] + U[i, j, k]) / 2 * (U[i-1, j, k] + U[i, j, k]) / 2
        Jy[i, j, k] = -(V[i, j, k] + V[i+1, j, k]) / 2 * (U[i, j, k] + U[i, j+1, k]) / 2
        Jz[i, j, k] = -(W[i, j, k] + W[i+1, j, k]) / 2 * (U[i, j, k] + U[i, j, k+1]) / 2
    end
end

@benchmark no_tmp!($U, $V, $W, $Jx, $Jy, $Jz, $NX, $NY, $NZ, $NG)
```

As you can see, the only difference between the two functions is whether the intermediate values are assigned to temporary variables or not.
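To make the comparison concrete, here is a scalar sketch of a single update written both ways. The names `with_tmp` and `inline` are made up for this illustration and are not part of my benchmark. Since `*` and `/` have the same precedence and associate left to right, the one-line form should group as `((-(a + b) / 2) * (c + d)) / 2`, so it is not literally the same sequence of operations as the version with temporaries, even though the values agree for these inputs. I do not know whether this has anything to do with the timing difference.

```julia
# Scalar stand-ins for one array update; both names are hypothetical.
function with_tmp(a, b, c, d)
    p = -(a + b) / 2      # first factor stored in a temporary
    q = (c + d) / 2       # second factor stored in a temporary
    return p * q
end

# Same expression written on one line, as in no_tmp!
inline(a, b, c, d) = -(a + b) / 2 * (c + d) / 2

# Inspect how the one-line expression is parsed.
ex = Meta.parse("-(a + b) / 2 * (c + d) / 2")
outer_op = ex.args[1]          # operator applied last
inner_op = ex.args[2].args[1]  # operator applied just before it

println("outer: ", outer_op, ", inner: ", inner_op)
println(with_tmp(1.0, 3.0, 2.0, 4.0), " ", inline(1.0, 3.0, 2.0, 4.0))
```

With temporaries, the two halved sums are multiplied together. In the inline form, the final division by 2 is applied to the whole product instead.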

BenchmarkTools reports that the former (`plain!`) is 3x faster than the latter (`no_tmp!`).

Why?

Julia Version 1.10.3

Commit 0b4590a5507 (2024-04-30 10:59 UTC)

Build Info:

Official https://julialang.org/ release

Platform Info:

OS: Linux (x86_64-linux-gnu)

CPU: 24 × 13th Gen Intel(R) Core™ i7-13700K

WORD_SIZE: 64

LIBM: libopenlibm

LLVM: libLLVM-15.0.7 (ORCJIT, goldmont)

Threads: 24 default, 0 interactive, 12 GC (on 24 virtual cores)

Environment:

JULIA_EDITOR = code