Thank jeffreyesun, yha, stevengj for the helpful answers. This thing is now insanely fast!
One thing I have learned here is that Tuple (can come from elements of iterator / CartesianIndex) is fast! And reassigning to a new tuple is better than changing the mutable array when used in retrieving elements of array. Thx yha for tip on changing the immutable!
Also, there is no problem in tuple + splatting. (Now it seems itβs a matter of style?)
function test3(latt1)
N = size(latt1,1)
mvup(x::CartesianIndex, d) = CartesianIndex(ntuple(i -> i==d ? mod1(x[i]+1,N) : x[i], length(x)))
obs = 0.0
for d = 1:3, x in CartesianIndex(1, 1, 1):CartesianIndex(N,N,N)
obs += latt1[x, d] * latt1[mvup(x, d), d]
end
obs/N^2
end
function test3b(latt1)
N = size(latt1,1)
mvup(x::CartesianIndex, d) = CartesianIndex(ntuple(i -> i==d ? mod1(x[i]+1,N) : x[i], length(x)))
function mvup2(x::CartesianIndex, d)
function _f(i1::Int)
if i1 != d
return x[i1]
elseif i1 == d && x[i1] != N
return x[i1] + 1
else
return 1
end
end
return CartesianIndex(ntuple(_f, 3))
end
obs = 0.0
for d = 1:3, x in CartesianIndex(1, 1, 1):CartesianIndex(N,N,N)
obs += latt1[x, d] * latt1[mvup2(x, d), d]
end
obs/N^2
end
function test4(latt1)
N = size(latt1,1)
links = Iterators.product(1:N, 1:N, 1:N, 1:3)
function mvup(x, d)
function _f(i1::Int)
if i1 != d
return x[i1]
elseif i1 == d && x[i1] != N
return x[i1] + 1
else
return 1
end
end
return ntuple(_f, 3)
end
obs = 0.0
for link in links
x = link[1:3]
d = link[4]
obs += latt1[link...] * latt1[mvup(x,d)..., d]
end
obs/N^2
end
and I get
julia> @benchmark test3(latt1) setup=(latt1=rand(250,250,250,3))
BenchmarkTools.Trial: 5 samples with 1 evaluation.
Range (min β¦ max): 705.676 ms β¦ 766.002 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 732.916 ms β GC (median): 0.00%
Time (mean Β± Ο): 735.877 ms Β± 22.589 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β β β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
706 ms Histogram: frequency by time 766 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark test3b(latt1) setup=(latt1=rand(250,250,250,3))
BenchmarkTools.Trial: 8 samples with 1 evaluation.
Range (min β¦ max): 259.175 ms β¦ 427.390 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 322.829 ms β GC (median): 0.00%
Time (mean Β± Ο): 321.440 ms Β± 52.464 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β β β ββ ββ β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
259 ms Histogram: frequency by time 427 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
julia> @benchmark test4(latt1) setup=(latt1=rand(250,250,250,3))
BenchmarkTools.Trial: 8 samples with 1 evaluation.
Range (min β¦ max): 356.814 ms β¦ 423.293 ms β GC (min β¦ max): 0.00% β¦ 0.00%
Time (median): 363.248 ms β GC (median): 0.00%
Time (mean Β± Ο): 374.044 ms Β± 22.829 ms β GC (mean Β± Ο): 0.00% Β± 0.00%
β β ββ β β β
βββββββββββββββββββββββββββββββββββββββββββββββββββββββββββββ β
357 ms Histogram: frequency by time 423 ms <
Memory estimate: 0 bytes, allocs estimate: 0.
One small thing I found is writing out (stupidly) the mvup / mvdown function seems to be slightly faster than mod1. Will look at the βghost gridβ stevengj suggested.
Thank you all again for the amazing comments. This is one of the few times I see my Julia code beating my fortran code :D.