Hi, I have got 4 arrays a,b,c,d
each of size nvar X nx
where nvar
is a small Int64
, but nx
is a pretty large Int64
. I wish to iteratively act a function over the arguments (a[:,i],a[:,i-1],a[:,i+1],b[:,i],b[:,i-1],b[:,i+1],c[:,i],c[:,i-1],c[:,i+1])
for i=2:nx-1
and store the result to d
.
The function cannot take the full arrays a,b,c,d
as input and run the loops, there’s many versions of this function that the user gets to choose from, so repeating the loop in the various functions would be repetitive. Here’s a mini non-working example done with views
for i=1:nx
@views random_math!(a[:,i-1],a[:,i],a[:,i+1],b[:,i-1],b[:,i],b[:,i+1],
c[:,i-1], c[:,i],c[:,i+1],d[:,i])
end
Here’s the minimal working examples of the versions I tried. The first one is with views
using BenchmarkTools
using StaticArrays
function random_math_viewed(d,a,al,ar,b,bl,br,c,cl,cr,nvar)
for n=1:nvar
d[n] = 0.5*(al[n]+bl[n]) + 129.0*(c[n]*br[n]) + 300.0*(al[n]*cr[n])
end
end
function func_viewed(a,b,c,d,nx,nvar)
for i=2:nx-1
a0, al, ar = @views a[:,i], a[:,i-1], a[:,i+1]
b0, bl, br = @views b[:,i], b[:,i-1], b[:,i+1]
c0, cl, cr = @views c[:,i], c[:,i-1], c[:,i+1]
d0 = @views d[:,i]
random_math_viewed(d0,a0,al,ar,b0,bl,br,c0,cl,cr,nvar)
end
end
nx = 1000000
nvar = 4
a,b,c,d = [rand(nvar,nx) for _=1:4]
@btime func_viewed($a,$b,$c,$d,$nx,$nvar)
I get the result
43.161 ms (0 allocations: 0 bytes)
However, if I actually supply the index i
, I get 2.6 times the speed-up.
using BenchmarkTools
using StaticArrays
function random_math_indexed(d,a0,b0,c0, i, nvar)
for n=1:nvar
d[n,i] = 0.5*(a0[n,i]+b0[n,i-1]) + 129.0*(c0[n,i]*b0[n,i+1]) + 300.0*(a0[n,i-1]*c0[n,i+1])
end
end
function func_indexed(a, b, c, d, nx, nvar)
for i=2:nx-1
random_math_indexed(d, a, b, c, i, nvar)
end
end
nx = 1000000
nvar = 4
a,b,c,d = [rand(nvar,nx) for _=1:4]
@btime func_indexed($a,$b,$c,$d,$nx,$nvar)
The exact result is
16.388 ms (0 allocations: 0 bytes)
While this works, I wish to know if there’s a better way of doing it. Supplying the index in my case is very non-standard.
Removing @views
leads to poor performance as well (by a factor of 60).
I tried to imitate the approach Trixi.jl has taken here, some explanation here. I am clearly not understanding it, because I get very bad results. Here’s the code of my attempt to imitate Trixi.jl
using BenchmarkTools
using StaticArrays
function get_node_vars(u, i, nvar)
SVector(ntuple(@inline(v -> u[v,i]), Val(nvar)))
end
function set_node_vars!(u, u_node, i, nvar)
for n=1:nvar
u[n,i] = u_node[n]
end
end
function func_set_var(a, b, c, d, nx, nvar)
for i=2:nx-1
a0, al, ar = ( get_node_vars(a, i, nvar), get_node_vars(a, i-1, nvar),
get_node_vars(a, i+1, nvar) )
b0, bl, br = ( get_node_vars(b, i, nvar), get_node_vars(b, i-1, nvar),
get_node_vars(b, i+1, nvar) )
c0, cl, cr = ( get_node_vars(c, i, nvar), get_node_vars(c, i-1, nvar),
get_node_vars(c, i+1, nvar) )
d0 = random_math_returning(a,al,ar,b,bl,br,c,cl,cr,nvar)
set_node_vars!(d, d0, i, nvar)
end
end
nx = 1000000
nvar = 4
a,b,c,d = [rand(nvar,nx) for _=1:4]
@btime func_set_var($a, $b, $c, $d, $nx, $nvar)