Faster alternate to @views for passing subarrays to functions

Hi, I have got 4 arrays a,b,c,d each of size nvar X nx where nvar is a small Int64, but nx is a pretty large Int64. I wish to iteratively act a function over the arguments (a[:,i],a[:,i-1],a[:,i+1],b[:,i],b[:,i-1],b[:,i+1],c[:,i],c[:,i-1],c[:,i+1]) for i=2:nx-1 and store the result to d.

The function cannot take the full arrays a,b,c,d as input and run the loops, there’s many versions of this function that the user gets to choose from, so repeating the loop in the various functions would be repetitive. Here’s a mini non-working example done with views

for i=1:nx
  @views random_math!(a[:,i-1],a[:,i],a[:,i+1],b[:,i-1],b[:,i],b[:,i+1],
                      c[:,i-1], c[:,i],c[:,i+1],d[:,i])
end

Here’s the minimal working examples of the versions I tried. The first one is with views

using BenchmarkTools
using StaticArrays

function random_math_viewed(d,a,al,ar,b,bl,br,c,cl,cr,nvar)
   for n=1:nvar
      d[n] = 0.5*(al[n]+bl[n]) + 129.0*(c[n]*br[n]) + 300.0*(al[n]*cr[n])
   end
end

function func_viewed(a,b,c,d,nx,nvar)
   for i=2:nx-1
      a0, al, ar = @views a[:,i], a[:,i-1], a[:,i+1]
      b0, bl, br  = @views b[:,i], b[:,i-1], b[:,i+1]
      c0, cl, cr  = @views c[:,i], c[:,i-1], c[:,i+1]
      d0  = @views d[:,i]
      random_math_viewed(d0,a0,al,ar,b0,bl,br,c0,cl,cr,nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]
@btime func_viewed($a,$b,$c,$d,$nx,$nvar)

I get the result

43.161 ms (0 allocations: 0 bytes)

However, if I actually supply the index i, I get 2.6 times the speed-up.

using BenchmarkTools
using StaticArrays

function random_math_indexed(d,a0,b0,c0, i, nvar)
   for n=1:nvar
      d[n,i] = 0.5*(a0[n,i]+b0[n,i-1]) + 129.0*(c0[n,i]*b0[n,i+1]) + 300.0*(a0[n,i-1]*c0[n,i+1])
   end
end

function func_indexed(a, b, c, d, nx, nvar)
   for i=2:nx-1
      random_math_indexed(d, a, b, c, i, nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]

@btime func_indexed($a,$b,$c,$d,$nx,$nvar)

The exact result is

16.388 ms (0 allocations: 0 bytes)

While this works, I wish to know if there’s a better way of doing it. Supplying the index in my case is very non-standard.

Removing @views leads to poor performance as well (by a factor of 60).

I tried to imitate the approach Trixi.jl has taken here, some explanation here. I am clearly not understanding it, because I get very bad results. Here’s the code of my attempt to imitate Trixi.jl

using BenchmarkTools
using StaticArrays

function get_node_vars(u, i, nvar)
   SVector(ntuple(@inline(v -> u[v,i]), Val(nvar)))
end

function set_node_vars!(u, u_node, i, nvar)
   for n=1:nvar
      u[n,i] = u_node[n]
   end
end

function func_set_var(a, b, c, d, nx, nvar)
   for i=2:nx-1
         a0, al, ar = ( get_node_vars(a, i, nvar), get_node_vars(a, i-1, nvar),
                        get_node_vars(a, i+1, nvar) )
         b0, bl, br = ( get_node_vars(b, i, nvar), get_node_vars(b, i-1, nvar),
                        get_node_vars(b, i+1, nvar) )
         c0, cl, cr = ( get_node_vars(c, i, nvar), get_node_vars(c, i-1, nvar),
                        get_node_vars(c, i+1, nvar) )
         d0 = random_math_returning(a,al,ar,b,bl,br,c,cl,cr,nvar)
         set_node_vars!(d, d0, i, nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]

@btime func_set_var($a, $b, $c, $d, $nx, $nvar)

What happens if you explicitly @inbounds the accesses in your random_math!? It could be that Julia is able to automatically remove the bounds checks for the direct accesses but not for the SubArray.

only improves about 20%.

FWIW, the second snippet is both faster and looks more clean (shorter) to me.

Adding @inline function random_math_viewed makes this path faster, for me.

2 Likes