Faster alternate to @views for passing subarrays to functions

Arpit_Babbar · January 7, 2022, 2:43pm

Hi, I have got 4 arrays a,b,c,d each of size nvar X nx where nvar is a small Int64, but nx is a pretty large Int64. I wish to iteratively act a function over the arguments (a[:,i],a[:,i-1],a[:,i+1],b[:,i],b[:,i-1],b[:,i+1],c[:,i],c[:,i-1],c[:,i+1]) for i=2:nx-1 and store the result to d.

The function cannot take the full arrays a,b,c,d as input and run the loops, there’s many versions of this function that the user gets to choose from, so repeating the loop in the various functions would be repetitive. Here’s a mini non-working example done with views

for i=1:nx
  @views random_math!(a[:,i-1],a[:,i],a[:,i+1],b[:,i-1],b[:,i],b[:,i+1],
                      c[:,i-1], c[:,i],c[:,i+1],d[:,i])
end

Here’s the minimal working examples of the versions I tried. The first one is with views

using BenchmarkTools
using StaticArrays

function random_math_viewed(d,a,al,ar,b,bl,br,c,cl,cr,nvar)
   for n=1:nvar
      d[n] = 0.5*(al[n]+bl[n]) + 129.0*(c[n]*br[n]) + 300.0*(al[n]*cr[n])
   end
end

function func_viewed(a,b,c,d,nx,nvar)
   for i=2:nx-1
      a0, al, ar = @views a[:,i], a[:,i-1], a[:,i+1]
      b0, bl, br  = @views b[:,i], b[:,i-1], b[:,i+1]
      c0, cl, cr  = @views c[:,i], c[:,i-1], c[:,i+1]
      d0  = @views d[:,i]
      random_math_viewed(d0,a0,al,ar,b0,bl,br,c0,cl,cr,nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]
@btime func_viewed($a,$b,$c,$d,$nx,$nvar)

I get the result

43.161 ms (0 allocations: 0 bytes)

However, if I actually supply the index i, I get 2.6 times the speed-up.

using BenchmarkTools
using StaticArrays

function random_math_indexed(d,a0,b0,c0, i, nvar)
   for n=1:nvar
      d[n,i] = 0.5*(a0[n,i]+b0[n,i-1]) + 129.0*(c0[n,i]*b0[n,i+1]) + 300.0*(a0[n,i-1]*c0[n,i+1])
   end
end

function func_indexed(a, b, c, d, nx, nvar)
   for i=2:nx-1
      random_math_indexed(d, a, b, c, i, nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]

@btime func_indexed($a,$b,$c,$d,$nx,$nvar)

The exact result is

16.388 ms (0 allocations: 0 bytes)

While this works, I wish to know if there’s a better way of doing it. Supplying the index in my case is very non-standard.

Removing @views leads to poor performance as well (by a factor of 60).

I tried to imitate the approach Trixi.jl has taken here, some explanation here. I am clearly not understanding it, because I get very bad results. Here’s the code of my attempt to imitate Trixi.jl

using BenchmarkTools
using StaticArrays

function get_node_vars(u, i, nvar)
   SVector(ntuple(@inline(v -> u[v,i]), Val(nvar)))
end

function set_node_vars!(u, u_node, i, nvar)
   for n=1:nvar
      u[n,i] = u_node[n]
   end
end

function func_set_var(a, b, c, d, nx, nvar)
   for i=2:nx-1
         a0, al, ar = ( get_node_vars(a, i, nvar), get_node_vars(a, i-1, nvar),
                        get_node_vars(a, i+1, nvar) )
         b0, bl, br = ( get_node_vars(b, i, nvar), get_node_vars(b, i-1, nvar),
                        get_node_vars(b, i+1, nvar) )
         c0, cl, cr = ( get_node_vars(c, i, nvar), get_node_vars(c, i-1, nvar),
                        get_node_vars(c, i+1, nvar) )
         d0 = random_math_returning(a,al,ar,b,bl,br,c,cl,cr,nvar)
         set_node_vars!(d, d0, i, nvar)
   end
end

nx = 1000000
nvar = 4

a,b,c,d = [rand(nvar,nx) for _=1:4]

@btime func_set_var($a, $b, $c, $d, $nx, $nvar)

mbauman · January 7, 2022, 3:59pm

What happens if you explicitly @inbounds the accesses in your random_math!? It could be that Julia is able to automatically remove the bounds checks for the direct accesses but not for the SubArray.

jling · January 7, 2022, 4:16pm

only improves about 20%.

FWIW, the second snippet is both faster and looks more clean (shorter) to me.

mcabbott · January 7, 2022, 4:39pm

Adding @inline function random_math_viewed makes this path faster, for me.

Topic		Replies	Views
Passing sub-array by reference / performance Performance	18	4728	April 10, 2019
Array views becoming dominant source of memory allocation Performance	31	5171	November 14, 2018
Should view(array, 1:length(array)) return a SubArray? Data array	11	1150	August 19, 2017
Is there a way to guarantee 0 allocations when accessing an array? General Usage	18	4307	June 1, 2017
Performance of views vs. explicit index referencing Performance	2	501	July 7, 2020

Faster alternate to @views for passing subarrays to functions

Related topics