Using views causes significant slowdown in simple PDE

bgroenks · February 7, 2021, 4:10pm

I have stumbled upon a frustrating performance issue with either DifferentialEquations.jl or the Julia compiler (or maybe my code?). The motivating use-case is the application of the 1-dimensional heat equation (and possibly additional dynamics) to a heterogeneous surface that must be broken up into (somewhat) homogeneous pieces for discretization. Thus, we have a large vector representing the energy state of the entire surface, but we need to call different forms of the heat equation for different chunks. For the purposes of this example, we’ll just call the same function for each chunk.

The problem is that, when using views at both levels (i.e. the chunks as well as for computing second-order finite difference), there is a >800x performance hit, apparently from allocations. This is not reproducible when benchmarking the functions standalone. The ODE solve is fast when using slices for the chunks at the top level, even though this should, in theory, involve more allocation.

I’ve talked to Chris R. on Slack and he thinks it might be some weird compiler issue. So if anyone here is more experienced with the nuances of Julia’s compiler, I would really appreciate your help!

I have prepared a simple MWE to demonstrate the problem. See below. This might be clearer than me blabbering anyway!

using DifferentialEquations, OrdinaryDiffEq, DiffEqOperators
using ComponentArrays
using BenchmarkTools

const nknots = 100
const h = 1.0/(nknots+1)
const t0,t1 = (0.0,1.0)
u0 = randn(300)
u0 = ComponentArray(a=u0[1:100],b=u0[101:200],c=u0[201:300])

@inline function heat_conduction(du,u,p,t)
  du .= zero(eltype(u))
  u₃ = @view u[3:end]
  u₂ = @view u[2:end-1]
  u₁ = @view u[1:end-2]
  @. du[2:end-1] = ((u₃ - 2*u₂ + u₁)/h^2.0)
  nothing
end

function func_linidx_with_slices(du,u,p,t)
    du1, u1 = du[1:100], u[1:100]
    du2, u2 = du[101:200], u[101:200]
    du3, u3 = du[201:300], u[201:300]
    heat_conduction(du1,u1,p,t)
    heat_conduction(du2,u2,p,t)
    heat_conduction(du3,u3,p,t)
end

function func_linidx_with_views(du,u,p,t)
    du1, u1 = (@view du[1:100]),@view u[1:100]
    du2, u2 = (@view du[101:200]),@view u[101:200]
    du3, u3 = (@view du[201:300]),@view u[201:300]
    heat_conduction(du1,u1,p,t)
    heat_conduction(du2,u2,p,t)
    heat_conduction(du3,u3,p,t)
end

prob = ODEProblem(func_linidx_with_slices, u0, (t0, t1))
sol = @btime solve(prob, Tsit5(), saveat=0.2)
# 70.031 μs (271 allocations: 234.27 KiB)
prob = ODEProblem(func_linidx_with_views, u0, (t0, t1))
sol = @btime solve(prob, Tsit5(), saveat=0.2)
# 83.342 ms (11714 allocations: 242.91 KiB)
du0 = similar(u0)
du0 .= 0.0
@benchmark func_linidx_with_slices($du0,$u0,nothing,0.0)
# BenchmarkTools.Trial:
# memory estimate:  5.25 KiB
# allocs estimate:  6
# --------------
# minimum time:     900.111 ns (0.00% GC)
# median time:      1.019 μs (0.00% GC)
# mean time:        1.357 μs (15.32% GC)
# maximum time:     72.786 μs (95.15% GC)
# --------------
# samples:          10000
# evals/sample:     36
@benchmark func_linidx_with_views($du0,$u0,nothing,0.0)
# BenchmarkTools.Trial: 
#   memory estimate:  0 bytes
#   allocs estimate:  0
#   --------------
#   minimum time:     374.263 ns (0.00% GC)
#   median time:      375.234 ns (0.00% GC)
#   mean time:        393.908 ns (0.00% GC)
#   maximum time:     1.268 μs (0.00% GC)
#   --------------
#   samples:          10000
#   evals/sample:     205

The main thing to note here is the apparent difference between benchmarking the ODE solve vs the standalone functions. In the latter case, the function using views is more efficient, as expected.

ranocha · February 7, 2021, 4:35pm

Your func_linidx_with_slices does not modify du itself, does it? It modifies only the slices, which are copies of some parts. Hence, the ODE algorithm just sees zero change and can step fast without needing to do real work.
Caveat: I’m on my phone and can’t check that right now. This is just a guess based on the code.

bgroenks · February 7, 2021, 4:39pm

Wow, I’m dumb. You’re totally right, that’s gotta be it. Thanks!

Topic		Replies	Views
Reduce views for a function that is applied piece-wise over an array Performance question , diffeq	6	770	February 4, 2018
DifferentialEquations.solve() speed New to Julia diffeq	3	469	May 17, 2019
Speeding up solution to the large system of non-linear, complex-valued ODEs General Usage diffeq , first-steps , performance , optimization	6	1145	May 27, 2020
Preconditioners quite slow in PDE system with DifferentialEquations.jl Modelling & Simulations package , diffeq , pde	6	748	December 4, 2022
Parellelization for large ODEs, Multithreading fails Performance performance , multithreading , differentialequation	16	309	March 30, 2023

Using views causes significant slowdown in simple PDE

Related topics