Seemingly unnecessary allocation within for loop

stevengj · December 4, 2021, 2:55pm

For fun, I wrote out an implementation of allocation-free explicit timestepping for an N-dimensional scalar wave equation, using ghost-cell boundary conditions and centered-difference approximations (equivalent to a leapfrog scheme):

"""
    u, uprev, unew = wavepropagate!(u, cΔtΔx⁻¹=0.9/√N, uprev=copy(u), unew=copy(u))

Time-evolve `u` according to `timesteps` timesteps of the equation ∂²u/∂t² = c²∇²u,
using centered-difference approximations in space and time with discretization
size Δx and Δt, respectively, given the ratio cΔtΔx⁻¹(which must be < 1/√N
for CFL stability).

Dirichlet boundary conditions are used, based on the initial boundary values in `u`.
The `uprev` array should contain `u` at the previous timestep, defaulting to `copy(u)`
(corresponding to initial conditions at rest).   The `unew` array is used for storing
the subsequent timestep and should be of the same size and type as `u`.
"""
function wavepropagate!(u::AbstractArray{T, N}, timesteps::Integer,
                       cΔtΔx⁻¹::Real=0.9/√N,
                       uprev::AbstractArray{T, N}=copy(u),
                       unew::AbstractArray{T, N}=copy(u)) where {T<:Number,N}
    cΔtΔx⁻¹ < 1/√N || throw(ArgumentError("cΔtΔx⁻¹ = $cΔtΔx⁻¹ violates CFL stability condition"))
    c²Δt²Δx⁻² = (cΔtΔx⁻¹)^2
    
    # construct tuple of unit vectors along N directions, for constructing nearest neighbors
    # ... using ntuple with Val(N) makes this type-stable with hard-coded loops
    unitvecs = ntuple(i -> CartesianIndex(ntuple(==(i), Val(N))), Val(N))
    
    I = CartesianIndices(u)
    Ifirst, Ilast = first(I), last(I)
    I1 = oneunit(Ifirst)
    
    for t = 1:timesteps
    
        # use the "ghost cell" technique for boundary conditions:
        # only timestep in the interior, and update boundaries separately (if at all)
        @inbounds for i in Ifirst+I1:Ilast-I1
            # compute a discrete (center-difference) Laplacian of u
            ∇²u = -2N * u[i]
            for uvec in unitvecs
                ∇²u += u[i+uvec]+u[i-uvec]
            end
            
            # update u via center-difference approximation of ∂²u/∂t² = c²∇²u
            unew[i] = 2u[i] - uprev[i] + c²Δt²Δx⁻² * ∇²u
        end
        
        # here, you would update the boundary "pixels" of xnew based on whatever
        # boundary condition you want.   not updating them, as we do here,
        # corresponds to Dirichlet boundary conditions
        
        u, uprev, unew = unew, u, uprev # cycle the arrays
    end
    
    return u, uprev, unew
end

For example, here is the output in 1d, where you can see an initial “pulse” propagating outwards:

using PyPlot
x = range(-1, 1, length=200)
u = @. exp(-(x/0.1)^2)
plot(x, u)
uprev, unew = copy(u), copy(u)
for i = 1:4
    u, uprev, unew = wavepropagate!(u, 40, sqrt(0.2), uprev, unew)
    plot(x, u)
end
xlabel(L"x")
ylabel(L"u(x,t)")
legend([L"t = %$n \Delta t" for n = (0:5)*40])

Topic		Replies	Views
Undesired memory allocation during interpolation Performance memory-allocation	33	1378	December 8, 2021
Poor performance due to memory allocations? Performance memory-allocation	17	2855	January 15, 2019
Trying to speed-up Laplace Equation Solver Performance	36	1759	August 1, 2020
Common allocation mistakes Performance memory-allocation	47	7158	August 21, 2023
Where is this one allocation coming from? Performance memory-allocation	16	834	January 20, 2022

Seemingly unnecessary allocation within for loop

Related topics