Trouble using PencilArrays with DifferentialEquations

I’m working on a simulation that requires taking FFTs and IFFTs of pretty large systems (~1024×1024×2048 elements). Between transforms I need to advance all of the elements according to a differential equation. Because the arrays are so large, I’m using PencilArrays.jl and PencilFFTs.jl to parallelize the transforms with MPI. The parallel/transform part of my code is running really well, but I’m stumbling on the ODE step.

I made a small demo code using normal arrays to test the ODE setup and it works fine.

#################### basic ODE test
using DifferentialEquations

A = Complex.([0.1, 1.0])
dA = Complex.([0.0, 0.0])
p = Complex.([1e-1])

function minimal_function(dA, A, p, t)
	
	for i = 1:length(A)
		dA[i] = p[1]*A[i]
	end

	return dA
end

# set up and solve ODE problem
tspan = (0.0,1.0)
f!(du, u, p, t) = minimal_function(du, u, p, t)
prob = ODEProblem{true}(f!, A, tspan, p)
sol = solve(prob)

However, when I tried adapting this to work with the pencil arrays I ran into trouble. The code runs just fine and gives a solution object sol but the entries in sol.u are all complex zeros. Any thoughts on what I’m doing wrong?

# Attempt using pencil arrays. One MPI rank is sufficient to reproduce the problem
using MPI
MPI.Init()
using DifferentialEquations
using PencilArrays
using PencilFFTs

comm = MPI.COMM_WORLD  
mpi_size = MPI.Comm_size(comm) 
mpi_rank = MPI.Comm_rank(comm)

# set the box dimensions
Nx = 8
Ny = 8
Nz = 16

# divide the space up over the mpi workers
proc_dims = let pdims = zeros(Int, 2)
		MPI.Dims_create!(mpi_size, pdims)
		pdims[1], pdims[2]
	    end


# set the transform to in-place, complex-to-complex
transform = Transforms.FFT!()
plan = PencilFFTPlan((Nx,Ny,Nz), transform, proc_dims, comm)

# allocate memory for the pencil array that is compatible with the plan
A = allocate_input(plan)
dA = allocate_input(plan)

# local_R and local_dR are global views into the pencil arrays, and
# rngR and rngdR are tuples of (global) index ranges used to access the data.
local_R = global_view(first(A))
rngR = axes(local_R)

local_dR = global_view(first(dA))
rngdR = axes(local_dR)


# fill A with some random data
for i in rngR[1], j in rngR[2], k in rngR[3]
	local_R[i,j,k] = rand()
end


function minimal_test(local_dS, local_S, p, t)

	# apply a non-linearity
	for i in rngR[1], j in rngR[2], k in rngR[3]
		local_dR[i,j,k] = p[1]*local_R[i,j,k]^2
	end
	
	return local_dR
end


# set up and solve ODE problem
tspan = (0.0,1.0)
p = [0.25]
f!(du, u, p, t) = minimal_test(du, u, p, t)
prob = ODEProblem{true}(f!, local_R, tspan, p)
sol = solve(prob)

Hi, I’m not sure what the problem could be, but have you tried with out-of-place FFTs?

In that case, you would just need to replace Transforms.FFT!() → Transforms.FFT(), first(A) → A and first(dA) → dA.
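
For instance, the setup part of your example would then become something like this (an untested sketch, keeping your variable names):

# set the transform to out-of-place, complex-to-complex
transform = Transforms.FFT()
plan = PencilFFTPlan((Nx, Ny, Nz), transform, proc_dims, comm)

# with out-of-place transforms, allocate_input returns plain PencilArrays,
# so there is no need for first()/last()
A = allocate_input(plan)
dA = allocate_input(plan)

local_R = global_view(A)
rngR = axes(local_R)

local_dR = global_view(dA)
rngdR = axes(local_dR)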

Another thing you could try is to avoid using global_view, and instead directly use the PencilArrays returned by allocate_input. If that’s the problem, then it’s a bug with global views, and it would be great if you could open an issue over at PencilArrays.jl.
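
With your current in-place plan, that would mean indexing first(A) with local indices, something like this (rough sketch; with the out-of-place variant above you would index A directly instead of first(A)):

uR = first(A)                   # physical-space PencilArray behind the in-place plan
for I in CartesianIndices(uR)   # local indices, iterated in memory order
    uR[I] = rand()
end
# ... and then pass uR (rather than local_R) as the initial condition to ODEProblem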

Also, I’m not sure if it’s intentional or a typo, but in your minimal_test function, you’re not using the input arguments local_dS and local_S, but the globally defined arrays local_R and local_dR.
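
That is, I would have expected something along these lines (sketch):

function minimal_test(local_dS, local_S, p, t)
    # apply the non-linearity using the arrays passed in by the solver
    for I in CartesianIndices(local_S)
        local_dS[I] = p[1]*local_S[I]^2
    end
    return local_dS
end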

I tried to run your example, but it fails at sol = solve(prob) with the following error. I get a similar error if I don’t use global_view.

ERROR: MethodError: no method matching OrdinaryDiffEq.Vern9Cache(::OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Array
{ComplexF64, 3}, 3, 0, Pencil{3, 2, NoPermutation}}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Array{ComplexF64
, 3}, 3, 0, Pencil{3, 2, NoPermutation}}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{Comp
lexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3
, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{C
omplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64
, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::
OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArr
ays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.Offse
tArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{Co
mplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64,
 3, Array{ComplexF64, 3}}, ::OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Array{ComplexF64, 3}, 3, 0, Pencil{3, 2, No
Permutation}}}, ::OrdinaryDiffEq.Vern9Tableau{Float64, Float64})                                                                           
Closest candidates are:                                                                                                                    
  OrdinaryDiffEq.Vern9Cache(::var"#1069#uType", ::var"#1069#uType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::
var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::v
ar"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1070#rateType", ::var"#1069#uType", ::var"#1069#uType", ::var"#1070#rateType", ::var"#1071#uNoUnitsType", ::var"#1072#TabType") where {var"#1069#uType", var"#1070#rateType", var"#1071#uNoUnitsType", var"#1072#TabType"} at /home/jpolanco/.julia/packages/OrdinaryDiffEq/zbCbk/src/caches/verner_caches.jl:126

It looks like the array type is missing a dispatch of similar or zero. Can you give me a simplified code without MPI and such that reproduces the error so I can run it locally?

@ChrisRackauckas Thanks a lot for the hint! The array type was indeed missing a definition of zero. I just tested locally and things seem to work as expected once zero is properly defined.
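
For reference, the missing method is essentially of the following form (just a sketch; the actual definition added to PencilArrays.jl may differ):

# hypothetical sketch, not necessarily the exact code in PencilArrays.jl
Base.zero(A::PencilArray) = fill!(similar(A), zero(eltype(A)))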

@PlasmaJoe I’ll soon push a new version of PencilArrays including the fix, let me know if it works for you.

@jipolanco Awesome! I’m not an admin on our cluster so I won’t be able to update the package there and test the main problem for a couple of days, but I updated and ran the test case on my laptop and the output looks reasonable.

@ChrisRackauckas As far as I can tell the pencil arrays need an MPI communicator to be defined in order to initialize, so I never got a version without MPI to reproduce the issue. If it ever comes up, I’ve been able to run MPI on my laptop using just one process to debug MPI issues locally with pretty good success.
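
(Concretely, that just means launching the script with a single rank, e.g. something like mpiexec -n 1 julia mre.jl, or via the mpiexecjl wrapper that comes with MPI.jl.)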

Thank you both for your help, I really appreciate it!

I have run into a problem moving to more than 1 worker. There was a detail I left out of the example for clarity that in hindsight I should have included. The non-linear effects have to be calculated in configuration space, but the terms being updated in the ODE are in spectral space, so I have to take forward and backward transforms inside the function that is passed to the ODE solver.

As far as I can tell, when sol.u is being created it copies all of the data from the global views, but is not itself a global view, so the transforms fft_plan*A and fft_plan\A don’t have any effect. I tried sidestepping the issue by explicitly copying the data back and forth between the global views and u, which seems inefficient but should work.

This work-around functions perfectly on 1 worker, but runs into errors on more than one worker. Reading through the error message, it looks to me like there is a mismatch between the indices used in the global view and in u; maybe when u is created it has different offset indices?

Below is an updated example. Ideally I’d like to avoid all of the copying that makes the workaround function. Any thoughts?

using MPI
MPI.Init()
using DifferentialEquations
using PencilArrays
using PencilFFTs

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
mpi_rank = MPI.Comm_rank(comm)

# set the box dimensions. (Small for illustration, realistic case would be 1024, 1024, 2048)
Nx, Ny, Nz = 64, 64, 64

# divide the space up over the mpi workers
proc_dims = let pdims = zeros(Int, 2)
		MPI.Dims_create!(mpi_size, pdims)
		pdims[1], pdims[2]
	    end

# set the transform to in-place, complex-to-complex
transform = Transforms.FFT!()
fft_plan = PencilFFTPlan((Nx,Ny,Nz), transform, proc_dims, comm)

# allocate memory for the pencil array that is compatible with the plan
A = allocate_input(fft_plan)
dA = allocate_input(fft_plan)

# Real-space A/dA
# local_R and local_dR are global views into the pencil arrays, and
# rngR and rngdR are tuples of (global) index ranges used to access the data.
local_R = global_view(first(A))
rngR = axes(local_R)

local_dR = global_view(first(dA))
rngdR = axes(local_dR)

#### spectral-space views
local_S = global_view(last(A))
rngS = axes(local_S)

local_dS = global_view(last(dA))
rngdS = axes(local_dS)

# set up coordinates to initialize the main pencil array in configuration space
x, y, z = range(-1,1,length = Nx), range(-1,1,length = Ny), range(-1,1,length = Nz)

# function to initialize the main pencil array
fi(x) = exp(-100*x^2)

# First function I tried. It runs, but there is no connection between u and the pencil
# arrays so the transforms don't affect the solution
function first_idea!(du, u, p, t)

	# go to configuration space with ifft. Print out the first element
	# of u before and after the ifft to see if there is any impact on u
	println(u[1,1,1])
	fft_plan\A
	println(u[1,1,1])

	# apply a non-linearity in configuration space
	for i in rngR[1], j in rngR[2], k in rngR[3]
		u[i,j,k] = p[1]*u[i,j,k]^2
	end

	# take the main array and the derivative array to spectral space
	fft_plan*A
	fft_plan*dA

	return du
end

# function defining the ODE being solved. I need to advance the ODE in spectral space, but the nonlinearity
# is defined in configuration space, necessitating FFTs and IFFTs within the function.
function second_idea!(du, u, p, t)

	# copy information from u and du into the pencil Arrays via the global views
	for i in rngS[1], j in rngS[2], k in rngS[3]
		local_S[i,j,k] = u[i,j,k]
	end

	# go to configuration space with ifft
	fft_plan\A

	# apply a non-linearity in configuration space
	for i in rngR[1], j in rngR[2], k in rngR[3]
		local_dR[i,j,k] = p[1]*local_R[i,j,k]^2
	end

	# take the main array and the derivative array to spectral space
	fft_plan*A
	fft_plan*dA

	# copy the data from the pencil arrays into u and du for the ode solver
	for i in rngS[1], j in rngS[2], k in rngS[3]
		u[i,j,k] = local_S[i,j,k]
	end

	for i in rngdS[1], j in rngdS[2], k in rngdS[3]
		du[i,j,k] = local_dS[i,j,k]
	end
	
	return du
end


# fill A with initial data
for i in rngR[1], j in rngR[2], k in rngR[3]
	local_R[i,j,k] = fi(x[i])*fi(y[j])*fi(z[k])
end

# in the real code the arrays are transformed before the ode step
fft_plan*A
fft_plan*dA

# set up an ODE problem
tspan = (0.0,1.0)
p = [0.1]

# switch between first_idea!(du,u,p,t) and second_idea!(du,u,p,t) below to see
# the different behavior.
f!(du, u, p, t) = first_idea!(du, u, p, t)
prob = ODEProblem{true}(f!, local_S, tspan, p)
sol = solve(prob);

And below is the error I get when I try to run the work-around on more than one MPI worker.

ERROR: LoadError: BoundsError: attempt to access 64×64×32 LinearIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}} at index [CartesianIndex(1, 1, -31)]
Stacktrace:
  [1] throw_boundserror(A::LinearIndices{3, Tuple{Base.OneTo{Int64}, Base.OneTo{Int64}, Base.OneTo{Int64}}}, I::Tuple{CartesianIndex{3}})
    @ Base ./abstractarray.jl:651
  [2] checkbounds
    @ ./abstractarray.jl:616 [inlined]
  [3] getindex
    @ ~/.julia/packages/PencilArrays/VRsaA/src/cartesian_indices.jl:45 [inlined]
  [4] first
    @ ./abstractarray.jl:368 [inlined]
  [5] _mapreduce(f::typeof(DiffEqBase.UNITLESS_ABS2), op::typeof(Base.add_sum), #unused#::IndexLinear, A::OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Array{ComplexF64, 3}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}}}})
    @ Base ./reduce.jl:415
  [6] _mapreduce_dim
    @ ./reducedim.jl:318 [inlined]
  [7] #mapreduce#672
    @ ./reducedim.jl:310 [inlined]
  [8] mapreduce
    @ ./reducedim.jl:310 [inlined]
  [9] #_sum#682
    @ ./reducedim.jl:878 [inlined]
 [10] _sum
    @ ./reducedim.jl:878 [inlined]
 [11] #sum#680
    @ ./reducedim.jl:874 [inlined]
 [12] sum
    @ ./reducedim.jl:874 [inlined]
 [13] UNITLESS_ABS2
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/common_defaults.jl:2 [inlined]
 [14] ODE_DEFAULT_NORM
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/common_defaults.jl:18 [inlined]
 [15] __init(prob::ODEProblem{OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}}}}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(f!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, alg::Tsit5, timeseries_init::Tuple{}, ts_init::Tuple{}, ks_init::Tuple{}, recompile::Type{Val{true}}; saveat::Tuple{}, tstops::Tuple{}, d_discontinuities::Tuple{}, save_idxs::Nothing, save_everystep::Bool, save_on::Bool, save_start::Bool, save_end::Nothing, callback::Nothing, dense::Bool, calck::Bool, dt::Float64, dtmin::Nothing, dtmax::Float64, force_dtmin::Bool, adaptive::Bool, gamma::Rational{Int64}, abstol::Nothing, reltol::Nothing, qmin::Rational{Int64}, qmax::Int64, qsteady_min::Int64, qsteady_max::Int64, beta1::Nothing, beta2::Nothing, qoldinit::Rational{Int64}, controller::Nothing, fullnormalize::Bool, failfactor::Int64, maxiters::Int64, internalnorm::typeof(DiffEqBase.ODE_DEFAULT_NORM), internalopnorm::typeof(LinearAlgebra.opnorm), isoutofdomain::typeof(DiffEqBase.ODE_DEFAULT_ISOUTOFDOMAIN), unstable_check::typeof(DiffEqBase.ODE_DEFAULT_UNSTABLE_CHECK), verbose::Bool, timeseries_errors::Bool, dense_errors::Bool, advance_to_tstop::Bool, stop_at_next_tstop::Bool, initialize_save::Bool, progress::Bool, progress_steps::Int64, progress_name::String, progress_message::typeof(DiffEqBase.ODE_DEFAULT_PROG_MESSAGE), userdata::Nothing, allow_extrapolation::Bool, initialize_integrator::Bool, alias_u0::Bool, alias_du0::Bool, initializealg::OrdinaryDiffEq.DefaultInit, kwargs::174.0429638279723 + 0.0im
Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:default_set, :second_time), Tuple{Bool, Bool}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/zbCbk/src/solve.jl:274
 [16] __solve(::ODEProblem{OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}}}}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(f!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::Tsit5; kwargs::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol, Symbol}, NamedTuple{(:default_set, :second_time), Tuple{Bool, Bool}}})
    @ OrdinaryDiffEq ~/.julia/packages/OrdinaryDiffEq/zbCbk/src/solve.jl:4
 [17] __solve(::ODEProblem{OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}}}}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(f!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem}, ::Nothing; default_set::Bool, kwargs::Base.Iterators.Pairs{Symbol, Bool, Tuple{Symbol}, NamedTuple{(:second_time,), Tuple{Bool}}})
    @ DifferentialEquations ~/.julia/packages/DifferentialEquations/U5PCN/src/default_solve.jl:7
 [18] #__solve#71
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:282 [inlined]
 [19] __solve
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:269 [inlined]
 [20] #solve_call#56
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:61 [inlined]
 [21] solve_call
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:48 [inlined]
 [22] #solve_up#58
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:85 [inlined]
 [23] solve_up
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:75 [inlined]
 [24] #solve#57
    @ ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:70 [inlined]
 [25] solve(::ODEProblem{OffsetArrays.OffsetArray{ComplexF64, 3, PencilArray{ComplexF64, 3, Base.ReshapedArray{ComplexF64, 3, SubArray{ComplexF64, 1, Vector{ComplexF64}, Tuple{Base.OneTo{Int64}}, true}, Tuple{}}, 3, 0, Pencil{3, 2, Permutation{(3, 2, 1), 3}}}}, Tuple{Float64, Float64}, true, Vector{Float64}, ODEFunction{true, typeof(f!), LinearAlgebra.UniformScaling{Bool}, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, Nothing, typeof(SciMLBase.DEFAULT_OBSERVED), Nothing}, Base.Iterators.Pairs{Union{}, Union{}, Tuple{}, NamedTuple{(), Tuple{}}}, SciMLBase.StandardODEProblem})
    @ DiffEqBase ~/.julia/packages/DiffEqBase/lULzQ/src/solve.jl:68
 [26] top-level scope
    @ ~/Desktop/mre.jl:122

Is mapreduce defined for this array?

There’s no specific definition of mapreduce for this kind of array, but mapreduce seems to work just fine for PencilArrays.

@PlasmaJoe The case of in-place FFTs is a bit tricky, because the arrays in physical and spectral space generally have different sizes, while needing to share the same block of memory. I would suggest starting with out-of-place FFTs, as they are a bit simpler to work with… That would also allow you to store your configuration-space data in Real arrays (as opposed to Complex), assuming that your problem allows that. I will see if I can improve on your example to get a working code.
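
If your configuration-space fields can indeed be kept real, the setup would look something like this (a sketch, assuming a real-to-complex transform; mul! and ldiv! come from LinearAlgebra):

transform = Transforms.RFFT()   # out-of-place, real-to-complex
plan = PencilFFTPlan((Nx, Ny, Nz), transform, proc_dims, comm)

R = allocate_input(plan)    # real-valued PencilArray in configuration space
S = allocate_output(plan)   # complex spectral coefficients

mul!(S, plan, R)    # forward transform
ldiv!(R, plan, S)   # inverse transform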

To be more precise, your first_idea! doesn’t work because DifferentialEquations creates a copy of the local_S array. That copy is obviously no longer aliased to the container A (which is a kind of obscure ManyPencilArray object), which is why when you perform in-place FFTs on A you don’t see any changes in u.
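
Here is a rough illustration of what happens, using the names from your example (a sketch, not the actual DifferentialEquations internals):

u0 = copy(local_S)   # the solver effectively stores detached copies like this one
u0 === local_S       # false: u0 no longer shares memory with the container A
fft_plan \ A         # the in-place transform modifies A (and local_S), but u0 stays untouched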

This should not be a problem with out-of-place FFTs, since in that case we don’t need such ManyPencilArray containers, and one can work directly with PencilArrays such as local_S and their copies.

Maybe we could get DifferentialEquations to work with ManyPencilArrays, but the first step would be to make sure that everything works with regular PencilArrays (that is, with out-of-place FFTs). I have personally never tried, but it would be great if that was the case.

Thanks for the advice! I went through a couple of iterations using the out-of-place transform and have a version that is giving identical results on one worker and on two workers. The transform is still complex-to-complex because I’m modeling waves and the complex notation is my preferred way to keep track of the phase. The latest implementation is below, let me know if this is something like what you had in mind.

using MPI
MPI.Init()
using DifferentialEquations
using PencilArrays
using PencilFFTs
using LinearAlgebra

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
mpi_rank = MPI.Comm_rank(comm)

# set the box dimensions. (Small for illustration, realistic case would be 1024, 1024, 2048)
Nx, Ny, Nz = 64, 64, 64

# divide the space up over the mpi workers
proc_dims = let pdims = zeros(Int, 2)
		MPI.Dims_create!(mpi_size, pdims)
		pdims[1], pdims[2]
	    end

# set the transform to out-of-place, complex-to-complex
transform = Transforms.FFT()
plan = PencilFFTPlan((Nx,Ny,Nz), transform, proc_dims, comm)

# allocate memory for the output arrays
S = allocate_output(plan)

# allocate memory for input arrays
R = allocate_input(plan)
dR = allocate_input(plan)

# Global view for the input arrays
local_R = global_view(R)
rngR = axes(local_R)

# set up coordinates to initialize the main pencil array in configuration space
x, y, z = range(-1,1,length = Nx), range(-1,1,length = Ny), range(-1,1,length = Nz)

# function to initialize the main pencil array
fi(x) = exp(-100*x^2)

# call out the size of the pencil array for indexing later
local_size = size(R)

# gives the same output on 1 or 2 workers
function third_idea!(du, u, p, t)

	# go to configuration space with ifft
	println(maximum(real.(R)))
	ldiv!(R, plan, u)

	# apply a non-linearity in configuration space. Loop over the pencil array
	# size instead of using indices from global views.
	for i in 1:local_size[1], j in 1:local_size[2], k in 1:local_size[3]
		dR[i,j,k] = p[1]*R[i,j,k]
	end

	# take the main array and the derivative array to spectral space, copying 
	# into u and du in the process
	mul!(u, plan, R)
	mul!(du, plan, dR)

	return du
end

# fill R with initial data
for i in rngR[1], j in rngR[2], k in rngR[3]
	local_R[i,j,k] = fi(x[i])*fi(y[j])*fi(z[k])
end

# in the real code the arrays are transformed before the ode step
mul!(S, plan, R)

# set up an ODE problem
tspan = (0.0,1.0)
p = [0.1]

f!(du, u, p, t) = third_idea!(du, u, p, t)
prob = ODEProblem{true}(f!,S, tspan, p)
sol = solve(prob)

I’m happy to know that it seems to work! Yes, that is more or less what I had in mind.

Just a small suggestion. If you care about performance, it’s usually recommended to avoid using global variables, especially in your f! function that is repeatedly called by DifferentialEquations. In this case, you can pass extra data that you need to use within f! via its third argument (p), as you already did for the ODE coefficient. Alternatively, if you really want global variables, you should mark them as const.

Below is how I would write your example. Note that I have included extra information (the FFT plan and arrays) in the p argument to avoid using global variables. I’m also using eachindex or CartesianIndices to make sure that loops are performed in memory order.

(Note that, for arrays in spectral space, array dimensions are internally permuted as (3, 2, 1). Even in this case, CartesianIndices returns indices in memory order.)

using MPI
MPI.Init()
# using DifferentialEquations
using OrdinaryDiffEq
using PencilArrays
using PencilFFTs
using LinearAlgebra

comm = MPI.COMM_WORLD
mpi_size = MPI.Comm_size(comm)
mpi_rank = MPI.Comm_rank(comm)

# set the box dimensions. (Small for illustration, realistic case would be 1024, 1024, 2048)
Nx, Ny, Nz = 64, 64, 64

# divide the space up over the mpi workers
proc_dims = let pdims = zeros(Int, 2)
    MPI.Dims_create!(mpi_size, pdims)
    pdims[1], pdims[2]
end

# set the transform to out-of-place, complex-to-complex
transform = Transforms.FFT()
plan = PencilFFTPlan((Nx,Ny,Nz), transform, proc_dims, comm)

# allocate memory for the output arrays
S = allocate_output(plan)

# allocate memory for input arrays
R = allocate_input(plan)
dR = similar(R)

# set up coordinates to initialize the main pencil array in configuration space
x, y, z = range(-1,1,length = Nx), range(-1,1,length = Ny), range(-1,1,length = Nz)

# function to initialize the main pencil array
fi(x) = exp(-100*x^2)

# gives the same output on 1 or 2 workers
function third_idea!(du, u, p, t)
    R = p.R
    dR = p.dR
    plan = p.plan

    # go to configuration space with ifft
    println(maximum(real.(R)))
    ldiv!(R, plan, u)

    # apply a non-linearity in configuration space. Loop over the pencil array
    # size instead of using indices from global views.
    # NOTE: if you don't need the (i, j, k) indices inside the loop, then
    # you can just use `eachindex`, which should be faster as it will loop
    # using linear indices.
    # If the indices are needed, then one can also use CartesianIndices(R).
    for i in eachindex(R)
        dR[i] = p.coef[1]*R[i]
    end

    # take the main array and the derivative array to spectral space, copying 
    # into u and du in the process
    mul!(u, plan, R)
    mul!(du, plan, dR)

    return du
end

# fill R with initial data
local_R = global_view(R)
for I in CartesianIndices(local_R)
    i, j, k = Tuple(I)
    local_R[I] = fi(x[i])*fi(y[j])*fi(z[k])
end

# in the real code the arrays are transformed before the ode step
mul!(S, plan, R)

# set up an ODE problem
tspan = (0.0,1.0)
p = (;
    coef = [0.1],
    plan, R, dR,
)

f!(du, u, p, t) = third_idea!(du, u, p, t)
prob = ODEProblem{true}(f!,S, tspan, p)
sol = solve(prob, Tsit5())