Implementing parallel fluids code in Julia (parallel FFTW, MPI.jl, ...)

ChrisRackauckas · June 29, 2017, 10:01pm

TL;DR

Indirect (fixed length) ApproxFun works, but is slightly wasteful because it grows then chops. Fully adaptive ApproxFun works in theory, but to get it to work well in practice that’s a research project that’s at least a year (probably more) off (but it definitely has potential to be “the best”) because right now the spatial adaptivity is working against me and not with me. Specializing methods for spectral discretizations is easy in the DiffEq framework, and the methods that you show in that paper as counterexamples to the possibility of this are actually really straightforward application of the refined ODE interface. Some examples exist to show that it will indeed work well. But there’s still work to do.

If you want to test something right now, test the fixed-length version with I don’t know, Tsit5(), radau(), or CVODE_BDF(). Not going to guarantee it’s good, but it should work.

Setup

Well I might as well share my experimental code to watch it get stomped. I’ll probably be releasing the alpha soon:

There’s two setups.

Indirect

One indirectly uses ApproxFun by saving out the coefficients each time. This setup is nice because it can be used with any of the common interface ODE solvers, so you can just plop it from OrdinaryDiffEq.jl into Sundials.jl or whatnot. It looks like this:

using DiffEqBase, OrdinaryDiffEq, Sundials, ApproxFun, DiffEqApproxFun
using ODEInterfaceDiffEq
using Base.Test

S=Fourier()
u0=Fun(θ->cos(cos(θ-0.1))-cos(cos(0-0.1)),S)
c=Fun(cos,S)
ode_prob = ODEProblem((t,u)->u''+(c+1)*u',u0,(0.,1.))
prob = ApproxFunProblem(ode_prob)
@time sol=solve(prob,Tsit5())
sol(0.5) # solution at t=5
sol(0.5,0.2) # solution at t=5, θ=0.2
@time sol=solve(prob,CVODE_BDF())
@time sol=solve(prob,radau())

What that’s doing is using the number of coefficients from the initial condition to set the length of the vector, and then it solves the ODE defined by the vector of that length by converting it into the appropriate ApproxFun each time f is called. That part is cheap, but it’s slightly wasteful because after the f step, it may have grown the ApproxFun (for accuracy) but then we just chop those right off. But still, this seems pretty fast for what it is, and it’s robust+useful… so its okay for the time being.

It’s also able to handle boundary conditions which are not just set by the space by a projection callback. This does pretty well.

It does have a bug though that, while you can in theory choose the length of the coefficient vector via ApproxFunProblem(ode_prob,N), in practice that makes it instantly blow due to some bug I haven’t worked out… so this mode is unusable right now.

Direct

Now this is the mode that sounds interesting. This is directly using an ApproxFun Fun type inside of OrdinaryDiffEq.jl. Does it work? Yes! Does it work well? No. There seems to be some fundamental issues with the approach right now. Fun types are adaptively sized, which works great in many cases, but when left unchecked it just grows and grows here. It’s like interval arithmetic where after time it blows up. So even on relatively simple problems:

using DiffEqBase, OrdinaryDiffEq, Sundials, ApproxFun, DiffEqApproxFun
using ODE, ODEInterfaceDiffEq
using Base.Test

S=Fourier()
u0=Fun(θ->cos(cos(θ-0.1))-cos(cos(0-0.1)),S)
c=Fun(cos,S)

prob = ODEProblem((t,u)->u''+(c+1)*u',u0,(0.,1.))
@time sol=solve(prob,Euler(),dt=1/1000)

you’ll end up with about 2000 coefficients for Euler by t=0.4 and get CFL instability (for comparison: the indirect ApproxFun version uses a fixed vector of 36 coefficients and solves this just fine). Then this has some issues plugging into system tooling: autodifferentiation to get the Jacobian needs a separate path. With that issue worked out by handle, technically implicit methods work, but the coefficient blow up still makes it difficult, and the implicit solve can make coefficients blow up faster. Maybe this will do better when the linear solve can be swapped out in NLsolve, because then it would be easier to chop right after the linear solve.

Other “Research” In This

There’s also SpectralTimeStepping.jl:

github.com

JuliaApproximation/SpectralTimeStepping.jl/blob/master/src/timeevolution.jl

## This file is outdated

export timeevolution, ETDRK4, BDF2, BDF4, BDF22


function RK(L,y,h)
    k1=L(y)
    k2=L(y+.5h*k1)
    k3=L(y+.5h*k2)
    k4=L(y+h*k3)

    y+h*(k1+2k2+2k3+k4)/6.
end

function BDF2(B,A::Operator,g::Function,bcs,u0,h,m,glp,tol=1000eps())
    SBDF2 = [B;I-2.0/3.0*h*A]

    u1=u0
    u2=chop(RK(g,u1,h),tol)
    u2,u1  = chop(SBDF2\[bcs;1/3.0*(4u2-u1)],tol),u2

This file has been truncated. show original

it directly defines some common methods for ApproxFun Fun types. To keep the coefficients from growing too fast, it aggressively chops and uses multistep methods to reduce the number of function calls. Indirect seems to do better though.

Intermediate Conclusion

Naive usage of adaptive space with and without adaptive time doesn’t work well. I am sure we can refine this approach much more, in which case it could be really really good. But that’s likely a full research project. The linear operator approaches from SpectralTimestepping.jl can also be taken in OrdinaryDiffEq.jl, and I’ll explain that in a sec, and that might do much better.

You can just directly write the timestepping method to match the spatial discretization. Here’s a few examples that already have things working towards them:

http://docs.juliadiffeq.org/latest/types/refined_ode_types.html

Actually, the method in the paper that you linked is of the Linear-Nonlinear Form:

http://docs.juliadiffeq.org/latest/solvers/refined_ode_solve.html#Linear-Nonlinear-(LNL)-ODE-1

It’s a Crank-Nicholson Linear + Adams-Bashforth nonlinear (with a constant term added, but that’s simple to add) and can be directly integrated using that method when it’s complete. Then the second method they list is a Linear Crank + nonlinear RK. Why not? Same form of equation, it can get an implementation.

If it’s put into DiffEq then it takes about 2-3 lines to get event handling and adaptivity going. I don’t see a paper which does this kind of method with adaptive timestepping, so that sounds like an interesting things to do since it can lead to a quick publication.

Whether it’s good for a specific problem or not depends on the application further analysis, but just because it hasn’t been abstracted like that before doesn’t mean it can’t be: these “special PDE methods” are written for the problem but apply in a much greater domain if one instead thinks about them as a method on split ODEs with some linear and some nonlinear terms. So thanks for giving me some examples to show exactly how it does fit into DiffEq.

As for how far along that stuff is, it’s already seeing results. The docs say that none of the exponential integrators exist yet, but they do:

https://github.com/JuliaDiffEq/OrdinaryDiffEq.jl/blob/master/test/linear_nonlinear_convergence_tests.jl

This stuff just isn’t released because the ecosystem needs an expmv for this to be efficient on large PDEs (for small PDEs, this works great already. It has to do with the fact that expm is dense even if A is sparse, so you need expmv to compute it, or a very good lazy exponentiation for specific discretizations).

Conclusion

See the TL;DR. Work with ApproxFun to get the full adaptivity working for us instead of against us is more difficult and will take time, but I have high hopes for the future. However,

that statement is false: DiffEq has the language to implement those specialized PDE algorithms as “refined ODE problems” as shown above. I just need to clone myself, or find a contributor who’s interested in helping with this part. Or just someone to help implement algorithms and finish up expmv other lower-level ecosystem type things. Whichever is easier.

That’s a pretty thorough overview of the current state.

Edit

Pinging @dlfivefifty on this update. Maybe when we meet in person we can brainstorm a plan for how to whip this into shape.

dlfivefifty · June 29, 2017, 10:49pm

I agree. Using ApproxFun to construct fixed-sized discretizations should work fine. Using spatial adaptivity opens more questions than answers, that will be at least a year away before it’s understood and implemented I think.

One note: I had an honours student Matt Cassell at USyd whose project found that L-stable schemes do reliably support adaptivity. Though whether the artificial numerical diffusion that L-stability induces is desirable or not depends on the problem.

Parallelisation is a whole other question, which I havent thought about seriously. This may be better as a layer on top of ApproxFun than something built in (that is, each Fourier mode is a Chebyshev Fun that lives in a different process). Or it may be better done at the linear algebra level (that is, the discretisation returns a DistributedBlockDiagonalMatrix that then has \ overloaded to do a multi-process solve).

ChrisRackauckas · June 29, 2017, 11:11pm

L-stable allowing full spatial adaptivity makes sense, though if capped even explicit should be fine. And even BDF seems like it should be fine to the extent that complex eigenvalues are capped. But even then, Rosenbrock23() is L-stable and doesn’t do well right now because of coefficient blow up. The issue is that there is two forms of error. The Fun is growing coefficients to make the error of each operation go to its tolerance (eps, right?), while the DiffEq is trying to only solve with like reltol=1e-3. To make this better, I think the Fun needs to implicitly know to chop at a much higher tolerance, and then it needs to adjust its behavior each time the diffeq gets an error estimate (one is free every step from timestepping adaptivity). I’m sure that with some fine-tuning controlling error both operation-wise and then stepwise we’ll have something much more controlled.

Also, I’m not entirely sure that broadcasting works the way I’d expect it to which may be introducing some problems in the current error estimator.

ChrisRackauckas · June 29, 2017, 11:14pm

Sidenote: if you could add a Hermite expansion, then this would be interesting to use for SPDEs.

John_Gibson · June 30, 2017, 3:18am

@ChrisRackauckas: DiffEqApproxFun.jl looks pretty intriguing. I would really like to see how it could be used to solve Kuramoto-Sivashinsky (u_t + u_xx + u_xxxx + u u_x = 0) on a periodic or bounded domain, with fixed spatial discretization, operator splitting, and collocation calculation of the nonlinear term, and how efficient that code is compared to a handwritten code (which takes about 25 lines of Julia).

Navier-Stokes is another story. In the algorithm I linked ot above, the linear/nonlinear operator splitting and choice of time-stepping algorithm barely scratches the surface of the numerics. The real issues arrive when you solve the time-stepping on the momentum equation while satisfying the boundary conditions and the divergence-free constraint using truncated Chebyshev expansions. How to break the coupled four-component momentum+pressure equations into efficiently-solvable independent 1d boundary value problems is fairly intricate. Shifting Chebyshev trnucation errors out of the velocity divergence and into the pressure field and momentum balance (crucial for numerical stability) is a very subtle problem with an equally complex solution method. Look up “influence-matrix method” and “tau correction method” in Canuto, Hussaini, Quateroni, and Zhang’s "Spectral Methods for Fluid Dynamics, Springer-Verlag 2007. This is what I meant by “too tightly interwined” above, not operator splitting.

I can imagine that ApproxFun + DiffEq is or could soon be capable of transforming Navier-Stokes + discretization method into a working numerical algorithm, but I can’t imagine that it would think through all the subtleties, find appropriate solution methods, and produce an algorithm as efficient and as stable as a hand-written spectral fluids code designed with these things in mind.

Of course that’s probably what assembly-language people said to the writers of the first Fortran compiler.

ChrisRackauckas · June 30, 2017, 5:29am

I already agreed above that making it work with ApproxFun and thus adaptive space is much more difficult if not a research project itself, so I’m not sure why that was brought up. You basically said that adaptive space methods will have difficulty implementing corrections designed for methods with a set number of Fourier modes. I don’t think anyone’s going to disagree with that. That’s pretty much true by definition. It probably needs different methods and corrections, and I don’t know if they already exist.

But a constant number of Fourier modes m? The indirect ApproxFun already does this. As I said, it’s probably a little bit wasteful because of the way that it does the chopping, so yes I agree that if you pre-expanded the equation and then wrote it for the Fourier modes yourself it could be faster (at least until ApproxFun introduces a constant-sized Fun, in which case I’m not sure there would be much of a difference between a by-hand implementation and this).

But that has no bearing on the implementation of the timestepping algorithm.

John_Gibson:

Navier-Stokes is another story. In the algorithm I linked ot above, the linear/nonlinear operator splitting and choice of time-stepping algorithm barely scratches the surface of the numerics. The real issues arrive when you solve the time-stepping on the momentum equation while satisfying the boundary conditions and the divergence-free constraint using truncated Chebyshev expansions. How to break the coupled four-component momentum+pressure equations into efficiently-solvable independent 1d boundary value problems is fairly intricate. Shifting Chebyshev trnucation errors out of the velocity divergence and into the pressure field and momentum balance (crucial for numerical stability) is a very subtle problem with an equally complex solution method. Look up “influence-matrix method” and “tau correction method” in Canuto, Hussaini, Quateroni, and Zhang’s "Spectral Methods for Fluid Dynamics, Springer-Verlag 2007. This is what I meant by “too tightly interwined” above, not operator splitting.

This influence matrix method?

http://www.maths.usyd.edu.au/u/duncans/anziam2011.pdf

The idea is that ω0 carries the time dependence, and ω1,2 are found once and for all, and are used for enforcing boundary values.

That just needs alpha 1 and alpha 2 given to the method. If one needs to, the user could just adapt alpha 1 and alpha 2 in a DiscreteCallback. This just means that we need CrankAB2(alpha1=...,alpha2=...) to handle it. We’re already putting tie-ins to common PDE applications for things like the SSPRK methods (limiters used for hyperbolic PDEs).

The tau-correction is on the user side: the user needs to change the numerical equation being solved to account for the approximation-induced divergence. This is easy to do if the user writes a code and kicks it to some CrankAB2() algorithm.

So…

Yes, I just said that. Getting the timestepping methods for the algorithms you’ve mentioned, along with the corrections as part of the routine, is straightforward “look at paper, write method into DiffEq”. Smashing it into an adaptive space method (which these are currently not) is not straightforward, so as above I also don’t think ApproxFun + DiffEq will magically do it anytime soon. The two alone work really well, but putting both forms of adaptivity together needs specialized methods.

ChrisRackauckas · June 30, 2017, 5:35am

Which operator splitting method? There’s many kinds of operator splitting methods. I agree it doesn’t take much time to write the perform_step method to add it, though I am not sure I have the one you need yet. This was mentioned in the original long reply, sorry if it wasn’t clear.

dlfivefifty · June 30, 2017, 7:03am

Tau correction (and even eigenvalues of the discretisation) are only relevant to non-adaptive algorithms.

Also, it’s worth mentioning that that book predates the coefficient-based method and adaptive QR algorithm described in Olver & Townsend 2012 (on which ApproxFun is based), for which many of the traditionally viewed problems of spectral methods do not actually apply. For example, we easily solve equations requiring millions unknowns, with no stability issues.

Until fluids applications are properly investigated, it’s more accurate to say “we don’t know if it will work well until implemented” than “it won’t work due to the arguments in a book from 2007” since those arguments do not necessarily apply.

John_Gibson · July 5, 2017, 11:14pm

@ChrisRackauckas & @dlfivefifty

I should clarify that my goal is to develop a parallel, distributed-memory spectral Navier-Stokes code in Julia that is competitive in efficiency with existing C/C++/Fortran codes. I want to do this in a relatively short and predictable time frame, say in time for JuliaCon 2018.

I am definitely listening to you guys (and reading your papers --thanks for the refs-- and the documentation for your packages), but it does still seem to me that fitting my problem into the framework of DiffEq or ApproxFunDIffEq is too much of a research project for me, and the benefits are not clear. Your motivations are probably different from mine, maybe to demonstrate the generality of your frameworks, or maybe the large-N scalability of the new Chebyshev methods. That’s cool; if if you want to pursue it yourself I’ll follow your work with great interest. But parallelism, efficiency relative to existing codes, and predictability of implementation are paramount for me.

@dlfivefifty, you say “we won’t know until we try.” I suppose that’s true, but given that distributed memory parallelsim and computational efficiency are top priorities for me, it’s more of a gamble than I am comfortable with right now --though I remain open to persuasion. Regarding “a book from 2007,” you must know the book, as it’s referenced in your 2014 JCP paper. And I think it’s fair to say it’s the canonical reference for a spectral methods for Navier-Stokes. It would be a huge mistake to launch into writing a spectral Navier-Stokes algorithm without reviewing the prior research described there.

I’ve tried to convey above some of the problem-specific mathematical refactoring tricks these algorithms do to wring out maximum computational efficiency while controlling the distribution of discretization errors. In the algorithm I’ve described above, it’s all about refactoring 4 coupled 1d Helmholtz boundary-value problems into 4 independent Helmholtz problems, solving them with specially-tuned banded tridiag solvers, and then modifying the solutions of 3 velocity components to correct divergence errors resulting from the fact that the pressure equation is solved at second order (p’’ - lambda p = f) but then substituted back on the right-hand-side of the velocity equations at first order (p’). You could ignore prior research and solve the straightforward/naive Chebyshev discretization of the coupled system of momentum + divergence equations, but it’d cost you big time in execution speed, a factor of 16 at least, I think.

I do hope that ApproxFun will save me the trouble of implementing Chebyshev expansion methods from the ground up, like I did for my C++ code.

Lastly, I think we’re all in agreement that adaptive Chebyshev methods are inappropriate for this Fourier x Fourier x Chebyshev algorithm and its 3d Cartesian grid. So let’s just drop adaptive methods from the discussion.

dlfivefifty · July 6, 2017, 12:51am

Have a look at Dedalus which does exactly what you want to do using Python:

http://dedalus-project.org

Note it uses banded spectral methods very similar to ApproxFun, with tau corrrections. But just like ApproxFunDiffEq, it makes the choice of time stepping modular.

So what I was referring to as a research project was only the spatial adaptivity. Everything else you want to do is doable, in so much as it’s already been done in Python.

John_Gibson · July 6, 2017, 2:41am

I know Dedalus. It’s an impressive project, but in service of generality it does precisely what I’m saying you should not do if you want to compete with top-performing spectral Navier-Stokes codes. Namely, a Dedalus implementation of Navier-Stokes would do a straightforward/naive Chebyshev discretization of the coupled 3-component momentum plus divergence equations. I don’t believe Dedalus’s rudimentary symbolic language for PDEs is expressive enough to represent the kinds of tricks I’m talking about, or its transformation engine strong enough to implement them.

John_Gibson · July 6, 2017, 3:01am

Dedalus implements the Chebyshev tau method for enforcing boundary conditions, but tau correction is different and substantially more complicated: the correction of the velocity solution due to the fact that pressure is solved for in a 2nd-order equation but used on the RHS of the velocity eqns in 1st order, as I described above. Dedalus doesn’t do tau correction.

In fairness Dedalus might be able to avoid the need for tau correciton and get within a factor of four of a top-performing code (barring other inefficiencies) if one gave it Navier-Stokes in velocity-vorticity formulation rather than velocity-pressure. Its symbolic language might be epxressive enough for that.

ChrisRackauckas · July 6, 2017, 3:58am

Go make the fully by hand version. It’ll serve as a good reference code for when we eventually want to benchmark against anyways. When it’s written it’ll be trivial to show how to pull out the timestepping part. Then obviously it’ll also be easy to show the disadvantage to defaulting to reimplementations low time order methods as well. That would probably be a pretty useful step to a paper anyways, so I’ll be happy it exists.

But in reality where the real work and arguing should be is in the tooling. We need the parallel FFTW. That’s pretty much agreed on, that’s something that needs to be done. I’m working on a bunch of stiff solvers including the splitting methods mentioned in the linked paper and in Dedalus. And @dlfivefifty keeps making substantial algorithmic improvements to how adaptive spectral methods work. I’m sure we can reconvene sometime later and talk with code.

favba · September 21, 2017, 3:59am

I have a seems-to-be-working(didn’t perform any real test, just looked if results seems reasonable) implementation of the serial (or multi-threaded) isotropic case here: FluidFlowSimulation.jl.

It might be a little messy for use as a simple reference code. I’m coding this module while exploring julia language, so it might not have the best design. My end goal for this module is to explore LES models for stratified turbulence, so this code is becoming much more complex…

Any feedback would be much appreciated!

juliohm · September 21, 2017, 6:46am

@ChrisRackauckas ClusterManagers.jl is only a helper package for submitting jobs, it doesn’t provide custom managers for Infiniband or anything like that. I am trying to find the time to get back to that issue that I was having in another thread where I mentioned that I don’t have the option of using the built-in SSHManager at my university’s cluster:

I talked to the sysadmin and he pointed out that one way out is to use RSH instead, which is new to me, but I will see what I can do.

Topic		Replies	Views
Benchmarking a simple PDE algorithm in Julia, Python, Matlab, C++, and Fortran Numerics diffeq	41	8968	November 19, 2019
Shallow water equations with DifferentialEquations.jl Numerics question , diffeq	22	4847	May 24, 2019
Tutorials or examples on multithreading and distributed processing for numerical PDEs New to Julia	8	1266	June 2, 2021
Implementing parallel computing in finite element problems Numerics	6	2057	August 15, 2018
Questions on a number of code acceleration techniques General Usage performance , hpc , parallel	11	1796	July 8, 2017