Multigrid solver prototype (GMG) and simple Lid Cavity solver

The following repo, GitHub - triscale-innov/LidJul.jl (GMG, Poisson solver and Lid cavity), contains a simple Julia implementation of a Geometric Multi-Grid (GMG) solver. This solver is compared with available Julia linear solvers (AMG, iterative, direct, …) and used in a toy 2D CFD simulation.

Note that it is a WIP developed for training/educational purposes and is not a tool ready for serious computing.


The solver is adapted from Harald Köstler’s implementation:

“Multigrid HowTo: A simple Multigrid solver in C++ in less than 200 lines of code”
https://www10.cs.fau.de/publications/reports/TechRep_2008-03.pdf
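
For readers who have not seen a multigrid cycle before, here is a minimal 1D V-cycle sketch in the same spirit (illustrative only, not the LidJul.jl code; the function names are made up for this example):

# minimal 1D geometric multigrid V-cycle for -u'' = f with zero Dirichlet BCs

# plain Gauss-Seidel smoother: a few in-place sweeps
function smooth!(u, f, h; sweeps=3)
    for _ in 1:sweeps, i in 2:length(u)-1
        u[i] = 0.5*(u[i-1] + u[i+1] + h^2*f[i])
    end
end

# residual r = f - A*u (kept zero on the Dirichlet boundary)
residual(u, f, h) = [i in (1, length(u)) ? 0.0 :
                     f[i] - (2u[i] - u[i-1] - u[i+1])/h^2 for i in eachindex(u)]

# full-weighting restriction: n fine points -> (n+1)÷2 coarse points
function restrict(r)
    rc = zeros((length(r)+1) ÷ 2)
    for I in 2:length(rc)-1
        i = 2I - 1
        rc[I] = 0.25r[i-1] + 0.5r[i] + 0.25r[i+1]
    end
    rc
end

# linear interpolation of the coarse correction back to the fine grid
function prolong(ec, n)
    e = zeros(n)
    for I in 1:length(ec)-1
        e[2I-1] = ec[I]
        e[2I]   = 0.5*(ec[I] + ec[I+1])
    end
    e
end

function vcycle!(u, f, h)
    n = length(u)
    if n ≤ 3                              # coarsest level: solve the single unknown exactly
        u[2] = 0.5*(u[1] + u[3] + h^2*f[2])
        return u
    end
    smooth!(u, f, h)                      # pre-smoothing
    ec = zeros((n+1) ÷ 2)
    vcycle!(ec, restrict(residual(u, f, h)), 2h)   # recursive coarse-grid correction
    u .+= prolong(ec, n)
    smooth!(u, f, h)                      # post-smoothing
    return u
end

# usage: n must be 2^k + 1; a few V-cycles reach discretization-error accuracy
n = 129; h = 1/(n-1); x = range(0, 1; length=n)
f = @. π^2 * sin(π*x)                     # manufactured RHS, exact solution is sin(πx)
u = zeros(n)
for _ in 1:10; vcycle!(u, f, h); end
@show maximum(abs, u .- sin.(π*x))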

The solution of a 2D Poisson equation on a 128x128 mesh:

And the computing times for various solvers (GMG wins :wink: )

┌────────────────────┬───────────┬─────────────┬────────────┬───────────┬─────────┐
│ solver             │ init_time │ solver_time │ Total Time │ residual  │ niters  │
├────────────────────┼───────────┼─────────────┼────────────┼───────────┼─────────┤
│ gauss_seidel       │ 8.900E-08 │ 4.492E-03   │ 4.492E-03  │   NaN     │ 10      │
│ GMG_GSSmoother     │ 7.034E-05 │ 5.264E-03   │ 5.335E-03  │ 1.148E-10 │ 17      │
│ TTSolver           │ 6.101E-03 │ 3.554E-04   │ 6.457E-03  │ 4.886E-11 │ nothing │
│ PCGAMG{SmoothA}    │ 1.383E-02 │ 4.145E-02   │ 5.529E-02  │ 1.916E-09 │ 16      │
│ PCGAMG{RugeStuben} │ 2.825E-02 │ 4.817E-02   │ 7.643E-02  │ 6.521E-10 │ 12      │
│ SparseLU           │ 9.333E-02 │ 6.262E-03   │ 9.959E-02  │ 1.748E-12 │ nothing │
│ PCGnothing         │ 5.100E-08 │ 1.431E-01   │ 1.431E-01  │ 5.959E-09 │ 605     │
│ SparseAMG          │ 8.246E-02 │ 6.945E-02   │ 1.519E-01  │ 6.521E-10 │ 12      │
│ jacobi             │ 5.700E-08 │ 3.014E-01   │ 3.014E-01  │ 9.707E+01 │ 2000    │
│ PCGILU             │ 2.629E-01 │ 7.272E-02   │ 3.356E-01  │ 4.687E-10 │ 8       │
│ sor                │ 3.000E-08 │ 4.922E-01   │ 4.922E-01  │ 6.332E+01 │ 2000    │
│ ssor               │ 3.100E-08 │ 9.461E-01   │ 9.461E-01  │ 6.332E+01 │ 2000    │
│ PCGDiagonalPrecond │ 2.945E-04 │ 5.147E+00   │ 5.148E+00  │ 6.132E-09 │ 602     │
└────────────────────┴───────────┴─────────────┴────────────┴───────────┴─────────┘

A simple CFD application of the solver (Lid Cavity).

Finally, the GMG solver is used in a Julia translation of Benjamin Seibold’s classical MIT18086_NAVIERSTOKES MATLAB implementation, which simulates a square lid cavity. See the details in the repo.


All your comments are more than welcome!


This looks great. (And I think I’ll learn a lot of Julia by looking through this!)

Thanks! But do not expect too much… I hope that I will also learn how to improve this implementation. In particular, when I was working on the solver (a year ago) I made a few attempts at parallelism (multithreading) that were not very successful…
I also had some difficulties with the performance of the derivative computations for the CFD MATLAB translation. I think that some of my problems (allocating views) are about to be solved with Julia 1.5.
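
For example, a hypothetical centered-difference helper (just to illustrate the allocating-views point; not the actual LidJul.jl code) behaves like this:

using BenchmarkTools

# centered difference along the first dimension (illustrative only)
ddx_slice(u) = (u[3:end, :] .- u[1:end-2, :]) ./ 2          # slices copy their data
ddx_view(u)  = @views (u[3:end, :] .- u[1:end-2, :]) ./ 2   # views reference the parent array

u = rand(128, 128)
@btime ddx_slice($u)    # allocates two temporary copies plus the result
@btime ddx_view($u)     # allocates only the result; views got much cheaper around Julia 1.5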

Anyway, a good GMG Julia package would be very useful and I hope that you find time to work on this !

Thanks! I will add you to my github repo where I’m working on the flow solver if you would like. I think I’ve got the derivatives running pretty fast now, and it’s generalized for 2 or 3D flows. I haven’t ever done the Julia package thing, so it’s only a private repo at the moment.

Thanks! I am sure that my derivative implementation can be improved a lot. I think that the performance is not too bad, but the implementation is ugly.

I also think that the StaticKernels.jl package may bring the optimal solution.

That looks neat. Right now, the only performance hog is AMG. So GMG gets my attention first.

I have just come across this very nice CFD tutorial in Julia:

with the corresponding repo, which contains a GMG implementation:
https://github.com/omersan/CFD_Julia_MAE5093


Ah! I’ve met Omer! Some of his research is pretty aligned with mine.

For better or worse, I’ve already finished a first draft implementation of GMG. I decided to organize the code a bit differently since

  • I need to solve a variable coefficient Poisson equation
  • I want it to be able to run 3D as well as 2D.

I managed what I think is a pretty fast and general implementation using CartesianIndices (from Base). It’s about 9 times faster than AMG.jl for a general Immersed Boundary simulation.

However, I haven’t opened up the repo yet since yours is documented and visualized so much nicer! I need to figure out how to do that. :grinning:


Great! Those two features (variable coefficients and 3D) correspond to what I was missing to be able to test GMG on simplified 3D transport and thermo-hydro simulations.
At this point, the big remaining challenge is a top-performing parallel implementation.

I think that it would be useful to use PkgSkeleton.jl in order to prepare your package.

Thanks for the tip. I’ll check it out.

Once it is up, there will be a laundry list of things to improve. High on that list will be parallel computing, maybe using GPUs.

Interesting, I’ve also been working on a variable-coefficient Poisson equation implementation using a variety of different methods, and I’ll look into the GMG stuff now as well. My implementation works in any number of dimensions: since I am using Grassmann.jl, my geometric foundation allows me to write fast geometric algorithms which automatically generalize to any dimension.

Neat.

The Poisson equation is trivial in any dimension on a Cartesian grid (which I’m using): it’s just a matter of looping through the faces of each element. I haven’t found an application in my work for N>3 yet, but I’ve got the GMG solver ready. :wink:
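
For example, a dimension-agnostic Laplacian stencil can be written as one loop over cells plus a small loop over the 2N face directions (a generic sketch, not the actual solver code):

# apply the N-dimensional 2nd-order Laplacian stencil on the interior cells;
# the inner loop visits the 2N face neighbours of each cell via unit offsets
function laplacian!(out, u)
    N = ndims(u)
    unit(d) = CartesianIndex(ntuple(i -> i == d ? 1 : 0, N))
    for I in CartesianIndices(map(n -> 2:n-1, size(u)))   # interior cells only
        s = -2N * u[I]
        for d in 1:N
            s += u[I + unit(d)] + u[I - unit(d)]
        end
        out[I] = s
    end
    return out
end

# works unchanged in 2D and 3D
laplacian!(zeros(34, 34), rand(34, 34))
laplacian!(zeros(34, 34, 34), rand(34, 34, 34))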

Hi,

I read your code again and found the coding style really inspiring:

for I ∈ inside(a)
    a[I] = sum(@inbounds(b[J]) for J ∈ near(I))
end

I tried to mimic it in my code, but I need more practice…

One thing that you may include in the future is a red-black version of GS! (allowing efficient vectorization and parallelization by breaking dependencies, with very little or no impact on GMG convergence). It may reduce the GS! time by a large factor (>4). In a second step, @elrod suggested adapting the underlying data layout (storing red and black cells contiguously), which allows a further substantial acceleration (about 3x).
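
A red-black sweep could look something like this (a sketch assuming the 5-point stencil with unit spacing; within one color, every update reads only cells of the other color, so the inner loop carries no dependency):

# one red-black Gauss-Seidel sweep for the 2D 5-point Poisson stencil
# (unit grid spacing assumed; illustrative sketch, not tuned code)
function gs_redblack!(u, f)
    n, m = size(u)
    for color in (0, 1)                    # update all "red" cells, then all "black" cells
        @inbounds for j in 2:m-1
            i0 = 2 + (j + color) % 2       # first interior i with parity matching the color
            @simd for i in i0:2:n-1
                u[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] + f[i,j])
            end
        end
    end
    return u
end

The data-layout change mentioned above goes one step further by storing each color contiguously in memory.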

Can’t wait to see your package published :wink:


I’ll try to publish it this week. I’ve looked at PkgSkeleton by @Tamas_Papp; it generates a folder structure and copies over template files. How do I keep my current repository https://github.com/weymouth/WaterLily (I’ve made it public)?


For existing packages, I would recommend

  1. starting with a clean git status (your own code committed),
  2. generating the package structure in another directory (see the sketch below),
  3. copying everything over and manually selecting what to keep (using git).

Please make sure you back up your work before doing anything — PkgSkeleton has some protection against data loss/overwriting, but better safe than sorry.
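
If it helps, step 2 of the list above might look roughly like this (assuming PkgSkeleton’s generate function; check its README for the exact options):

using PkgSkeleton

# generate the template into a separate (hypothetical) directory, then merge into the existing repo by hand
PkgSkeleton.generate("WaterLily_skeleton")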


Ok, done. I don’t think it got everything right, but it’s a start.

That is great!
Though I may have missed something, I can’t obtain good performance using CartesianIndices and your compact notations (near, inside, …).
The following loop-based function:

function res!(rlc,rlf)
    (nrows,ncols) = size(rlc)
    @inbounds @simd for j=2:ncols-1
        fj=2j-2
        for i=2:nrows-1
            fi=2i-2
             rlc[i,j]=(rlf[fi,fj]+rlf[fi+1,fj]+rlf[fi,fj+1]+rlf[fi+1,fj+1])
        end
    end
end

is 6-8 times faster on my laptop than:

function res_ci!(rlc,rlf)
    @fastmath @inbounds @simd for I ∈ inside(rlc)
            rlc[I]=sum(@inbounds(rlf[J]) for J ∈ near(I))
    end
end

Do you see the same? Here is a test MWE:

using BenchmarkTools
using LinearAlgebra
using Random

@inline CI(a...) = CartesianIndex(a...)
@inline δ(a,d::Int) = CI(ntuple(i -> i==a ? 1 : 0, d))
@inline δ(a,I::CartesianIndex{N}) where {N} = δ(a,N)
@inline CR(a...) = CartesianIndices(a...)
@inline inside(M::NTuple{N,Int}) where {N} = CR(ntuple(i-> 2:M[i]-1,N))
@inline inside(a::Array; reverse::Bool=false) =
        reverse ? Iterators.reverse(inside(size(a))) : inside(size(a))
@inline inside_u(N::NTuple{n,T}) where {n,T} = CR(ntuple(i->2:N[i],n-1))
@inline near(I::CartesianIndex,a=0) = (2I-2oneunit(I)):(2I-oneunit(I)-δ(a,I))

function res!(rlc,rlf)
    (nrows,ncols) = size(rlc)
    @inbounds @simd for j=2:ncols-1
        fj=2j-2
        for i=2:nrows-1
            fi=2i-2
            rlc[i,j]=(rlf[fi,fj]+rlf[fi+1,fj]+rlf[fi,fj+1]+rlf[fi+1,fj+1])
        end
    end
end

function res_ci!(rlc,rlf)
    # @inside rlc[I] = sum(@inbounds(rlf[J]) for J ∈ near(I))
    @fastmath @inbounds @simd for I ∈ inside(rlc)
            rlc[I]=sum(@inbounds(rlf[J]) for J ∈ near(I))
    end
end

# function that runs the benchmarks
function benchmark_res(nf)
    @assert iseven(nf)
    # initialize two 2D arrays (fine and coarse)
    nc=nf>>>1
    Random.seed!(1234);

    rlf=rand(nf+2,nf+2)
    rlc=rand(nc+2,nc+2)
    ih2=rand()
    # the following block checks that res! and res_ci! produce the same results
    rlc_ci=deepcopy(rlc)
    res!(rlc,rlf)
    res_ci!(rlc_ci,rlf)
    # @show @code_native debuginfo=:none res!(rlc,rlf)
    # @show @code_native debuginfo=:none res_ci!(rlc,rlf)
    residual=norm(rlc-rlc_ci)
    # the actual timings
    tres=@belapsed res!($rlc,$rlf)
    tres_ci=@belapsed res_ci!($rlc,$rlf)
    @show "restrict",nf,residual,tres,tres_ci,tres_ci/tres
    nothing
end

for n in [32,256]
        benchmark_res(n)
end

returns:

(nf, tres, tres_ci, tres_ci / tres) = (32, 1.6225823451910409e-7, 1.1553e-6, 7.120131705019731)
(nf, tres, tres_ci, tres_ci / tres) = (256, 1.1933e-5, 7.3877e-5, 6.190982988351629)

There is certainly a performance hit using CartesianIndex in 2D. In 3D, I don’t see the same slowdown. Maybe this is because looping through higher-dimensional arrays has a penalty similar to using CartesianIndex… Maybe because @simd can be applied to the single CartesianIndex loop, but only to one of the three loops in 3D?

julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
  5.150 μs (0 allocations: 0 bytes)
julia> @btime res2D!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
  698.604 ns (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,1),rand(66,66,1)))
  8.567 μs (0 allocations: 0 bytes)
julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
  290.300 μs (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
  339.300 μs (0 allocations: 0 bytes)

res2D!() is your code, and res3D!() is below. It requires the arrays to be 3D, but if the size of the last dimension is 1, it does the 2D restriction. However, it’s slow in both 2D and 3D!

inside2(n) = min(2,n):max(1,n-1)
near2(i) = 2i-2:2i-1
near2(i,n) = ifelse(n==1,1:1,near2(i))
@fastmath function res3D!(a,b)
    (ni,nj,nk) = size(a)
    @inbounds for i ∈ inside2(ni), j ∈ inside2(nj), k ∈ inside2(nk)
        σ = 0.
        @inbounds for I ∈ near2(i), J ∈ near2(j), K ∈ near2(k,nk)
            σ+= b[I,J,K]
        end
        a[i,j,k] = σ
    end
end

I guess the options are:

  • Somehow improve CartesianIndex for 2D arrays
  • Profile the code and judiciously write special 2D-only versions of key functions, selected via multiple dispatch (see the sketch below). Note that restrict!() is not one of them, but mult() is likely to be.
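
For the second option, dispatch-based specialization might look roughly like this (illustrative names, not the actual WaterLily code): a generic N-D method plus a hand-written 2D method that Julia selects automatically.

# generic N-D restriction: each coarse interior cell sums its 2^N fine children
function restrict2!(a::AbstractArray{T,N}, b::AbstractArray{T,N}) where {T,N}
    @inbounds for I in CartesianIndices(map(n -> 2:n-1, size(a)))
        a[I] = sum(b[J] for J in (2I - 2oneunit(I)):(2I - oneunit(I)))
    end
    return a
end

# specialized 2D method: plain nested loops, which currently vectorize better
function restrict2!(a::AbstractMatrix{T}, b::AbstractMatrix{T}) where {T}
    n, m = size(a)
    @inbounds for j in 2:m-1
        fj = 2j - 2
        @simd for i in 2:n-1
            fi = 2i - 2
            a[i,j] = b[fi,fj] + b[fi+1,fj] + b[fi,fj+1] + b[fi+1,fj+1]
        end
    end
    return a
end

restrict2!(zeros(34, 34), rand(66, 66))         # hits the 2D method
restrict2!(zeros(34, 34, 34), rand(66, 66, 66)) # hits the generic N-D method

Both methods share one call site, so the GMG driver does not need to know the dimension.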

In my implementation, I don’t use Cartesian indices; instead, I represent the Cartesian mesh using a finite-element mesh (triangles, tetrahedra, a cell complex). They are stored in a particular structure, which allows me to extract the Cartesian structure of the mesh without resorting to higher-dimensional arrays. Thus, I don’t need to iterate over higher-dimensional arrays and only work with a single index for the mesh. My package FlowGeometry.jl is my prototype mesh generation for this, but it doesn’t include solution methods. Using this representation might seem strange, but it works better for my purposes.

I studied a lot of number theory before, so instead of using cartesian indices I use an index notation based on modular arithmetic and equivalence classes.

In the future, the FlowGeometry package will provide a comprehensive (structured) mesh interface for this. Right now, it’s a prototype.