Multigrid solver prototype (GMG) and simple Lid Cavity solver

The following repo, GitHub - triscale-innov/LidJul.jl (GMG, Poisson solver and Lid cavity), contains a simple Julia implementation of a Geometric Multi-Grid (GMG) solver. This solver is compared with available Julia linear solvers (AMG, iterative, direct, …) and used in a toy 2D CFD simulation.

Note that it is a WIP developed for training/educational purposes and is not a tool ready for serious computing.


The solver is adapted from Harald Köstler’s implementation:

“Multigrid HowTo: A simple Multigrid solver in C++ in less than 200 lines of code”
https://www10.cs.fau.de/publications/reports/TechRep_2008-03.pdf
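
For readers who have not seen a multigrid cycle before, here is a minimal 1D V-cycle sketch in the same spirit (illustrative only, not the LidJul.jl code; the function names are made up for this example):

# minimal 1D geometric multigrid V-cycle for -u'' = f with zero Dirichlet BCs

# plain Gauss-Seidel smoother: a few in-place sweeps
function smooth!(u, f, h; sweeps=3)
    for _ in 1:sweeps, i in 2:length(u)-1
        u[i] = 0.5*(u[i-1] + u[i+1] + h^2*f[i])
    end
end

# residual r = f - A*u (kept zero on the Dirichlet boundary)
residual(u, f, h) = [i in (1, length(u)) ? 0.0 :
                     f[i] - (2u[i] - u[i-1] - u[i+1])/h^2 for i in eachindex(u)]

# full-weighting restriction: n fine points -> (n+1)÷2 coarse points
function restrict(r)
    rc = zeros((length(r)+1) ÷ 2)
    for I in 2:length(rc)-1
        i = 2I - 1
        rc[I] = 0.25r[i-1] + 0.5r[i] + 0.25r[i+1]
    end
    rc
end

# linear interpolation of the coarse correction back to the fine grid
function prolong(ec, n)
    e = zeros(n)
    for I in 1:length(ec)-1
        e[2I-1] = ec[I]
        e[2I]   = 0.5*(ec[I] + ec[I+1])
    end
    e
end

function vcycle!(u, f, h)
    n = length(u)
    if n ≤ 3                              # coarsest level: solve the single unknown exactly
        u[2] = 0.5*(u[1] + u[3] + h^2*f[2])
        return u
    end
    smooth!(u, f, h)                      # pre-smoothing
    ec = zeros((n+1) ÷ 2)
    vcycle!(ec, restrict(residual(u, f, h)), 2h)   # recursive coarse-grid correction
    u .+= prolong(ec, n)
    smooth!(u, f, h)                      # post-smoothing
    return u
end

# usage: n must be 2^k + 1; a few V-cycles reach discretization-error accuracy
n = 129; h = 1/(n-1); x = range(0, 1; length=n)
f = @. π^2 * sin(π*x)                     # manufactured RHS, exact solution is sin(πx)
u = zeros(n)
for _ in 1:10; vcycle!(u, f, h); end
@show maximum(abs, u .- sin.(π*x))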

The solution of a 2D Poisson equation on a 128x128 mesh:

And the computing times for various solvers (GMG wins :wink: )

┌────────────────────┬───────────┬─────────────┬────────────┬───────────┬─────────┐
│ solver             │ init_time │ solver_time │ Total Time │ residual  │ niters  │
├────────────────────┼───────────┼─────────────┼────────────┼───────────┼─────────┤
│ gauss_seidel       │ 8.900E-08 │ 4.492E-03   │ 4.492E-03  │   NaN     │ 10      │
│ GMG_GSSmoother     │ 7.034E-05 │ 5.264E-03   │ 5.335E-03  │ 1.148E-10 │ 17      │
│ TTSolver           │ 6.101E-03 │ 3.554E-04   │ 6.457E-03  │ 4.886E-11 │ nothing │
│ PCGAMG{SmoothA}    │ 1.383E-02 │ 4.145E-02   │ 5.529E-02  │ 1.916E-09 │ 16      │
│ PCGAMG{RugeStuben} │ 2.825E-02 │ 4.817E-02   │ 7.643E-02  │ 6.521E-10 │ 12      │
│ SparseLU           │ 9.333E-02 │ 6.262E-03   │ 9.959E-02  │ 1.748E-12 │ nothing │
│ PCGnothing         │ 5.100E-08 │ 1.431E-01   │ 1.431E-01  │ 5.959E-09 │ 605     │
│ SparseAMG          │ 8.246E-02 │ 6.945E-02   │ 1.519E-01  │ 6.521E-10 │ 12      │
│ jacobi             │ 5.700E-08 │ 3.014E-01   │ 3.014E-01  │ 9.707E+01 │ 2000    │
│ PCGILU             │ 2.629E-01 │ 7.272E-02   │ 3.356E-01  │ 4.687E-10 │ 8       │
│ sor                │ 3.000E-08 │ 4.922E-01   │ 4.922E-01  │ 6.332E+01 │ 2000    │
│ ssor               │ 3.100E-08 │ 9.461E-01   │ 9.461E-01  │ 6.332E+01 │ 2000    │
│ PCGDiagonalPrecond │ 2.945E-04 │ 5.147E+00   │ 5.148E+00  │ 6.132E-09 │ 602     │
└────────────────────┴───────────┴─────────────┴────────────┴───────────┴─────────┘

A simple CFD application of the solver (Lid Cavity).

Finally, the GMG solver is used in a Julia translation of Benjamin Seibold’s classical MIT18086_NAVIERSTOKES MATLAB implementation, which simulates a square lid cavity. See the details in the repo.


All your comments are more than welcome!


This looks great. (And I think I’ll learn a lot of Julia by looking through this!)

Thanks! But do not expect too much… I hope that I will also learn how to improve this implementation. In particular, when I was working on the solver (a year ago) I made a few attempts at parallelism (multithreading) that were not very successful…
I also had some difficulties with the performance of the derivative computations for the CFD MATLAB translation. I think that some of my problems (allocating views) are about to be solved with Julia 1.5.
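
For example, a hypothetical centered-difference helper (just to illustrate the allocating-views point; not the actual LidJul.jl code) behaves like this:

using BenchmarkTools

# centered difference along the first dimension (illustrative only)
ddx_slice(u) = (u[3:end, :] .- u[1:end-2, :]) ./ 2          # slices copy their data
ddx_view(u)  = @views (u[3:end, :] .- u[1:end-2, :]) ./ 2   # views reference the parent array

u = rand(128, 128)
@btime ddx_slice($u)    # allocates two temporary copies plus the result
@btime ddx_view($u)     # allocates only the result; views got much cheaper around Julia 1.5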

Anyway, a good GMG Julia package would be very useful and I hope that you find time to work on this !

Thanks! I will add you to my github repo where I’m working on the flow solver if you would like. I think I’ve got the derivatives running pretty fast now, and it’s generalized for 2 or 3D flows. I haven’t ever done the Julia package thing, so it’s only a private repo at the moment.

Thanks! I am sure that my derivative implementation can be improved a lot. I think that the performance is not too bad, but the implementation is ugly.

I also think that the StaticKernels.jl package may bring the optimal solution.

That looks neat. Right now, the only performance hog is AMG. So GMG gets my attention first.

I have just come across this very nice CFD tutorial in Julia:

with the corresponding repo, which contains a GMG implementation:
https://github.com/omersan/CFD_Julia_MAE5093


Ah! I’ve met Omer! Some of his research is pretty aligned with mine.

For better or worse, I’ve already finished a first draft implementation of GMG. I decided to organize the code a bit differently since

  • I need to solve a variable coefficient Poisson equation
  • I want it to be able to run 3D as well as 2D.

I managed what I think is a pretty fast and general implementation using CartesianIndices (from Base). It’s about 9 times faster than AMG.jl for a general Immersed Boundary simulation.

However, I haven’t opened up the repo yet since yours is documented and visualized so much nicer! I need to figure out how to do that. :grinning:


Great! Those two features (variable coefficients and 3D) correspond to what I was missing to be able to test GMG on simplified 3D transport and thermo-hydro simulations.
At this point, the big remaining challenge is a top-performing parallel implementation.

I think that it would be useful to use PkgSkeleton.jl in order to prepare your package.

Thanks for the tip. I’ll check it out.

Once it is up, there will be a laundry list of things to improve. High on that list will be parallel computing, maybe using GPUs.

Interesting, I’ve also been working on a variable-coefficient Poisson equation implementation using a variety of different methods, and I’ll look into the GMG stuff now as well. My implementation works in any number of dimensions: since I am using Grassmann.jl, my geometric foundation allows me to write fast geometric algorithms which automatically generalize to any dimension.

Neat.

The Poisson equation is trivial in any dimension on a Cartesian grid (which I’m using): it’s just a matter of looping through the faces of each element. I haven’t found an application in my work for N>3 yet, but I’ve got the GMG solver ready. :wink:
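
For example, a dimension-agnostic Laplacian stencil can be written as one loop over cells plus a small loop over the 2N face directions (a generic sketch, not the actual solver code):

# apply the N-dimensional 2nd-order Laplacian stencil on the interior cells;
# the inner loop visits the 2N face neighbours of each cell via unit offsets
function laplacian!(out, u)
    N = ndims(u)
    unit(d) = CartesianIndex(ntuple(i -> i == d ? 1 : 0, N))
    for I in CartesianIndices(map(n -> 2:n-1, size(u)))   # interior cells only
        s = -2N * u[I]
        for d in 1:N
            s += u[I + unit(d)] + u[I - unit(d)]
        end
        out[I] = s
    end
    return out
end

# works unchanged in 2D and 3D
laplacian!(zeros(34, 34), rand(34, 34))
laplacian!(zeros(34, 34, 34), rand(34, 34, 34))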

Hi,

I read your code again and found the coding style really inspiring:

for I ∈ inside(a)
    a[I] = sum(@inbounds(b[J]) for J ∈ near(I))
end

I tried to mimic it in my code, but I need more practice…

One thing that you may include in the future is a red-black version of GS! (allowing efficient vectorization and parallelization by breaking dependencies, with very little or no impact on GMG convergence). It may reduce the GS! time by a large factor (>4). In a second step, @elrod suggested adapting the underlying data layout (storing red and black cells contiguously), which allows a further substantial acceleration (about 3x).
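
A red-black sweep could look something like this (a sketch assuming the 5-point stencil with unit spacing; within one color, every update reads only cells of the other color, so the inner loop carries no dependency):

# one red-black Gauss-Seidel sweep for the 2D 5-point Poisson stencil
# (unit grid spacing assumed; illustrative sketch, not tuned code)
function gs_redblack!(u, f)
    n, m = size(u)
    for color in (0, 1)                    # update all "red" cells, then all "black" cells
        @inbounds for j in 2:m-1
            i0 = 2 + (j + color) % 2       # first interior i with parity matching the color
            @simd for i in i0:2:n-1
                u[i,j] = 0.25*(u[i-1,j] + u[i+1,j] + u[i,j-1] + u[i,j+1] + f[i,j])
            end
        end
    end
    return u
end

The data-layout change mentioned above goes one step further by storing each color contiguously in memory.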

Can’t wait to see your package published :wink:


I’ll try to publish it this week. I’ve looked at PkgSkeleton by @Tamas_Papp; it generates a folder structure and copies over template files. How do I keep my current repository https://github.com/weymouth/WaterLily (I’ve made it public)?


For existing packages, I would recommend

  1. starting with a clean git status (your own code committed),
  2. generating the package structure in another directory (see the sketch below),
  3. copying everything over and manually selecting what to keep (using git).

Please make sure you back up your work before doing anything — PkgSkeleton has some protection against data loss/overwriting, but better safe than sorry.
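
If it helps, step 2 of the list above might look roughly like this (assuming PkgSkeleton’s generate function; check its README for the exact options):

using PkgSkeleton

# generate the template into a separate (hypothetical) directory, then merge into the existing repo by hand
PkgSkeleton.generate("WaterLily_skeleton")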


Ok, done. I don’t think it got everything right, but it’s a start.

That is great!
Though I may have missed something, I can’t obtain good performance using CartesianIndices and your compact notations (near, inside, …).
The following loop-based function:

function res!(rlc,rlf)
    (nrows,ncols) = size(rlc)
    @inbounds @simd for j=2:ncols-1
        fj=2j-2
        for i=2:nrows-1
            fi=2i-2
             rlc[i,j]=(rlf[fi,fj]+rlf[fi+1,fj]+rlf[fi,fj+1]+rlf[fi+1,fj+1])
        end
    end
end

is 6-8 times faster on my laptop than:

function res_ci!(rlc,rlf)
    @fastmath @inbounds @simd for I ∈ inside(rlc)
            rlc[I]=sum(@inbounds(rlf[J]) for J ∈ near(I))
    end
end

Do you see the same? Here is a test MWE:

using BenchmarkTools
using LinearAlgebra
using Random

@inline CI(a...) = CartesianIndex(a...)
@inline δ(a,d::Int) = CI(ntuple(i -> i==a ? 1 : 0, d))
@inline δ(a,I::CartesianIndex{N}) where {N} = δ(a,N)
@inline CR(a...) = CartesianIndices(a...)
@inline inside(M::NTuple{N,Int}) where {N} = CR(ntuple(i-> 2:M[i]-1,N))
@inline inside(a::Array; reverse::Bool=false) =
        reverse ? Iterators.reverse(inside(size(a))) : inside(size(a))
@inline inside_u(N::NTuple{n,T}) where {n,T} = CR(ntuple(i->2:N[i],n-1))
@inline near(I::CartesianIndex,a=0) = (2I-2oneunit(I)):(2I-oneunit(I)-δ(a,I))

function res!(rlc,rlf)
    (nrows,ncols) = size(rlc)
    @inbounds @simd for j=2:ncols-1
        fj=2j-2
        for i=2:nrows-1
            fi=2i-2
            rlc[i,j]=(rlf[fi,fj]+rlf[fi+1,fj]+rlf[fi,fj+1]+rlf[fi+1,fj+1])
        end
    end
end

function res_ci!(rlc,rlf)
    # @inside rlc[I] = sum(@inbounds(rlf[J]) for J ∈ near(I))
    @fastmath @inbounds @simd for I ∈ inside(rlc)
            rlc[I]=sum(@inbounds(rlf[J]) for J ∈ near(I))
    end
end

# function that runs the benchmarks
function benchmark_res(nf)
    @assert iseven(nf)
    # initialize two 2D arrays (fine and coarse)
    nc=nf>>>1
    Random.seed!(1234);

    rlf=rand(nf+2,nf+2)
    rlc=rand(nc+2,nc+2)
    ih2=rand()
    # the following block checks that res! and res_ci! produce the same results
    rlc_ci=deepcopy(rlc)
    res!(rlc,rlf)
    res_ci!(rlc_ci,rlf)
    # @show @code_native debuginfo=:none res!(rlc,rlf)
    # @show @code_native debuginfo=:none res_ci!(rlc,rlf)
    residual=norm(rlc-rlc_ci)
    # the actual timings
    tres=@belapsed res!($rlc,$rlf)
    tres_ci=@belapsed res_ci!($rlc,$rlf)
    @show "restrict",nf,residual,tres,tres_ci,tres_ci/tres
    nothing
end

for n in [32,256]
        benchmark_res(n)
end

returns:

(nf, tres, tres_ci, tres_ci / tres) = (32, 1.6225823451910409e-7, 1.1553e-6, 7.120131705019731)
(nf, tres, tres_ci, tres_ci / tres) = (256, 1.1933e-5, 7.3877e-5, 6.190982988351629)

There is certainly a performance hit using CartesianIndex in 2D. In 3D, I don’t see the same slowdown. Maybe this is because looping through higher-dimensional arrays has a penalty similar to using CartesianIndex… Maybe because @simd can be applied to the single CartesianIndex loop, but only to one of the three loops in 3D?

julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
  5.150 μs (0 allocations: 0 bytes)
julia> @btime res2D!(a,b) setup=((a,b)=(zeros(34,34),rand(66,66)))
  698.604 ns (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,1),rand(66,66,1)))
  8.567 μs (0 allocations: 0 bytes)
julia> @btime WaterLily.restrict!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
  290.300 μs (0 allocations: 0 bytes)
julia> @btime res3D!(a,b) setup=((a,b)=(zeros(34,34,34),rand(66,66,66)))
  339.300 μs (0 allocations: 0 bytes)

res2D!() is your code, and res3D!() is below. It requires the arrays to be 3D, but if the size of the last dimension is 1, it does the 2D restriction. However, it’s slow in both 2D and 3D!

inside2(n) = min(2,n):max(1,n-1)
near2(i) = 2i-2:2i-1
near2(i,n) = ifelse(n==1,1:1,near2(i))
@fastmath function res3D!(a,b)
    (ni,nj,nk) = size(a)
    @inbounds for i ∈ inside2(ni), j ∈ inside2(nj), k ∈ inside2(nk)
        σ = 0.
        @inbounds for I ∈ near2(i), J ∈ near2(j), K ∈ near2(k,nk)
            σ+= b[I,J,K]
        end
        a[i,j,k] = σ
    end
end

I guess the options are:

  • Somehow improve CartesianIndex for 2D arrays
  • Profile the code and judiciously write special 2D-only versions of key functions, selected via multiple dispatch (see the sketch below). Note that restrict!() is not one of them, but mult() is likely to be.
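
For the second option, dispatch-based specialization might look roughly like this (illustrative names, not the actual WaterLily code): a generic N-D method plus a hand-written 2D method that Julia selects automatically.

# generic N-D restriction: each coarse interior cell sums its 2^N fine children
function restrict2!(a::AbstractArray{T,N}, b::AbstractArray{T,N}) where {T,N}
    @inbounds for I in CartesianIndices(map(n -> 2:n-1, size(a)))
        a[I] = sum(b[J] for J in (2I - 2oneunit(I)):(2I - oneunit(I)))
    end
    return a
end

# specialized 2D method: plain nested loops, which currently vectorize better
function restrict2!(a::AbstractMatrix{T}, b::AbstractMatrix{T}) where {T}
    n, m = size(a)
    @inbounds for j in 2:m-1
        fj = 2j - 2
        @simd for i in 2:n-1
            fi = 2i - 2
            a[i,j] = b[fi,fj] + b[fi+1,fj] + b[fi,fj+1] + b[fi+1,fj+1]
        end
    end
    return a
end

restrict2!(zeros(34, 34), rand(66, 66))         # hits the 2D method
restrict2!(zeros(34, 34, 34), rand(66, 66, 66)) # hits the generic N-D method

Both methods share one call site, so the GMG driver does not need to know the dimension.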

In my implementation, I don’t use Cartesian indices; instead, I represent the Cartesian mesh using a finite-element mesh (triangles, tetrahedra, a cell complex). They are stored in a particular structure, which allows me to extract the Cartesian structure of the mesh without resorting to higher-dimensional arrays. Thus, I don’t need to iterate over higher-dimensional arrays and only work with a single index for the mesh. My package FlowGeometry.jl is my prototype mesh generation for this, but it doesn’t include solution methods. Using this representation might seem strange, but it works better for my purposes.

I studied a lot of number theory before, so instead of using cartesian indices I use an index notation based on modular arithmetic and equivalence classes.

In the future, the FlowGeometry package will provide a comprehensive (structured) mesh interface for this. Right now, it’s a prototype.