Calculating the band structure of graphene

The simplest and most reliable way to do DFT from Julia is via PyCall and ASE. I'm maintaining a simple interface, ASE.jl, if that is of any use. From ASE you can of course drive any DFT code; GPAW seems particularly easy to use. I wouldn't mind expanding the ASE.jl interface to provide convenience wrappers for electronic-structure functionality; at the moment it primarily targets atomistic mechanics.
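
For reference, a minimal sketch of driving GPAW through ASE directly via PyCall (not through the ASE.jl wrappers) could look roughly like this; it assumes ASE and GPAW are installed in the linked Python, and the lattice constant, cutoff, and k-point grid are only illustrative:

```julia
using PyCall

ase  = pyimport("ase")
gpaw = pyimport("gpaw")

# Two-atom graphene cell with ~15 Å of vacuum along z (values illustrative).
a = 2.46
atoms = ase.Atoms("C2",
    cell=[[a, 0.0, 0.0], [-a/2, a*sqrt(3)/2, 0.0], [0.0, 0.0, 15.0]],
    scaled_positions=[[0.0, 0.0, 0.5], [1/3, 2/3, 0.5]],
    pbc=[true, true, false])

# Plane-wave mode, PBE, a coarse k-point grid; just a ground-state run.
atoms.calc = gpaw.GPAW(mode=gpaw.PW(400), xc="PBE", kpts=(9, 9, 1),
                       txt="graphene_gs.txt")
println("Total energy: ", atoms.get_potential_energy(), " eV")
```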

Sorry, no links since I'm on my phone; those packages should be easy to find, though.

1 Like

If you are not familiar with band structure calculations with DFT, I think it is good to start with the tight-binding method (especially for 2D materials), as others have mentioned before. You might want to port http://physics.rutgers.edu/pythtb/ to Julia as an exercise (oh, and compare the performance :grin:).
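
For graphene specifically, the nearest-neighbour π-band model fits in a few lines of Julia. A standalone toy sketch (the hopping value and the choice of units are only illustrative, and this is not PythTB code):

```julia
using LinearAlgebra

# Nearest-neighbour hopping (eV) and lattice vectors in units of the C-C bond.
t  = 2.7
a1 = [3/2,  sqrt(3)/2]
a2 = [3/2, -sqrt(3)/2]

# π-band dispersion: E±(k) = ± t |1 + e^{i k·a1} + e^{i k·a2}|
f(k) = 1 + exp(im * dot(k, a1)) + exp(im * dot(k, a2))
bands(k) = (-t * abs(f(k)), +t * abs(f(k)))

# Energies at Γ and at the Dirac point K
Γ = [0.0, 0.0]
K = [2π/3, 2π/(3*sqrt(3))]
@show bands(Γ)   # (-3t, +3t)
@show bands(K)   # (≈0, ≈0), the Dirac point
```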

DFT calculations are rather resource-intensive, and many packages are written in Fortran.

As the author of PWDFT.jl, I am still struggling to make the package better, although you can already do the calculation with it if you want (using a plane-wave basis set and HGH pseudopotentials).

1 Like

Here is what I get using PWDFT

Using a 15 Ha cutoff and 5 bands (2 atoms per unit cell, plus 1 empty state).

2 Likes

Looks amazing :slight_smile: I need two more bands above the ones shown and the K-H path added. Would you mind sharing the code you used? The documentation seems to be lacking at this point.

Which implicitly raises the question of parallelization. Briefly looking into PWDFT.jl, there is no parallelization yet? What would the approach be with Julia? Does MPI.jl work? Or is the strategy pure threading?

My understanding is that PW codes rely essentially on parallel FFT.

I am not sure what to make of that remark.

FFTs are needed primarily for the Poisson solver (or for GW if you want to do GW), that is certainly true. But that is usually done with MPI (because you usually overflow a single node's memory at some point, even ignoring cores per node and speed) by explicitly distributing one FFT dimension across nodes, so it is not hidden in a library like an MPI-parallel FFTW.

On top of that, of course, a Hamiltonian matrix must be diagonalized, so you usually start thinking about parallel block Davidson, LOBPCG and similar methods (on top of fast BLAS/LAPACK), possibly based on BLACS/ScaLAPACK (which is not the right toolbox) or homegrown code. And that certainly is not an FFT.

So, again, my question: what would be a technically viable parallelization strategy with Julia for typical electronic structure theory, given the language's current approach to parallelization and the current ecosystem? Does anyone know?

Because I do not see it. Parallelization appears to be more of an afterthought at the moment (OK, that is true for most other languages, too), but I have not heard of any (large-scale) codes using Julia and serious MPI for either linear algebra or FFTs, where you spend a lot of time on a high-performance network (e.g. Aries, InfiniBand, Omni-Path) and the software stack underneath it.
What I see in the documentation is basically a thread-parallel approach or the spawning of independent workers. Does anyone have experience with MPI.jl? Is it ready/stable/performant?

1 Like

MPI FFTW parallelism is not hidden completely in a library, precisely because in a distributed memory setting the user has to be aware of the data distribution. If you look at the MPI section of the FFTW manual, you’ll see that the caller is required to explicitly create distributed arrays.

DistributedArrays.jl makes this somewhat easier, but it doesn’t have built-in FFT support yet.
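
To make that concrete, a minimal sketch of creating distributed vectors and taking global reductions over them (worker count and sizes are arbitrary; this relies on the norm/dot support mentioned further down, not on any distributed FFT):

```julia
using Distributed
addprocs(2)                                   # arbitrary number of workers
@everywhere using DistributedArrays, LinearAlgebra

x = drandn(10_000)                            # distributed over the workers
y = drandn(10_000)
@show norm(x)                                 # global reduction under the hood
@show dot(x, y)
```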

On top of that, of course, a Hamiltonian matrix must be diagonalized, so you usually start thinking about parallel block Davidson, LOBPCG and similar methods (on top of fast BLAS/LAPACK), possibly based on BLACS/ScaLAPACK (which is not the right toolbox) or homegrown code. And that certainly is not an FFT.

Parallelizing these algorithms is much easier than parallelizing the FFT/Poisson solver, because the only communication required is typically a reduction (e.g. for parallel dot products), which is trivial with MPI.jl or similar. (Indeed, existing iterative solvers in Julia may well work as-is with DistributedArrays, because the latter already supports norm and dot and similar linear-algebra operations.)
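
For example, a distributed dot product with MPI.jl is a single collective (a toy sketch; launch it with mpiexec or MPI.jl's mpiexecjl wrapper, and the local slice size is arbitrary):

```julia
using MPI

MPI.Init()
comm = MPI.COMM_WORLD
rank = MPI.Comm_rank(comm)

# Each rank owns a slice of the two vectors; only local work plus one Allreduce.
n_local = 1000
x = rand(n_local)
y = rand(n_local)

local_dot  = sum(x .* y)                         # purely local
global_dot = MPI.Allreduce(local_dot, +, comm)   # one collective reduction

rank == 0 && println("distributed dot product = ", global_dot)
MPI.Finalize()
```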

1 Like

I have to admit that I never had to think about MPI FFTW and its API. I only know for sure that VASP does not use it (licensing reasons?, although linking against a serial FFTW is possible), and I believe that pwscf (Quantum ESPRESSO) and GPAW do not use it either. But anyway:

Yes. My point was that this is not an FFT and can still be expensive in time and communication, so it must not be forgotten even with plane waves.

From your reply I assume you think that MPI.jl is ready but DistributedArrays.jl is unproven? So I could conclude that there are attempts, but no unified, complete, integrated, and interoperable stack yet? I am deviating from the plane-wave context I originally asked the question in, but a large-scale parallel local-orbital code (e.g. SIESTA, or GPAW depending on the basis) or tight-binding code would, for example, usually require ScaLAPACK or ELPA.

That brings me back to my original question: what actually is the strategy for @f-fathurrahman's plane-wave code, if any? The packages mentioned by @stevengj?

More generally, are there any showcase traditional (non-GPU) large-scale MPI linear-algebra/FFT application projects similar to, say, GPAW (Python mixed with Fortran/C for speed) in open development? That is something I am missing, but the language is probably too new for this, and hopefully the projects mentioned in this thread are the pioneers.

Well, Google gives these slides for pwscf, which show on page 27 (Broadwell row), if I understand them correctly, that diagonalization can be as expensive as the FFT depending on the system; see slide 32 for a counterexample. I am not yet sure what they lumped into “Other”.

I worked on this for the ABINIT code a few years ago. For small systems, the FFTs (O(N^2 log N), with N the number of electrons) are usually the bottleneck. For larger ones, the diagonalization (O(N^3)) becomes the bottleneck. Somewhere between the two is the nonlocal pseudopotential, depending on how it's implemented (O(N^2) or O(N^3)).

There are two nontrivial dimensions along which you want to parallelize the code: the bands and the plane waves. The FFT is embarrassingly parallel along the bands, and that usually leads to good scalability of that operation. The linear algebra (diagonalizations & co) usually stops scaling before the FFTs do (e.g. see Fig. 9 of https://arxiv.org/pdf/1406.4350.pdf, although this is a bit biased by the fact that the nonlocal operator is O(N^3)).
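
To illustrate the band-parallel FFT point with a toy sketch (grid and band counts are assumed for illustration; a real code would give each MPI rank its own subset of bands rather than using threads):

```julia
using FFTW

nbands = 8
ψ = [rand(ComplexF64, 32, 32, 32) for _ in 1:nbands]   # one grid per band

# Each band is transformed independently of the others, so the loop is
# embarrassingly parallel over bands.
Threads.@threads for ib in 1:nbands
    ψ[ib] = fft(ψ[ib])
end
```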

Parallelizing these algorithms is much easier than parallelizing the FFT/Poisson solver, because the only communication required is typically a reduction (e.g. for parallel dot products), which is trivial with MPI.jl or similar. (Indeed, existing iterative solvers in Julia may well work as-is with DistributedArrays, because the latter already supports norm and dot and similar linear-algebra operations.)

It's a bit more complicated than that, because those are block algorithms. E.g. you need to diagonalize dense matrices, so you need to distribute them and call ScaLAPACK or ELPA. I can provide more details if needed.

It's easy enough to do manually with MPI. The dream would be to do it all with distributed arrays and library code (e.g. call IterativeSolvers with a DistributedArray), but that seems pretty tricky to me because of the different data distributions involved. Definitely something to try, though. It might also be that, with today's multicore machines, one can skip one level of data distribution in favor of shared memory, which would be simpler.
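
A stripped-down sketch of doing the subspace (Rayleigh-Ritz) step manually with MPI.jl, assuming Γ-point-only so the coefficients can be kept real, and with random arrays standing in for Ψ and H*Ψ:

```julia
using MPI, LinearAlgebra

MPI.Init()
comm = MPI.COMM_WORLD

# Each rank owns a slab of plane-wave components for all bands.
nG_local, nbands = 5000, 16
Ψ  = rand(nG_local, nbands)
HΨ = rand(nG_local, nbands)              # stands in for H applied to Ψ

Hsub = MPI.Allreduce(Ψ' * HΨ, +, comm)   # small nbands × nbands matrix on every rank
ε, U = eigen(Symmetric(Hsub))            # Ritz values and rotation (ScaLAPACK/ELPA territory when large)
Ψ .= Ψ * U                               # rotate the distributed bands consistently

MPI.Finalize()
```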

I have not tried MPI.jl, but I don't see why there should be any problems, as MPI.jl just has to call the MPI primitives, which should have little interaction with the Julia runtime and work just as well as in Fortran/C.

Thank you for that insightful reply.

I also hope so, and in principle you are right. But seeing is believing, especially for the whole stack of compiler (not only GCC, but ifort/icc?), BLAS/LAPACK (MKL?), and ScaLAPACK :slight_smile:

I should mention @John_Gibson had similar concerns in Implementing parallel fluids code in Julia (parallel FFTW, MPI.jl, ...) - #34 by favba

1 Like

Regarding multiprocessing/threading being an afterthought at the moment: if I remember correctly, there is a pretty nice threading effort coming at some point from a research group in Cambridge (see Shared memory parallelism). Maybe that will make implementing parallel algorithms a bit more readily supported, although it would be at the multithreading level rather than the distributed level.
Very interesting thread.

Dear @Janis_Erdmanis, sorry for the late reply.

It is true that the documentation is still lacking and that the package is not yet ready for production.
I have not implemented fancy functionals such as the ones used in the paper you mentioned.

My short-term goal is to make the code as simple as possible, and I have not done any (performance) benchmarks other than comparing total energy values against ABINIT. I am also still learning to write good Julia programs.

For the moment, you might want to read the file https://github.com/f-fathurrahman/PWDFT.jl/blob/master/sandbox/graphene.jl that I wrote to get an idea of how I obtain the band structure. Band structure calculation is currently not my first priority, and I am still thinking about a good workflow for it.

Feel free to modify the code to suit your needs.

PS: Sorry that the band structure is plotted with a Python script.

1 Like

Unfortunately parallelization is not yet implemented.

Probably the easiest approach is to use DistributedArrays.jl for parallelization over k-points and Kohn-Sham states.
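
As a first step in that direction, trivial k-point parallelism already works with plain Distributed.pmap; a toy sketch (solve_kpoint is a hypothetical placeholder, not the PWDFT.jl API):

```julia
using Distributed
addprocs(4)                          # arbitrary number of workers
@everywhere using LinearAlgebra

# Placeholder for "diagonalize H(k) and return its eigenvalues";
# k is not actually used by this random stand-in.
@everywhere function solve_kpoint(k)
    H = rand(ComplexF64, 20, 20)
    return eigvals(Hermitian(H + H'))
end

kpath = [(x, 0.0, 0.0) for x in range(0.0, 0.5, length=16)]
bands = pmap(solve_kpoint, kpath)    # each k-point is independent
```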

I wish you all the best; regardless of what you do, this will be a lot of hard work on top of the work you have already put in to get it working so far. I will certainly keep an eye on this project.

1 Like

Thank you :slight_smile: In the meantime I found another paper where I was able to see the two additional higher bands, which seems to be enough to point out an example of the possibility of realizing a Dirac exciton (with very rough assumptions and estimates). I will certainly play around with it to get a feeling for what it takes to do a band structure calculation, and to actually understand what happened in this thread :smiley:

Do not worry too much; I derailed it a bit by accident, if that is what you mean. Sorry.

The discussion is rather interesting and lively. I get to see why such calculations are hard, and I get to learn more about parallel computing, which I like to experiment with on my local university cluster. And it is satisfying/inspiring to see that so many people are actively working on solvers even for such a specialized domain :wink:

1 Like