[ANN] DFTK.jl - A pure-Julia implementation of density-functional theory

DFTK, the density-functional toolkit, is a Julia package that @antoine-levitt and I started about a year ago, and we now feel it is time to announce it to the community!

The aim of the package is to become a platform for methodological developments in density-functional theory (DFT), one of the most widespread methods for simulating the electronic structure and properties of materials. This field is highly interdisciplinary in nature, where advances often result from devising chemically and physically sound models, using mathematical insight to suggest stable algorithms, and then scaling the code to the high-performance regime. We therefore believe that, compared to the traditional approach of wrapping low-level FORTRAN and C++ cores in Python, Julia is better suited as a language to support our aims.

In its current state DFTK focuses on simulating electronic properties of extended systems such as crystals or surfaces. We already support a sizable set of standard models and approaches (plane-wave basis sets, LDA/GGA functionals, GTH pseudopotentials, self-consistent field methods, direct minimisation, various forms of mixing) in only about 5k lines of code. This has only been possible by being able to mix and match the many good packages in the Julia ecosystem (so thanks to all of you!).

Interfaces to established Python frameworks (ASE, pymatgen, abipy…) are available, such that DFTK also integrates with the world beyond Julia. See our documentation for a full list of features, or the tutorial and examples to get started.

As the code is intended as a platform for multidisciplinary collaboration, any questions, suggestions or additions from the community are welcome!


Thank you for this work! I will definitely keep an eye on its future progress (and might even contribute).

A few questions:

  • You are using libxc, which does not work with GPUs. Do you have any plans to make DFTK work on GPUs?

  • Support for differential programming? Can I use Zygote to take the gradient of, let's say, the energy?

Thanks for the feedback and of course any contribution is welcome. Just get in touch!

We have started looking into GPUs and did a few small experiments, but due to lack of time this has so far not turned into a production-ready outcome. The current status is that a few very simple problems (like a free-electron model) have been tackled on GPUs with DFTK in a student project a few months ago. Expanding this to the full DFT case is in principle possible, since we have kept an eye on keeping the code type-generic, but certainly further adaptations will be needed. Indeed we use libxc for most functionals, but for simple ones like LDA we have a pure-Julia fallback, which supports GPUs out of the box. Additionally, libxc has started initial developments to roll out GPU support in the library itself. That's still experimental, but might be an option for the future.
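To illustrate why a pure-Julia fallback gets GPU support essentially for free: a functional written as a broadcastable scalar function runs unchanged on CPU arrays and on CuArrays. The snippet below is just a sketch of the idea, not DFTK's actual code.

```julia
using CUDA  # only needed for the GPU variant

# Slater (LDA) exchange energy density per unit volume:
#   e_x(ρ) = -3/4 * (3/π)^(1/3) * ρ^(4/3)
# Written as a scalar function and broadcast, the same code runs on CPU and GPU.
lda_exchange(ρ) = -3/4 * (3/π)^(1/3) * ρ^(4/3)

ρ_cpu = rand(Float64, 64, 64, 64)      # density on a real-space grid
E_cpu = sum(lda_exchange.(ρ_cpu))      # (up to the grid volume element)

ρ_gpu = CuArray(ρ_cpu)                 # move the density to the GPU ...
E_gpu = sum(lda_exchange.(ρ_gpu))      # ... and the same code just works
```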

Differential programming: That would be nice! We definitely have it on the (rather long) agenda. Right now the obstacle is that Zygote does not do too well with mutable objects and structs, which we use quite a bit to structure our code, so it's a bit further down the priority list for now. But any attempts in this direction would be greatly appreciated from our end!


Any contribution is very warmly welcome! Don’t hesitate to open an issue or contact us if anything is unclear!

To expand on Zygote: believe me, we would love that, but there's just no way that throwing Zygote at something this complicated will simply work. There are two reasons for this.

The first and most fundamental is that reverse-mode differentiation is just complicated to implement, since you need to run the program in reverse. For an iterative solver that would mean keeping the full history in memory and then running through it backwards. That would blow up memory pretty quickly, so you would need to implement mitigation strategies like checkpointing. Also, for solvers very sensitive to numerical noise (like our default LOBPCG solver), it would probably just explode. Fortunately, there is a trick. If x is defined as a function of y through the equation F(x, y) = 0, then J_x F dx = -J_y F dy, and you can solve that linear system iteratively (if you're familiar with the field: for the SCF it's the Dyson equation, and for the linear eigenvalue problem it's the Sternheimer equations). You do that by defining custom adjoints in Zygote.
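To make the trick a bit more concrete, here is a minimal sketch on a toy problem (illustrative names, not DFTK code): the reverse-mode rule for a solver of F(x, y) = 0 needs only one linear solve with the transposed Jacobian at the solution, instead of replaying the solver's history.

```julia
using ForwardDiff, LinearAlgebra

# Toy problem: solve F(x, y) = 0 for x, given parameters y.
F(x, y) = x.^3 .+ y .* x .- 1

function solve_x(y; iters=50)
    x = ones(length(y))
    for _ in 1:iters                               # plain Newton iteration
        Jx = ForwardDiff.jacobian(x -> F(x, y), x)
        x -= Jx \ F(x, y)
    end
    x
end

# Reverse-mode rule: given a cotangent xbar of the solution x, return ybar.
# This is what a custom Zygote adjoint for the solver would compute.
function pullback_solve_x(y, x, xbar)
    Jx = ForwardDiff.jacobian(x -> F(x, y), x)     # ∂F/∂x at the solution
    Jy = ForwardDiff.jacobian(y -> F(x, y), y)     # ∂F/∂y at the solution
    λ = Jx' \ xbar                                 # one adjoint linear solve
    -Jy' * λ                                       # ybar = -(∂F/∂y)' λ
end

y    = [0.5, 1.0, 2.0]
x    = solve_x(y)
xbar = ones(3)                                     # e.g. cotangent of sum(x)
ybar = pullback_solve_x(y, x, xbar)
```

In the real case the linear solve with J_x F' would itself be iterative (that is where the Dyson/Sternheimer equations come in), but the structure is the same.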

The second is more technical: Zygote just has too many issues and limitations right now. To implement the above program we'd need to spend significant time adapting our codebase to Zygote's wishes. I'm hoping that will change someday.

So assuming we can work around these two issues, we could do cool things like taking arbitrary gradients, getting higher-order derivatives, etc. I believe this kind of methodology could potentially change the field (right now stupid amounts of time are spent just coding derivatives). This is all new territory; I don't believe anybody has ever reverse-diffed through an electronic structure program, although there was a paper where some people forward-diffed through a Hartree-Fock calculation.

Of course, for the energy you don't have to go through all this: you can just use Hellmann-Feynman. Then everything is explicit and it's just a question of coding up the derivatives you want. I did forces (using ForwardDiff at times when we got lazy, like here: https://github.com/JuliaMolSim/DFTK.jl/blob/master/src/terms/ewald.jl#L201), but did not yet get around to doing stresses. Using AD there (either ForwardDiff or Zygote) is a much more realistic prospect. Doing that would be a fun little project; it's on my TODO list (but at a pretty low priority, so no guarantees it'll ever get done).
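For what it's worth, the pattern is simply to write the energy term as a plain function of the atomic positions and let ForwardDiff provide the derivative. A toy illustration of that pattern (not DFTK's force code, just the idea):

```julia
using ForwardDiff, LinearAlgebra

# Toy pairwise energy for atoms at positions given as a 3×N matrix.
function toy_energy(positions)
    E = zero(eltype(positions))        # eltype-generic so dual numbers work
    N = size(positions, 2)
    for i in 1:N, j in (i+1):N
        r = norm(positions[:, i] - positions[:, j])
        E += 1 / r                     # toy Coulomb-like pair term
    end
    E
end

positions = [0.0 1.0 0.0;
             0.0 0.0 1.0;
             0.0 0.0 0.0]

# Forces are minus the gradient of the energy with respect to the positions.
forces = -ForwardDiff.gradient(toy_energy, positions)
```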


Thank you for the clarifications.

I also believe that differential programming will be a major change in electronic structure calculations, which was the main reason why I asked.

The work already done is presented in this article https://pubs.acs.org/doi/10.1021/acscentsci.7b00586 and the code can be found here: https://github.com/aspuru-guzik-group/DiffiQult It is in practice a proof-of-concept program that goes through the problems and limitations of such an implementation and how to solve them.

The main argument for differential programming has been that it would allow easy implementation of gradients, which would of course be a huge improvement. But in my opinion the most important thing is that it would allow us to do things we cannot do with traditional programs. On the DFT side, for example, we could implement neural-network functionals trained against energies, gradients and densities, which would be, in theory at least, close to the ideal functional if trained against a large enough reference set. A program able to do this would most likely be a killer app for differential programming.
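Just to sketch what that could look like (purely illustrative; no claim that a production code would do it this way): parameterise the xc energy density by a tiny neural network and let AD provide the gradient of the total xc energy with respect to the network parameters, which is exactly what one needs to fit against reference data.

```julia
using Zygote

# One-hidden-layer network mapping the density ρ at a grid point to an
# energy density per particle; W1, b1, W2 are the trainable parameters.
nn_eps_xc(ρ, W1, b1, W2) = sum(W2 .* tanh.(W1 .* ρ .+ b1))

# Toy total xc energy on a real-space grid with volume element dV.
E_xc(ρgrid, dV, W1, b1, W2) = dV * sum(ρ -> nn_eps_xc(ρ, W1, b1, W2) * ρ, ρgrid)

ρgrid = rand(8, 8, 8); dV = 0.1
W1, b1, W2 = randn(4), randn(4), randn(4)

# Gradient of the xc energy with respect to the network parameters.
∂W1, ∂b1, ∂W2 = Zygote.gradient((a, b, c) -> E_xc(ρgrid, dV, a, b, c), W1, b1, W2)
```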

On the GPU part: libxc does seem to be getting CUDA support for the upcoming 5.0 version. But truth be told, a pure-Julia version would be much more flexible and would not be limited to NVIDIA GPUs. I suggested some time ago to Susi Lehtola (a libxc developer) that with Julia we could write xc functionals that are vectorised, run on GPUs and get their gradients via AD, but he didn't seem to understand what I was really after. I will poke him again on this matter when I start my post-doc in Helsinki this autumn.

To be clear, I'm just thinking of the possibility of creating a pure-Julia (test) version of libxc that would use all the nice Julia features, like (loop) vectorisation, GPUs and differential programming.

That paper is what I was referencing above. They did the "easy part", which is the forward diff. The "hard part" (and the most interesting one) is the reverse diff. The reason they give is that reverse diff doesn't work for repeated eigenvalues, which is technically true if implemented naively but can be worked around easily by writing custom adjoints that appropriately project onto the orthogonal complement of the occupied eigenvectors; it's probably a very simple modification of their code. It's a trick that doesn't seem to be widely known, at least in the chemistry community.
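To sketch the projection idea (a hand-written illustration from first-order perturbation theory, not their code or ours): individual eigenvectors of a degenerate occupied block are not differentiable, but the occupied subspace, i.e. the density matrix P = Ψ Ψ', is, and its derivative only involves couplings between occupied and virtual states, so degeneracies within the occupied block drop out.

```julia
using LinearAlgebra

# Derivative of the occupied density matrix P = Ψ Ψ' of a symmetric matrix A
# under a symmetric perturbation dA, keeping the lowest nocc states occupied.
function dP_occupied(A, dA, nocc)
    λ, U = eigen(Symmetric(A))
    Ψ = U[:, 1:nocc]                   # occupied eigenvectors
    V = U[:, nocc+1:end]               # virtual (unoccupied) eigenvectors
    # Only occupied–virtual couplings appear; the denominators λ_i - λ_a
    # stay away from zero as long as there is a gap between the two blocks.
    M  = (V' * dA * Ψ) ./ (λ[1:nocc]' .- λ[nocc+1:end])
    dΨ = V * M
    dΨ * Ψ' + Ψ * dΨ'
end

# Example: doubly degenerate eigenvalue inside the occupied block is fine.
A  = diagm([1.0, 1.0, 3.0, 4.0])
dA = let B = randn(4, 4); (B + B') / 2 end
dP_occupied(A, dA, 2)
```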

Regarding libxc, it's really not an issue: libxc is useful as a giant library of all functionals ever created, but in practice you end up using a very small subset, which you can easily implement yourself. It's also a tiny fraction of the computing time (at least in plane-wave codes).

DiffEq defines adjoints for time stepping, so if you mix that with an adjoint for Ewald, you'd be 99% of the way there.

I completely agree with your wish to go for a pure-Julia version of libxc, and for sure that would be an interesting endeavour. I would also expect Julia's simplicity could shine there. Libxc does not have a bad structure, but still, understanding where things happen in the code is not super easy, and the main computational routines are auto-generated and not at all readable. I have thought about making a library wrapping Libxc.jl at some point, which essentially provides generic pure-Julia fallbacks for common functionals, kind of extending what we did in DFTK for LDA. I think that is a good way to start such a project without dropping support for the vast set of functionals already available in libxc.

What has kept me from getting started on this was firstly the computational point of view, where, as @antoine-levitt said, one probably does not win so much (at least in plane-wave DFT), and secondly that, from talking to the libxc people, getting the evaluation of (some) functionals numerically stable, and then similarly at least their first and second derivatives, sounds like quite a challenge. This is talking Float64, of course, but to really have the flexibility of Julia and make full use of GPUs you would want the same at Float32 and ideally any other type as well … so yeah, even on a case-by-case basis probably not an easy task.