Implementing Crystal Graph CNN in Julia

Hi everyone! I’d like to implement Crystal Graph Convolutional Neural Networks (CGCNNs) in Julia, in particular using the GeometricFlux package. CGCNNs are a method in computational materials science for representing crystal structures as undirected graphs and then predicting materials properties by training graph convolutional neural nets on data from experiments, online repositories, etc. Each node (atom) in the graph carries features corresponding to various properties such as atomic number, ionization energy, etc., and each edge (bond) carries features as well, in this case just the length of the bond.

My starting point is the Python implementation here. I have a few questions related to featurization and was directed that folks in this community might be able to offer answers.

In the Python implementation, the feature vectors are built by discretizing each property to be represented into bins. For categorical data (e.g. is the atom in the s-, p-, d-, or f-block of the periodic table?) this makes perfect sense, but for continuous variables (e.g. electronegativity or atomic radius) it sacrifices information. However, this “binning” allows the atomic feature vector to be represented as a long vector of zeros and ones rather than a shorter vector of floats. Using default settings, features are binned into roughly ten categories, so the vector is ~10x longer, meaning the weight matrices can have up to ~100x as many entries compared to a float-based implementation.
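
To make the binning concrete, here is a minimal sketch of one-hot binning for a single continuous property. The function name `one_hot_bin`, the ten equal-width bins, and the electronegativity range are assumptions chosen for illustration, not taken from the reference implementation.

```python
def one_hot_bin(value, lo, hi, n_bins=10):
    """Return a one-hot list of length n_bins marking the bin that holds value."""
    if not lo < hi:
        raise ValueError("lo must be smaller than hi")
    # Clamp so that out-of-range values fall into the first or last bin.
    clamped = min(max(value, lo), hi)
    width = (hi - lo) / n_bins
    # The upper edge (value == hi) is assigned to the last bin.
    index = min(int((clamped - lo) / width), n_bins - 1)
    return [1.0 if i == index else 0.0 for i in range(n_bins)]


# Hypothetical example: an electronegativity of 2.55 on an assumed 0.5-4.0 scale.
carbon = one_hot_bin(2.55, lo=0.5, hi=4.0)
```

Every value inside the same bin produces the same vector, which is exactly the information loss described above.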

So my first question is, is there a big efficiency boost to be gained from storing/manipulating these atomic features in this way that’s worth longer vectors/larger matrices and also the information loss due to coarse-graining? Or would it make more sense to just use the float values in a Julia implementation?

On a related note, the featurization of the graph edges (bonds) is also done in a very particular way. The bonds only have one distinct feature – their length. However, they are also featurized in a binning process, which is additionally passed through a Gaussian filter such that the vector ends up looking like mostly zeros and then at the slots corresponding to the bond length and those near it, growing and then shrinking (float) values. I assume something about this representation improves the way in which information about neighboring atoms/nodes “propagates” along graph edges/bonds, but I can’t exactly understand why. So my second question is, can someone explain this and convince me whether I should keep this in my Julia implementation as well?
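
For reference, a minimal sketch of this kind of Gaussian expansion of a bond length is given below. The name `gaussian_expand`, the grid from 0 to 8 Å in steps of 0.2 Å, and the choice of the step as the Gaussian width are assumptions for illustration and may differ from the settings in the reference code.

```python
import math


def gaussian_expand(distance, d_min=0.0, d_max=8.0, step=0.2, width=None):
    """Expand a scalar distance onto a grid of Gaussians centred at d_min, d_min + step, ..., d_max."""
    if width is None:
        width = step  # assumed default: Gaussian width equal to the grid spacing
    n_centers = int(round((d_max - d_min) / step)) + 1
    centers = [d_min + i * step for i in range(n_centers)]
    # Each entry measures how close the distance is to one grid centre.
    return [math.exp(-((distance - c) ** 2) / width ** 2) for c in centers]


# Hypothetical bond length of 2.0 (same units as the grid).
features = gaussian_expand(2.0)
```

Unlike hard bins, this vector changes smoothly as the bond length changes: a slightly longer bond shifts weight gradually to the neighbouring slots instead of jumping to a different one-hot vector.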

Thanks in advance! Please let me know if anything is unclear (I can add code for instance, but this felt like a more conceptual question so it didn’t seem necessary). For more information on CGCNN’s, you can see the non-paywalled preprint paper here: https://arxiv.org/pdf/1710.10324.pdf (it’s published in Physical Review Letters so you can find the final version there if you have academic credentials).

Unfortunately understanding in this area is highly empirical, so it is difficult to give strong answers!

Nevertheless, I would expect extremely poor performance if you just generate a very dense feature vector of ‘float’ values. Binning into histograms is totally analogous to the ‘one-hot’ encoding you’ve seen in more standard machine learning. Expecting a set of neurons to learn quite a complicated and highly non-linear function from a single float is asking a bit much, whereas once it’s binned you can have e.g. some neurons that trigger on close contacts and do something accordingly.
There’s nothing stopping you composing e.g. the 10 binned values, and then the actual floating point value, as your feature vector.
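
A minimal sketch of such a combined feature, assuming ten equal-width bins and a raw value rescaled to [0, 1] (the helper name `binned_plus_raw` and the example range are hypothetical):

```python
def binned_plus_raw(value, lo, hi, n_bins=10):
    """Return n_bins one-hot entries followed by the value rescaled to [0, 1]."""
    clamped = min(max(value, lo), hi)
    scaled = (clamped - lo) / (hi - lo)
    # Equal-width bins over [lo, hi]; the upper edge goes into the last bin.
    index = min(int(scaled * n_bins), n_bins - 1)
    one_hot = [1.0 if i == index else 0.0 for i in range(n_bins)]
    return one_hot + [scaled]


# Hypothetical example: 10 binned entries plus 1 float, 11 entries in total.
vec = binned_plus_raw(2.55, lo=0.5, hi=4.0)
```

Rescaling the float is a design choice here, made so that it sits on the same scale as the zeros and ones.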

Another perspective is that the overall size of your model is proportional to the size of these feature vectors. Neural network models for materials seem to saturate with few hidden layers: for instance, Fig S2b in the PRL you linked shows that the performance saturates with 2 convolution layers (barely ‘deep’ learning!), and the same is observed empirically even with conceptually simpler ‘dense’ networks. So you need fairly expanded feature vectors to have much predictive power.

For the bonds / graph edges, it may be that this is necessary to work with their Eqn 5. They construct some kind of gated architecture, so that set of weights has to interpret how much two neighbours are interacting, and so some kind of larger bond feature vector is probably needed here.
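
For reference, the gated convolution of Eqn 5, as I read it in the linked preprint (please check the notation against the paper), is:

```latex
% z is the concatenation of the two atom feature vectors and the bond feature vector
z_{(i,j)_k}^{(t)} = v_i^{(t)} \oplus v_j^{(t)} \oplus u_{(i,j)_k}

v_i^{(t+1)} = v_i^{(t)} + \sum_{j,k}
  \sigma\!\left( z_{(i,j)_k}^{(t)} W_f^{(t)} + b_f^{(t)} \right)
  \odot
  g\!\left( z_{(i,j)_k}^{(t)} W_s^{(t)} + b_s^{(t)} \right)
```

Here $\sigma$ is a sigmoid acting as the gate, $g$ is a non-linear activation, and $\odot$ is element-wise multiplication. The bond vector $u_{(i,j)_k}$ is the only place the bond length enters, so the gate can only distinguish strongly and weakly interacting neighbours through that vector.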

In terms of implementation, it would be most useful not just to have these feature functions implemented, but also to structure your code so that you can choose and combine these features at run time. This seems to be what the authors of the PRL had as well: they tried different sizes and combinations of features as part of a hyperparameter optimisation.
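
One way this could look is a registry of named feature functions that are concatenated on request. This is only a sketch: the names `FEATURIZERS` and `featurize`, and the dictionary layout used for an atom, are hypothetical.

```python
def one_hot(index, size):
    """Return a one-hot list of the given size with a 1.0 at position index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


# Hypothetical registry: each entry maps a feature name to a function
# that turns an atom record into a list of numbers.
FEATURIZERS = {
    "block": lambda atom: one_hot("spdf".index(atom["block"]), 4),
    "electronegativity": lambda atom: [atom["electronegativity"]],
}


def featurize(atom, names):
    """Concatenate the selected features, in the order given by names."""
    vector = []
    for name in names:
        vector.extend(FEATURIZERS[name](atom))
    return vector


# The feature list is plain data, so it can be varied at run time
# like any other hyperparameter.
carbon = {"block": "p", "electronegativity": 2.55}
full = featurize(carbon, ["block", "electronegativity"])
block_only = featurize(carbon, ["block"])
```

In Julia the same idea could probably be expressed with a `Dict` of functions or with multiple dispatch on feature types.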

Thanks, this is all really useful perspective! The analogy to one-hot encoding makes perfect sense.

And yes, I was definitely planning on having a “modular” featurization scheme; I’m certainly keen to explore which atomic features actually matter in the first place as well as play with binning densities and other meta-, hyper-, etc.- parameters :smiley:

Very interested to see any code!
It should be possible to make something quite technically sweet with Julia’s ability to abstract the complexity.

Having the node information directly from the labels within a MetaGraph (i.e. this issue taken to completion https://github.com/yuehhua/GeometricFlux.jl/issues/6 ) might be a necessary prerequisite for a smooth implementation. But equally, packing the feature vectors in your code ‘by hand’ might keep the necessary complexity exposed to the researcher / user.

A lot of the graph-based learning literature is applied to enormous graphs (e.g. social networks), so quite a different set of limits from the relatively small molecular / solid-state graphs.