Dear all, a little more than a year after I started seriously learning Julia, I have finished what will soon be the 1.0 version of the package I set out to implement while learning this great language.
It is a niche package for the Chemistry/Molecular Dynamics simulation field, used for understanding how molecules interact in solution, particularly molecules of complex shapes (geometries), such as proteins, polymers, DNA, etc.
The repository is here:
and the Docs are here: Introduction · ComplexMixtures.jl
It was not easy to find a name for this package, as anything carrying “Complex” or “Solution” would probably be much more confusing than what I finally chose. I hope the name at least does not overlap too much with other domains.
The core of what this package does is to compute “minimum-distance distribution functions”. These are measures of the accumulation or depletion of a molecule in each region of a solution, relative to the average density of that molecule in the solution. Interactions between molecules cause local density variations, and the goal of these distribution functions is to probe these density changes and understand them from a molecular and chemical perspective.
For those more familiar with the field, these are not regular radial distribution functions. The package computes the distribution functions of the minimum distance between solute and solvent molecules. This allows solvation to be understood from a “solvation-shell” perspective, which is very natural and does not depend on the shape of the molecules. As with other distribution functions, thermodynamic properties can be obtained by means of solution theory.
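To make the idea of "accumulation or depletion relative to a reference" concrete, here is a minimal sketch. It is not the package's code: the `histogram` helper, the bin width, and the distance values are made up for illustration, and the reference is assumed to be a set of minimum distances obtained without solute-solvent interactions.

```julia
# Count how many distances fall in each bin of width `dr`, up to `rmax`.
# (`histogram` is a hypothetical helper written for this sketch.)
function histogram(distances, dr, rmax)
    counts = zeros(Int, ceil(Int, rmax / dr))
    for d in distances
        d < rmax || continue                  # ignore distances beyond the cutoff
        counts[floor(Int, d / dr) + 1] += 1
    end
    return counts
end

# Made-up minimum distances: observed in the solution, and in a reference
# state without solute-solvent interactions (an assumption of this sketch).
observed  = [0.2, 0.3, 0.4, 1.1, 1.6, 1.7]
reference = [0.3, 1.2, 1.4, 1.6, 1.8, 1.9]

n_obs = histogram(observed, 1.0, 2.0)
n_ref = histogram(reference, 1.0, 2.0)

# Ratio per bin: values above 1 indicate accumulation, below 1 depletion.
g = n_obs ./ n_ref
```

With these numbers the first bin shows accumulation close to the solute and the second shows depletion.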
Computing these minimum-distance distributions efficiently is tricky. One has to find the minimum distance among all pairs of atoms of two molecules, which might be large. Doing this efficiently involves using linked cells and linked lists, sorting lists of distances while carrying the indexes, etc. This cannot be easily done (at least I can’t) in a language that does not allow one to write one's own optimal loops. That is, implementing this in some of the interpreted languages was always out of the question.
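For reference, the brute-force version of the problem can be sketched as below. This is only the naive double loop over all atom pairs, not the linked-cell algorithm the package uses; the function name `minimum_distance`, the tuple-based coordinates, and the absence of periodic boundary conditions are assumptions of the sketch. The last lines show `sortperm`, one standard way of sorting distances while keeping track of the original indexes.

```julia
# Naive O(N*M) search for the minimum distance between the atoms of two
# molecules. Each molecule is a vector of (x, y, z) tuples.
# Linked cells avoid this full double loop; they are not shown here.
function minimum_distance(solute, solvent)
    d2min = Inf
    imin, jmin = 0, 0
    for (i, x) in pairs(solute), (j, y) in pairs(solvent)
        d2 = sum(abs2, x .- y)      # squared distance: no sqrt per pair
        if d2 < d2min
            d2min = d2
            imin, jmin = i, j
        end
    end
    return sqrt(d2min), imin, jmin
end

solute  = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
solvent = [(4.0, 0.0, 0.0), (1.0, 2.0, 0.0), (9.0, 9.0, 9.0)]

# Distance and the indexes of the closest pair of atoms.
d, i, j = minimum_distance(solute, solvent)

# Sorting distances while carrying the indexes: `sortperm` returns the
# permutation that sorts the list, so the original positions are kept.
dists = [3.2, 0.9, 2.1]
order = sortperm(dists)
```

Here the closest pair is the second solute atom and the second solvent atom, and `dists[order]` gives the distances in increasing order.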
I have an original implementation of this in Fortran. The Julia implementation turned out to be (because of many improvements that I could make with the tooling available) 30-40% faster than the Fortran one when running serially, and it was also very easy to parallelize (that was almost trivial, except that one has to read a huge trajectory file serially while launching parallel calculations for each frame of the trajectory).
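The "read serially, compute in parallel" pattern can be sketched in a few lines. This is a generic illustration and not how the package is written: `process_frame`, `run_parallel`, and the in-memory "trajectory" are hypothetical stand-ins for the real per-frame computation and file reader.

```julia
using Base.Threads: @spawn

# Hypothetical stand-in for the per-frame computation.
process_frame(frame) = sum(frame)

function run_parallel(trajectory; nworkers = Threads.nthreads())
    # One producer task reads the frames one by one, in order (serial I/O),
    # and puts them in a buffered channel together with their index.
    frames = Channel{Tuple{Int,Vector{Float64}}}(nworkers) do ch
        for (i, frame) in enumerate(trajectory)
            put!(ch, (i, frame))
        end
    end
    results = zeros(length(trajectory))
    # Worker tasks take frames from the channel and compute concurrently.
    # Each frame index is handled by exactly one task, so writes do not clash.
    @sync for _ in 1:nworkers
        @spawn for (i, frame) in frames
            results[i] = process_frame(frame)
        end
    end
    return results
end

# A fake "trajectory": ten frames of three coordinates each.
trajectory = [fill(Float64(i), 3) for i in 1:10]
results = run_parallel(trajectory)
```

The channel buffer bounds how many frames are held in memory at once, which matters when the trajectory file is much larger than the available RAM.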
I am quite happy with the result, and we are already using the package routinely in our group. A publication will be out sometime soon, with which I will release the 1.0 version of the package.
Finally, I am very grateful for all the help I received in this forum. You will find my posts trying to understand assignment and mutation in Julia from not long ago. I won't tag people here, but I can easily mention skoffer, henrique becker, tamas, rdeis, elrod, stevengj, DNF, dpsanders, oscar_smith, and many, many others without whom I am sure I could not have learned so fast everything I learned last year, not only about Julia, but about computer programming in general.
Lastly, while developing this package I also developed some other tools that might be even more generally useful, particularly the https://m3g.github.io/PDBTools.jl/stable/ package. But that will have its own announcement when I finish adding everything I think it deserves.
And a pretty picture built with the package, because everybody likes pretty pictures:
This shows how a solvent (in this case glycerol) accumulates in the vicinity of some amino acid residues of a protein. Glycerol protects proteins from denaturation, and has industrial and medical applications because of that.