[ANN] ComplexMixtures.jl: A package to study solute-solvent interactions of molecules of complex shape

Dear all, a little more than a year after I started seriously learning Julia, I have finished what will soon be the 1.0 version of the package I set out to implement while learning this great language.

It is a niche package for the Chemistry/Molecular Dynamics simulation field, used to understand how molecules interact in solution, particularly molecules of complex shape (geometry), such as proteins, polymers, DNA, etc.

The repository is here:

and the Docs are here: Introduction · ComplexMixtures.jl

It was not easy to find a name for this package, as anything else carrying “Complex” or “Solution” would probably have been much more confusing than what I finally chose. I hope the name at least does not overlap too much with other domains.

The core of what this package does is to compute “minimum-distance distribution functions”. These are measures of the accumulation or depletion of a molecule in each region of a solution, relative to the average density of that molecule in the solution. Interactions between molecules cause local density variations, and the goal of these distribution functions is to probe these density changes and understand them from a molecular and chemical perspective.

For those who are more familiar with the field, these are not regular radial distribution functions. The package computes the distribution functions of the minimum distance between solute and solvent molecules. This allows solvation to be understood from a “solvation-shell” perspective, which is very natural and does not depend on the shape of the molecules. As with other distribution functions, thermodynamic properties can be obtained by means of solution theory.
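To make "minimum distance between solute and solvent molecules" concrete, here is a minimal brute-force sketch. It is written in Python only for brevity and is not the package's Julia code; the function name and the coordinates are made up for illustration, and periodic boundary conditions are ignored.

```python
import math

def minimum_distance(solute, solvent_molecule):
    """Return (distance, i, j): the smallest atom-atom distance between the
    two molecules and the indices of the atoms that realize it."""
    best = (math.inf, -1, -1)
    for i, a in enumerate(solute):
        for j, b in enumerate(solvent_molecule):
            d = math.dist(a, b)  # Euclidean distance between two atoms
            if d < best[0]:
                best = (d, i, j)
    return best

# Hypothetical coordinates: a 3-atom solute and a 2-atom solvent molecule.
solute = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
solvent = [(2.0, 3.0, 0.0), (2.0, 4.0, 0.0)]

print(minimum_distance(solute, solvent))  # (3.0, 2, 0)
```

Each solvent molecule contributes a single distance (its closest approach to the solute), and the distribution function is built from the histogram of these values over all molecules and frames.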

Computing these minimum-distance distributions efficiently is tricky. One has to find the minimum distance among all pairs of atoms of two molecules, which might be large. Doing this efficiently involves using linked cells and linked lists, sorting lists of distances while carrying the indexes, etc. This cannot be done easily (at least I can’t do it) in a language that does not allow one to write one’s own optimized loops. That is, implementing this in an interpreted language was always out of the question.
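The cell idea can be sketched as follows. This is a simplified illustration in Python, not the package's implementation: it uses a dictionary of Python lists where the text above mentions linked lists, it assumes a cell side equal to the cutoff, it ignores periodic boundary conditions, and all names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def cell_of(point, cutoff):
    """Integer index of the cubic cell (side = cutoff) that contains the point."""
    return tuple(math.floor(c / cutoff) for c in point)

def build_cells(points, cutoff):
    """Map each cell index to the list of indices of the points inside it."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[cell_of(p, cutoff)].append(idx)
    return cells

def min_distance_within_cutoff(solute, solvent_atoms, cutoff):
    """Smallest solute-solvent atom distance not exceeding `cutoff`, as
    (distance, solute_index, solvent_index), or None if there is none."""
    cells = build_cells(solvent_atoms, cutoff)
    best = None
    for i, a in enumerate(solute):
        home = cell_of(a, cutoff)
        # Only the 27 cells around the atom can hold a partner within the cutoff.
        for shift in product((-1, 0, 1), repeat=3):
            cell = tuple(h + s for h, s in zip(home, shift))
            for j in cells.get(cell, ()):
                d = math.dist(a, solvent_atoms[j])
                if d <= cutoff and (best is None or d < best[0]):
                    best = (d, i, j)
    return best

# Hypothetical coordinates, cutoff of 2.0 in the same arbitrary units.
solute = [(0.5, 0.5, 0.5), (1.5, 0.5, 0.5)]
solvent_atoms = [(1.5, 1.5, 0.5), (9.0, 9.0, 9.0), (0.5, 0.5, 1.0)]
print(min_distance_within_cutoff(solute, solvent_atoms, 2.0))  # (0.5, 0, 2)
```

The gain is that each solute atom is compared only with solvent atoms in neighboring cells, so the far-away atom at (9, 9, 9) is never visited.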

I have an original implementation of this in Fortran. The Julia implementation turned out to be 30-40% faster than the Fortran one when running in serial (because of many improvements that I could make with the tooling available), and it was also very easy to parallelize (that was almost trivial, except that one has to read a huge trajectory file serially while launching parallel calculations for each frame of the trajectory).
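The "read serially, compute in parallel" pattern looks roughly like this. It is a sketch of the pattern only, in Python and with made-up names and data; it is not how the package does it. Note also that Python threads will not speed up CPU-bound work the way Julia threads do, so this only illustrates the structure.

```python
import math
from concurrent.futures import ThreadPoolExecutor

def read_frames(n_frames):
    """Stand-in for a serial trajectory reader: yields one frame at a time.
    A frame here is (solute_coords, solvent_coords), invented for illustration."""
    for k in range(n_frames):
        solute = [(0.0, 0.0, 0.0)]
        solvent = [(1.0 + k, 0.0, 0.0), (5.0, 5.0, 5.0)]
        yield solute, solvent

def analyze_frame(frame):
    """Per-frame work: here, just the minimum solute-solvent atom distance."""
    solute, solvent = frame
    return min(math.dist(a, b) for a in solute for b in solvent)

def run(n_frames, n_workers=4):
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # The generator is consumed in the submitting thread (serial reading),
        # while the submitted analyses run on the worker threads.
        futures = [pool.submit(analyze_frame, f) for f in read_frames(n_frames)]
        # Collect results in frame order.
        return [fut.result() for fut in futures]

print(run(3))  # [1.0, 2.0, 3.0]
```

For a really huge trajectory one would also limit how many frames are held in memory at once, which this sketch does not do.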

I am quite happy with the result, and we are already using the package routinely in our group. A publication will be out sometime soon, with which I will release the 1.0 version of the package.

Finally, I am very grateful for all the help I received in this forum. You will find my posts trying to understand assignment and mutation in Julia from not long ago. I won’t tag people here, but I can easily mention skoffer, henrique becker, tamas, rdeis, elrod, stevengj, DNF, dpsanders, oscar_smith, and many, many others without whom I am sure I could not have learned so quickly everything I learned last year, not only about Julia, but about computer programming in general.

Last, while developing this package I also developed some other tools that might be even more generally useful, particularly the https://m3g.github.io/PDBTools.jl/stable/ package. But that will have its own announcement when I finish adding everything I think it deserves.

And a pretty picture built with the package, because everybody likes pretty pictures:

This shows how a solvent (in this case glycerol) accumulates in the vicinity of some amino acid residues of a protein. Glycerol protects proteins from denaturation, and has industrial and medical applications because of that.

For anyone who might be interested, the paper corresponding to this package is now published:

ComplexMixtures.jl: Investigating the structure of solutions of complex-shaped molecules from a solvent-shell perspective. J. Mol. Liq. 117945, 2021. [Full Text]

A temporary link to the full text is provided by Elsevier: https://authors.elsevier.com/a/1e6pQc8qpSw6E (valid until Jan 9, 2022).

Abstract

Distribution functions are used to investigate the interactions between the components of condensed-phase systems, while allowing the computation of thermodynamic properties that can be probed experimentally. Radial distribution functions are the most fundamental and easily understood of these distributions, but fail to provide a molecular picture of the interactions when one or all species have complex shapes. On the other hand, regardless of the complexity of the molecular structures involved, minimum-distance distribution functions (MDDFs) can provide a molecular viewpoint on solute–solvent contacts. Here, we describe the ComplexMixtures.jl package, which provides a practical implementation of MDDFs and corresponding Kirkwood-Buff integrals to analyze Molecular Dynamics and Monte-Carlo simulations. Examples are provided for the study of macromolecules in solutions of multiple cosolvents, homogeneous systems, polymer solvation by organic solvents and lipid bilayer interactions with disruptive agents. The distribution functions can be examined using tools to assess the contributions of each atom, group of atoms, and amino acid residues, for example. ComplexMixtures.jl is free software and is compatible with the most common molecular simulation trajectory formats. The software is available as a Julia package with a comprehensive documentation at: http://m3g.iqm.unicamp.br/ComplexMixtures.

May I ask how the nice Fig. 4 has been produced? Thank you.

This one? Which part? The top image is just a picture of the molecular system obtained with VMD. The molecular structures at the bottom were built with Chemtools.

(I’m actually more proud of the other figures, with the plots, all built with Plots and GR - all scripts available here).

Yes, the top 3D image is amazing in complexity.

Impressive paper.
