[ANN] MoleculeHub: A set of cheminformatics tools in Julia

In an effort to bring cheminformatics to Julia, I’ve launched MoleculeHub, which focuses on tools for working with small molecules.
Right now, there’s:

  • MoleculeFlow.jl - a cheminformatics tool that wraps RDKit and builds on top of it.
  • OpenBabel.jl - a way to call the Open Babel library from Julia
  • MoleculeDatasets.jl - a simple way to download cheminformatics datasets.
  • A library for more advanced molecular visualization (work in progress)
  • A library with common and lesser-known substructure filters (work in progress)

The idea is to bring some fundamental cheminformatics tools to Julia, which should make it possible to do more interesting things in the language.

This looks great! I may actually make use of it very soon: just yesterday I was looking into pycalls for Mordred.
How does this compare to MolecularGraph.jl? Do the two libraries integrate or have some common underlying data structure?
What are your plans for future features? I’m not very interested in the visualization aspect, but I may help with implementing other functionality such as descriptors or substructures.

You mean how MoleculeFlow.jl compares to MolecularGraph.jl? The other two are very different packages.

There are quite a few differences:

  • MolecularGraph.jl uses a graph as its base representation, whereas MoleculeFlow.jl uses a simple struct to hold information about a molecule. There’s the option to convert to a graph if one desires, but it’s not as feature-rich. There’s no integration between the two libraries, at least not currently.

  • MolecularGraph.jl chooses to go more “low level” when it comes to source libraries, such as inchilib, coordgen, or RDKit MinimalLib, which severely limits the available functionality unless one decides to write everything from scratch. MoleculeFlow.jl wraps RDKit as its sole source library, which gives it access to everything RDKit has to offer. I’ve ported the majority of the functionality most people should ever need, but RDKit is huge, so there’s plenty to add if one desires. Additionally, doing certain things in RDKit requires knowing some of its more “arcane” syntax, so MoleculeFlow.jl is there to make it simpler for the user.

I don’t really have a development roadmap, I just like Julia and I’ve spent my entire PhD working with RDKit, so I figured this would be something fun to do. I guess you could say that I’m making tools that I wish I had when I first started.

The plan for now is to finish the visualization and filtering libraries (pretty close on both), add some polish here and there, and port some niche functionality I had to develop in RDKit way back; that would be more or less it. Anything after that would depend on my mood/availability and/or user feedback, if the tools gain any traction. There’s ML, quantum chem, docking, whatever else. I know SciML is Julia’s thing, so maybe there will be opportunities in that domain. It would be cool to integrate with and/or develop something on top of this that Python can’t provide.

I see, this seems like a great approach to me. While I do like the MolecularGraph idea of implementing everything (or almost everything) in Julia, I feel it is more pragmatic to start with RDKit so that at least we can work on real projects.
One of the main reasons I don’t work mostly in Julia is the lack of this functionality, so having an easy way to hook into RDKit may make things much easier.
I guess that if your library bases the molecular information on RDKit, it should also be simple to pass information to MolecularGraph through the RDKit object.

I’ll try to set aside some time in the next few days to take a deep look at what you have implemented. If you need help with something, let me know and I’ll be happy to help out.

Thank you for this - it looks extremely useful.

Years ago there was a chemometrics person that was very active in the community and wrote a bunch of packages. At some point he stepped away, but a lot of his work is still around.

It’s not my field, so I have no idea if it’s helpful at all, but it might be worth taking a look at caseykneale/ChemometricsTools.jl on GitHub, a collection of tools for chemometrics and machine learning written in Julia.

CrystalInfoFramework.jl may be of interest. OpenBabel already supports input of data in CIF and mmCIF formats, so CrystalInfoFramework.jl is probably more useful if you want native Julia parsing. For example, a table of atomic positions with additional information (atom type etc.) in a CIF/mmCIF file is returned as a DataFrame, albeit in fractional coordinates instead of the Cartesian coordinates OpenBabel conveniently provides.
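For anyone who ends up with fractional coordinates and wants Cartesian ones, the conversion only needs the unit-cell parameters. Below is a minimal sketch in plain Julia. It does not call CrystalInfoFramework.jl or OpenBabel.jl; the function names `cell_matrix` and `frac_to_cart` are hypothetical, the cell values are made up for illustration, and it assumes the common convention where the a axis lies along x and b lies in the xy-plane.

```julia
# Build the matrix that maps fractional coordinates to Cartesian ones (in Å).
# Cell lengths a, b, c are in Å; cell angles alpha, beta, gamma are in degrees.
# Assumed convention: a lies along x, and b lies in the xy-plane.
function cell_matrix(a, b, c, alpha, beta, gamma)
    ca, cb, cg = cosd(alpha), cosd(beta), cosd(gamma)
    sg = sind(gamma)
    # Unit-cell volume divided by a*b*c
    v = sqrt(1 - ca^2 - cb^2 - cg^2 + 2 * ca * cb * cg)
    bx = b * cg
    by = b * sg
    cx = c * cb
    cy = c * (ca - cb * cg) / sg
    cz = c * v / sg
    # Columns are the cell vectors a, b, c expressed in Cartesian axes
    return [a bx cx; 0.0 by cy; 0.0 0.0 cz]
end

# Apply the cell matrix to one fractional position (a 3-element vector)
frac_to_cart(cell, frac) = cell * frac

# Made-up orthorhombic cell, purely for illustration
cell = cell_matrix(5.0, 6.0, 7.0, 90.0, 90.0, 90.0)
xyz = frac_to_cart(cell, [0.5, 0.5, 0.5])  # ≈ [2.5, 3.0, 3.5]
```

For a whole table of atoms, the same matrix can be applied row by row to the fractional x, y, z columns.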

BioStructures.jl also parses mmCIF into a structured object. And PDBTools.jl reads them into a custom array of atoms.

MoleculeView is here

A tool that allows you to explore molecular data more visually, like in the examples below. Should be in the registry in a couple of days.

[Demo images: aaa, scatter_demo_cat, grid_demo]

I like it, great project and of course I “starred” it.

I do not want to hijack the thread, but I also started a related project, a 3D viewer for Chemfiles: ChemfilesViewer.jl

If there is interest, I might be able to improve it and also expand the integrations.

About Chemometrics, this one is active:

(But this is not about representation of molecules)

MoleculeView grid now supports range selection for numerical variables:

[Image: range selection in the MoleculeView grid]

Some more updates:

Development of MoleculeFlow.jl and MoleculeView.jl has reached a point I’m fairly happy with, so the focus for both will now be mostly bug fixing.

MoleculeView.jl has added functionality that is summarized by the gif below:

[GIF: summary of the new MoleculeView.jl functionality]

MoleculeFlow.jl now has 200+ functions ported, which covers maybe ~70% of RDKit (a number I pulled out of the air). Refer to its API docs to get a sense of what’s covered.

MoleculeScreen is here (pending registration). It implements some common structural filters (PAINS, Lipinski, etc.).
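To give a rough idea of what a Lipinski-style filter checks, here is a minimal sketch in plain Julia. It is not the MoleculeScreen API: the `Descriptors` struct and both function names are hypothetical, and the descriptor values are made-up numbers rather than values computed from real molecules. The thresholds are the usual rule-of-five ones (molecular weight ≤ 500, logP ≤ 5, at most 5 hydrogen-bond donors, at most 10 acceptors), and allowing one violation is a common convention assumed here.

```julia
# Hypothetical container for precomputed descriptors (not a MoleculeScreen type)
struct Descriptors
    mol_weight::Float64   # molecular weight in Da
    logp::Float64         # octanol-water partition coefficient
    h_donors::Int         # number of hydrogen-bond donors
    h_acceptors::Int      # number of hydrogen-bond acceptors
end

# Count how many of the four rule-of-five criteria are violated
function lipinski_violations(d::Descriptors)
    checks = (d.mol_weight > 500, d.logp > 5, d.h_donors > 5, d.h_acceptors > 10)
    return count(checks)
end

# Assumed convention: a molecule passes with at most one violation
passes_lipinski(d::Descriptors; max_violations = 1) =
    lipinski_violations(d) <= max_violations

# Made-up descriptor values, purely for illustration
small = Descriptors(180.2, 1.2, 1, 4)
large = Descriptors(812.0, 6.3, 7, 12)

passes_lipinski(small)  # true  (0 violations)
passes_lipinski(large)  # false (4 violations)
```

In practice the descriptors would come from a cheminformatics library rather than being typed in by hand.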

This more or less concludes “phase 1” of MoleculeHub development. The project is now in maintenance/bug fixing mode until I have more time/desire to put work into it, or until people get interested in contributing.

If people run into bugs, please submit an issue in the appropriate repo.
