[ANN] OpenSMILES.jl: a SMILES parser for Julia!

If you are a chemist, biochemist, or someone working in a related field, you probably know what SMILES is. If not, it's basically a string representation of chemicals that preserves their connectivity, with other characteristics left implied by physical laws. OpenSMILES.jl (GitHub - caseykneale/OpenSMILES.jl: OpenSMILES parser in Julia) lets a user parse SMILES strings into LightGraphs.jl graphs!

It's useful for chemical data mining, QSAR, drug discovery, cheminformatics, visualization, scraping, AI/ML, etc. Or just for looking at somewhat pretty pictures derived entirely from strings! For example, here is tryptophan (the amino acid in turkey that is fabled to make people sleepy):
SMILES representation: “C1=CC=C2C(=C1)C(=CN2)CC(C(=O)O)N”
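For readers new to SMILES, here is a rough sketch of what "parsing into a graph" involves. It is a hypothetical illustration in Python, not OpenSMILES.jl's API or implementation: `parse_smiles` is a made-up name, and it only covers a small subset of the grammar (unbracketed organic-subset atoms, `-`/`=`/`#` bonds, branches, and single-digit ring closures). Bracket atoms, charges, `%nn` ring closures, chirality, and aromaticity are not handled.

```python
# Hypothetical SMILES-to-graph sketch (NOT the OpenSMILES.jl API).
# Supports only: organic-subset atoms without brackets, bonds "-", "=", "#",
# branches "( )", and single-digit ring closures. Aromatic lowercase atoms
# are accepted, but bonds between them default to single here.

ORGANIC = ("Cl", "Br", "B", "C", "N", "O", "P", "S", "F", "I",
           "b", "c", "n", "o", "p", "s")  # two-letter symbols checked first
BOND_ORDER = {"-": 1, "=": 2, "#": 3}


def parse_smiles(smiles):
    """Return (atoms, bonds): atom symbols and (i, j, order) bond tuples."""
    atoms, bonds = [], []
    stack = []      # atom to return to when a branch ")" closes
    rings = {}      # open ring-closure digit -> (atom index, bond order)
    prev = None     # index of the atom the next atom bonds to
    pending = None  # explicit bond order waiting for the next atom/digit
    i = 0
    while i < len(smiles):
        ch = smiles[i]
        if ch in BOND_ORDER:
            pending = BOND_ORDER[ch]
            i += 1
        elif ch == "(":
            stack.append(prev)
            i += 1
        elif ch == ")":
            prev = stack.pop()
            i += 1
        elif ch.isdigit():
            if ch in rings:
                # Second occurrence of the digit: bond back to the opener.
                j, order = rings.pop(ch)
                bonds.append((j, prev, pending or order or 1))
            else:
                # First occurrence: remember the atom (and any bond order).
                rings[ch] = (prev, pending)
            pending = None
            i += 1
        else:
            symbol = next((s for s in ORGANIC if smiles.startswith(s, i)), None)
            if symbol is None:
                raise ValueError(f"unsupported token at position {i}: {ch!r}")
            atoms.append(symbol)
            idx = len(atoms) - 1
            if prev is not None:
                bonds.append((prev, idx, pending or 1))
            prev, pending = idx, None
            i += len(symbol)
    if rings:
        raise ValueError(f"unclosed ring bond(s): {sorted(rings)}")
    return atoms, bonds


atoms, bonds = parse_smiles("C1=CC=C2C(=C1)C(=CN2)CC(C(=O)O)N")
print(len(atoms), len(bonds))  # 15 16
```

For tryptophan this yields 15 heavy atoms (11 C, 2 N, 2 O) and 16 bonds, five of them double. Since the molecule is one connected graph, bonds minus atoms plus one gives 16 - 15 + 1 = 2 rings, matching the fused indole ring system. Hydrogens are implicit, which is the "implied by physical laws" part.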

OpenSMILES.jl graph plot:

So is this package perfect? Nope. Does it handle chirality? Not yet. Will it ever? If you contribute, sure; otherwise maybe not.

Are there likely bugs/issues? Maybe. Will I ever know about them? Yes, if you use it and report them!

It's an early release, but I think it has some serious utility and is a valuable contribution to JuliaChemistry, which is an in-the-works domain (discussion here: Should there be a Chemistry Domain/Organization? - #18 by mfh).

Thank you!


Thanks. This is great! Do you foresee a SMILES generator from a mol file in the future?


Know of good documentation for mol file formats? 🙂
If so, I wouldn't mind filling that need if it's not in Chemfiles 🙂


As a start: here


Looks doable 🙂 But the alternative might be to get it into a common graph format first, then work on bridging that format to others. I'm not a serious I/O guy, but I've had to write readers/writers a bit.

The SELFIES paper might be of interest for the organization; it comes with code :). As far as I understand, the trade-off is that a SELFIES string always represents a valid molecule but is less human-readable, which does not matter much when exploring molecule space programmatically with generative models. Not my area though, I just attended a nice talk by that research group.


Interesting, what does Apache 2.0 mean as far as noncommercial recreations go? Could I MIT-license a port of it? I have a feeling I wouldn't be able to port the code directly given Python's and Julia's string differences, but whatever came out could at least be tested against vetted code.

Oh wow… it's basically 2k+ lines of Python if/then/else statements.

https://github.com/aspuru-guzik-group/selfies/blob/master/build/lib/selfies/selfies_fcts.py