Packages for mass spectrometry?

My bachelor thesis involves working with mass spectrometry data: I’ll need to be able to load the spectra, visualise them, and, most importantly, given a spectrum and a peptide fragment, decide how likely it is that they match. Also, I’m not that skilled in MS/MS yet, so there might be some other needs that I don’t foresee now.
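For context on the matching step, a common baseline (much simpler than a full search-engine score) is the shared peak count. You compute the peptide’s theoretical b- and y-ion m/z values and count how many of them fall within a tolerance of an observed peak. The Python sketch below assumes unmodified residues, monoisotopic masses, singly charged fragments and an absolute tolerance in Da. The function names are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
PROTON = 1.007276  # proton mass (Da)
WATER = 18.010565  # monoisotopic mass of H2O (Da)


def fragment_mzs(peptide):
    """Return m/z of singly charged b1..b(n-1) and y1..y(n-1) ions."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []
    for i in range(1, len(masses)):
        b_ions.append(sum(masses[:i]) + PROTON)           # N-terminal fragment
        y_ions.append(sum(masses[-i:]) + WATER + PROTON)  # C-terminal fragment
    return b_ions, y_ions


def shared_peak_count(peptide, peaks, tol=0.02):
    """Count theoretical fragments matched by at least one observed peak.

    peaks: iterable of (mz, intensity) pairs; tol: absolute m/z tolerance (Da).
    Intensities are ignored in this baseline score.
    """
    observed = [mz for mz, _ in peaks]
    b_ions, y_ions = fragment_mzs(peptide)
    return sum(
        any(abs(theo - obs) <= tol for obs in observed)
        for theo in b_ions + y_ions
    )
```

This ignores intensities and higher charge states. More elaborate probabilistic scores typically start from this same comparison of theoretical and observed fragments.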

Now, I’d very much love to do my work in Julia, but based on what I found on Google, the mass spectrometry library ecosystem is in much better shape in Python. I decided to ask here as a last resort, in case I’ve missed something (I’d be glad if that were the case).

So, with that said: what is the state of mass spectrometry libraries in Julia? Is anybody doing mass spectrometry in Julia? Do you think my needs could be met with Julia?


The only package I’ve built is
https://github.com/timholy/mzXML.jl

It’s still got a REQUIRE file, so you can tell it’s been a while.

In principle Julia would be great for mass spec; I’d love to see more work in this area.


I can’t answer the rest of your questions, but to “Do you think my needs could be met with Julia?” I would say an unambiguous “yes!” But perhaps the implied question was “Do you think my needs could be met with Julia without me needing to write the libraries myself,” in which case, maybe not.

The most straightforward path is probably to wrap the existing Python libraries using, e.g., PyCall. That way, you can immediately accomplish the tasks that require complex libraries, but still use Julia for things like statistical analysis. Then, if you find that some piece of the Python library doesn’t work the way you want or is too slow, you can break off just that chunk and start rewriting it in Julia.

This way, you can remain productive on your primary task and still get to use Julia for everything else, plus potentially contribute to the ecosystem for the folks that come after you.


Oh yeah, that’s what I meant; I should’ve phrased my question more carefully. The foray into mass spectrometry is very much a one-off thing for me, and since the focus of the thesis isn’t really the implementation of the basic MS/MS algorithms, I would prefer not to have to write them myself.

I’m thinking it might be easier just to write the data prep in Python, save it to some intermediary file, and load that into Julia for the analysis. Or do you find the FFI good enough that you’d write it all in Julia?
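A minimal sketch of the Python side of that hand-off. It assumes the spectra have already been loaded by whatever Python library you use into a dict mapping scan number to a list of (mz, intensity) pairs. The function name and the long "scan, mz, intensity" layout are illustrative choices, not a standard format. A flat CSV like this is easy to read on the Julia side, e.g. with CSV.jl into a DataFrame.

```python
import csv
import io


def peaks_to_csv(spectra):
    """Return spectra as long-format CSV text with columns scan, mz, intensity.

    spectra: dict mapping scan number -> list of (mz, intensity) pairs.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["scan", "mz", "intensity"])
    for scan in sorted(spectra):  # stable, reproducible row order
        for mz, intensity in spectra[scan]:
            writer.writerow([scan, mz, intensity])
    return buf.getvalue()


# Usage: write to disk for Julia to pick up.
# with open("peaks.csv", "w", newline="") as fh:
#     fh.write(peaks_to_csv(spectra))
```

One row per peak keeps the file trivially parseable from any language. The cost is repeating the scan number on every row.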


You can find some of the mass spectrometry tools you’d want in ChemometricsTools.jl. I’m more into spectroscopy personally, but I could bake in more MS tools over time.


This is a perfectly sensible strategy. One pitfall to look out for is reproducibility: a mixed Python/Julia setup makes your coding environment more complex, and, paradoxically, it might be easier to manage your Python dependencies from Julia through Conda.jl.

But if you document your Python environment and your Julia environment separately and there’s no overlap, it should be OK.

Update: we now have

https://github.com/timholy/MzXML.jl

https://github.com/timholy/MzCore.jl

https://github.com/timholy/MzPlots.jl

All three are registered packages, so you can just pkg> add them.


Awesome stuff, TimHoly. Do you handle arrays whose bin spacing is uniform but varies in scale from file to file? If so, that’s a common abstraction/need I’ve noticed all over the ecosystem. For example, the JuliaTelecom thread has an implementation of this, and I have two or three unreleased versions that I’m unhappy with.

Great question. The utilities try to be agnostic about the internal storage format. The MzXML package, which has to pick something, uses a vector of scans, and each scan holds a list of mz => intensity pairs, so in fact there is no binning.
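To make "a list of mz => intensity pairs" concrete at the file level: in mzXML, each scan’s `<peaks>` element holds a base64 string of interleaved m/z and intensity values in network (big-endian) byte order, at 32- or 64-bit precision, optionally zlib-compressed. The Python sketch below decodes one such payload using only the standard library. It assumes the m/z-int pair order and leaves XML parsing and reading the `precision`/`compressionType` attributes to the caller. It is an illustration of the format, not how MzXML.jl is implemented.

```python
import base64
import struct
import zlib


def decode_mzxml_peaks(text, precision=32, compressed=False):
    """Decode an mzXML <peaks> payload into a list of (mz, intensity) pairs.

    text: base64 content of the <peaks> element.
    precision: 32 or 64 (the element's precision attribute).
    compressed: True when compressionType="zlib".
    """
    raw = base64.b64decode(text)
    if compressed:
        raw = zlib.decompress(raw)
    code = {32: "f", 64: "d"}[precision]
    count = len(raw) // struct.calcsize(code)
    values = struct.unpack(f">{count}{code}", raw)  # ">" = big-endian
    # Pair up the interleaved stream: (mz0, int0), (mz1, int1), ...
    return list(zip(values[0::2], values[1::2]))
```

Storing each scan as pairs like this keeps the exact measured m/z values; any binning is deferred until something (such as a plot) needs a regular grid.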

However, visualization does require binning. Therefore I support a copyto! method for an AxisArray, which uses the array’s grid (its size and the ranges of its axes) to determine the bins. In mzplot, the display is interactive: as you zoom in, it uses finer bins, so you can go from very coarse to very fine resolution. All this seems necessary when you are dealing with arrays that effectively have 10^6 rows!
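A rough Python sketch of the binning idea (not the copyto!/MzPlots.jl implementation): each peak’s intensity is accumulated into the equal-width m/z bin it falls in, and zooming in amounts to re-binning over a narrower range with the same number of bins. The function name, summing rather than taking the maximum per bin, and the half-open range are all assumptions.

```python
def bin_peaks(pairs, mz_min, mz_max, nbins):
    """Sum intensities of (mz, intensity) pairs into nbins equal-width bins.

    Peaks outside the half-open range [mz_min, mz_max) are ignored.
    """
    width = (mz_max - mz_min) / nbins
    bins = [0.0] * nbins
    for mz, intensity in pairs:
        if mz_min <= mz < mz_max:
            # min() guards against float round-off pushing idx to nbins
            idx = min(int((mz - mz_min) / width), nbins - 1)
            bins[idx] += intensity
    return bins


pairs = [(100.0, 1.0), (100.4, 2.0), (101.2, 5.0), (99.0, 7.0)]
coarse = bin_peaks(pairs, 100.0, 102.0, 4)  # 0.5 Da bins
zoomed = bin_peaks(pairs, 100.0, 101.0, 4)  # same bin count, 0.25 Da bins
```

For display, taking the maximum per bin instead of the sum is a common alternative, since it keeps a tall isolated peak from being diluted or inflated by its neighbours.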


Interesting that it’s stored as pairs, but that makes sense. Yes, for visualization it is the norm to downsample resolution :). Processing untargeted MS can be pretty computationally intensive. Reminds me of some stuff I did back in grad school (TOF-SIMS, etc.). Fun stuff!


Could the linked package read Agilent .MS files?

You’ll have to convert them to mzXML. mzML is supposedly not very different and could probably also be supported, but I’ve never looked at it.

Thanks. I am new to mass spec data, but I am interested in extracting the data, for example all peaks in a single (presumably unstructured) data set. I would like to look at several files at once for comparison, without needing something like OpenChrom or other chemistry software. Assuming I convert to mzXML, would the Julia packages you developed be able to do this? Thanks, any pointers would be great.