Packages for mass spectrometry?

Eugeleo · July 22, 2020, 10:06am

My bachelor thesis concerns working with mass spectrometry data; I’ll need to be able to load the spectra, visualise them, and most importantly, given a spectrum and a peptide fragment, decide how likely it is that they match. Also, I’m not that skilled in MS/MS yet, so there might be some other needs that I don’t foresee now.
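To make the matching task concrete, here is a minimal Python sketch, not a method from any package mentioned in this thread. It computes singly charged b- and y-ion m/z values for a peptide and reports the fraction that have an observed peak within a tolerance. The function names, the 0.02 tolerance, and the reduced residue table are assumptions for illustration, and a matched fraction is only a crude stand-in for a real likelihood score.

```python
# Monoisotopic masses in Da; only a subset of residues, for illustration.
PROTON = 1.007276
WATER = 18.010565
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "L": 113.08406, "I": 113.08406,
    "K": 128.09496, "E": 129.04259, "R": 156.10111,
}

def fragment_mzs(peptide):
    """Return m/z values of singly charged b and y ions (hypothetical helper)."""
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    mzs = []
    for i in range(1, len(masses)):
        mzs.append(sum(masses[:i]) + PROTON)          # b_i: N-terminal fragment
        mzs.append(sum(masses[i:]) + WATER + PROTON)  # y ion: C-terminal fragment
    return mzs

def matched_fraction(observed_mzs, theoretical_mzs, tol=0.02):
    """Fraction of theoretical m/z values with an observed peak within tol."""
    if not theoretical_mzs:
        return 0.0
    hits = sum(
        any(abs(obs - theo) <= tol for obs in observed_mzs)
        for theo in theoretical_mzs
    )
    return hits / len(theoretical_mzs)

# Made-up observed peaks for the dipeptide "GA": only the b1 ion is present.
theoretical = fragment_mzs("GA")
score = matched_fraction([58.03, 200.0], theoretical)
```

Real scoring functions also weigh intensities, charge states, and the chance of random matches, so this only shows the shape of the problem.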

Now, I’d very much love to do my work in Julia, but based on what I found on Google, the state of the mass spectrometry library ecosystem is much better in Python; I decided to ask here as a last resort, in case I’ve missed something (I’d be glad if that were the case).

So, with that said — what is the state of mass spectrometry libraries? Is there anybody doing mass spectrometry in Julia? Do you think my needs could be met with Julia?

tim.holy · July 22, 2020, 12:51pm

The only package I’ve built is
https://github.com/timholy/mzXML.jl

It’s still got a REQUIRE file, so you can tell it’s been a while.

In principle Julia would be great for mass spec, I’d love to see more work in this area.

kevbonham · July 22, 2020, 4:21pm

I can’t answer the rest of your questions, but to the last one (whether your needs could be met with Julia), I would say unambiguously “yes!” But perhaps the implied question was “Do you think my needs could be met with Julia without me needing to write the libraries myself?”, in which case, maybe not.

The most straightforward path is probably to wrap the existing Python libraries using e.g. PyCall. This way, you can accomplish the things that require complex libraries right away, but still get to use Julia for things like statistical analysis. And then, if you find some piece of the Python library doesn’t work the way you want or is too slow, you can break off just that chunk and start rewriting it in Julia.

This way, you can remain productive on your primary task and still get to use Julia for everything else, plus potentially contribute to the ecosystem for the folks that come after you.

Eugeleo · July 22, 2020, 4:41pm

Oh yeah, that’s what I meant; I should’ve phrased my question more carefully. The foray into mass spectrometry is very much a one-off thing for me, and as the focus of the thesis isn’t really the implementation of the basic MS/MS algorithms, I would prefer not to have to write them myself.

I’m thinking it might be easier just to write the data prep in Python, save it to some intermediary file and load it into Julia for the analysis. Or do you find the FFI to be so good that you’d write it all in Julia?
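As a rough sketch of the intermediary-file idea, the Python side could write one CSV row per peak. The column names (`scan`, `mz`, `intensity`), the helper names, and the example data are all assumptions for illustration; loading the spectra from a real MS library is left out.

```python
import csv
import io

def write_peaks(spectra, handle):
    """Write one CSV row per peak: scan id, m/z, intensity."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(["scan", "mz", "intensity"])
    for scan_id, peaks in spectra.items():
        for mz, intensity in peaks:
            writer.writerow([scan_id, mz, intensity])

def read_peaks(handle):
    """Read the CSV back into {scan id: [(mz, intensity), ...]}."""
    spectra = {}
    for row in csv.DictReader(handle):
        peak = (float(row["mz"]), float(row["intensity"]))
        spectra.setdefault(int(row["scan"]), []).append(peak)
    return spectra

# Hypothetical data standing in for spectra loaded by a Python MS library.
example_spectra = {1: [(100.5, 20.0), (250.25, 5.0)], 2: [(300.0, 7.5)]}

# An in-memory buffer keeps the sketch self-contained; a real script
# would use open("peaks.csv", "w", newline="") instead.
buffer = io.StringIO()
write_peaks(example_spectra, buffer)
buffer.seek(0)
round_trip = read_peaks(buffer)
```

A long-format table like this is easy to load on the Julia side with a CSV reader such as CSV.jl, at the cost of larger files than a binary format.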

anon92994695 · July 22, 2020, 4:56pm

Some of the mass spectrometry tools you’d want you can find in ChemometricsTools.jl. I’m more into Spectroscopy personally but I could bake in some more MS tools over time.

kevbonham · July 22, 2020, 8:12pm

This is a perfectly sensible strategy. One pitfall to look out for is the complexity of your coding environment in terms of reproducibility: it might, paradoxically, be easier to manage your Python dependencies through Conda.jl.

But if you document your Python environment and your Julia environment separately and there’s no overlap, it should be OK.

tim.holy · November 8, 2020, 4:22pm

Update: we now have

https://github.com/timholy/MzXML.jl

https://github.com/timholy/MzCore.jl

https://github.com/timholy/MzPlots.jl

All three are registered packages, so you can just pkg> add them.

anon92994695 · November 8, 2020, 7:49pm

Awesome stuff, TimHoly. Do you handle arrays whose bin spacing is uniform but varies in scale from file to file? If so, I’ve noticed a common abstraction/need that shows up all over the ecosystem. For example, the thread about JuliaTelecom has an implementation of this, and I have 2-3 versions (which aren’t released) that I’m unhappy with.

tim.holy · November 8, 2020, 7:54pm

Great question. The utilities try to be agnostic about internal storage format. The MzXML package, which has to pick something, uses a vector of scans, and a scan holds a list of mz => intensity pairs so in fact there is no binning.

However, for visualization binning is necessary. Therefore I support a copyto! method for an AxisArray, and use the binning of the array (and range on the axes) to determine how to do the binning. In mzplot, the display is interactive: as you zoom in further, it uses finer bins, so you can go from a very coarse resolution to very fine. All this seems necessary when you are dealing with arrays that effectively have 10^6 rows!
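To illustrate the idea of binning unbinned pairs for display, here is a small Python sketch. It is not the `copyto!` implementation from these packages: the function name, the half-open range, and summing intensities within a bin are assumptions made for the example.

```python
def bin_peaks(peaks, mz_min, mz_max, n_bins):
    """Sum intensities of (mz, intensity) pairs into n_bins uniform bins
    covering the half-open range [mz_min, mz_max)."""
    if n_bins <= 0 or mz_max <= mz_min:
        raise ValueError("need n_bins > 0 and mz_max > mz_min")
    width = (mz_max - mz_min) / n_bins
    binned = [0.0] * n_bins
    for mz, intensity in peaks:
        if mz_min <= mz < mz_max:
            # Clamp guards against floating-point rounding at the upper edge.
            index = min(int((mz - mz_min) / width), n_bins - 1)
            binned[index] += intensity
    return binned

# Made-up peaks from a single scan.
peaks = [(100.2, 10.0), (100.7, 5.0), (150.0, 2.0), (199.9, 1.0)]

coarse = bin_peaks(peaks, 100.0, 200.0, 2)  # zoomed out: 50-wide bins
fine = bin_peaks(peaks, 100.0, 101.0, 4)    # zoomed in: 0.25-wide bins
```

Because the raw pairs are kept, zooming only means calling the function again with a narrower range, so the two peaks that merge in `coarse` are resolved in `fine`.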

anon92994695 · November 8, 2020, 8:02pm

Interesting that it’s stored as pairs, but that makes sense. Yes, for visualization it is the norm to downsample resolution :). Processing untargeted MS can be pretty intensive computationally. Reminds me of some stuff I did back in grad school (TOF-SIMS, etc). Fun stuff!

fieldofnodes · January 30, 2021, 2:30am

Could the linked package read Agilent .MS files?

tim.holy · February 1, 2021, 9:21am

You’ll have to convert them to mzXML. mzML is supposedly not very different and could probably also be supported, but I’ve never looked at it.

fieldofnodes · February 1, 2021, 10:09am

Thanks. I am new to mass spec data, but I am interested in seeing how I can extract it, for example all peaks in a single (I imagine unstructured) data set. I would like to look at several files at once for comparison, without needing something like OpenChrom or other chemistry software. Assuming I convert to mzXML, would the Julia packages you developed be able to do this? Thanks, any pointers would be great.
