Julia for processing Next-Generation Sequencing (NGS) datasets

Hi all! I'd like to ask how to use Julia for processing Next-Generation Sequencing (NGS) datasets, especially for merging paired-end sequencing reads. As far as I can tell, Julia currently has no suitable package for this, like pandaseq in Python. In particular, the quality information in Illumina reads is important for scoring and evaluating the alignment of paired-end sequences [1], but no Julia package seems to take this information into account. So I'd like to know whether there are Julia packages that can handle this assembly problem, and I'd also welcome suggestions on using Python from Julia. Thanks!

Welcome, @Bonjour-Lemonde !

That’s right, there is no package in Julia to merge paired-end NGS sequences. Julia currently has a bunch of low-level packages for bioinformatics, such as FASTX.jl to parse FASTQ files and BioAlignments.jl to do Smith-Waterman (S/W) alignment.
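
To make the "quality information" concrete: the fourth line of each FASTQ record encodes one Phred score per base. Here is a minimal sketch in plain Julia, assuming the common Phred+33 encoding. `phred_scores` and `error_probs` are hypothetical helper names for illustration, not FASTX.jl functions.

```julia
# Decode a FASTQ quality string, assuming Phred+33 encoding
# (an assumption; older Illumina pipelines used Phred+64).
# These helpers are illustrative and not part of FASTX.jl.
phred_scores(qual::AbstractString) = [Int(c) - 33 for c in qual]

# Convert Phred scores to per-base error probabilities: p = 10^(-Q/10).
error_probs(qual::AbstractString) = [10.0^(-q / 10) for q in phred_scores(qual)]

qual = "II5#"
println(phred_scores(qual))
println(error_probs(qual))
```

A higher score means a lower chance that the base call is wrong, which is what a merging tool can use to decide between two disagreeing reads.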

However, basic NGS tasks like read trimming, merging, and assembly are usually best done with existing command-line tools, which tend to be written in C or C++. For Illumina reads, I would recommend fastp for trimming and merging, and SPAdes for assembly.

Julia is suitable when you need to do a truly custom analysis, e.g. when developing new techniques in the field. For most standard analyses, I would use existing tools.

Agree with this completely. There’s no reason in principle that Julia couldn’t be used to write such tools, but

  1. Given limited resources, no one has considered it worth duplicating the effort.
  2. Julia is not (yet?) a great choice for developing command-line tools, which is what most biologists expect.

Depending on your application, there are some Julia packages for downstream analysis (e.g. SingleCellProjections.jl if you’re doing scRNA-seq), and lots of stuff in the stats/ML space.

I’m using Julia regularly to do custom read trimming & processing. I think something like PANDAseq should be relatively straightforward to implement on top of existing packages, although that’s maybe more of a package-developer project than an end-user one.
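
As a rough illustration of the idea, here is a toy merger in plain Julia with no packages. To be clear, this is not PANDAseq's probabilistic model: it just takes the longest overlap whose mismatch fraction is acceptable and lets the higher-quality base win on disagreement. All names (`revcomp`, `phred`, `merge_pair`, `min_overlap`, `max_mismatch`) are made up for this sketch, and Phred+33 quality strings are assumed.

```julia
# Toy paired-end merger (illustrative only, not PANDAseq's algorithm).
const COMPLEMENT = Dict('A' => 'T', 'C' => 'G', 'G' => 'C', 'T' => 'A', 'N' => 'N')

# Reverse-complement a DNA string (upper-case A/C/G/T/N assumed).
revcomp(seq::AbstractString) = join(COMPLEMENT[c] for c in reverse(seq))

# Decode a quality string, assuming Phred+33.
phred(qual::AbstractString) = [Int(c) - 33 for c in qual]

# Merge a forward read with its mate. Returns (sequence, qualities), or
# `nothing` if no overlap of at least `min_overlap` bases is acceptable.
function merge_pair(fseq, fqual, rseq, rqual; min_overlap=5, max_mismatch=0.1)
    f, fq = collect(fseq), phred(fqual)
    # Bring the mate into the forward orientation, qualities included.
    r, rq = collect(revcomp(rseq)), reverse(phred(rqual))
    overlap = 0
    # Scan from the longest possible overlap down to the shortest allowed:
    # suffix of the forward read against prefix of the reoriented mate.
    for len in min(length(f), length(r)):-1:min_overlap
        shift = length(f) - len
        mismatches = count(f[shift + i] != r[i] for i in 1:len)
        if mismatches / len <= max_mismatch
            overlap = len
            break
        end
    end
    overlap == 0 && return nothing
    shift = length(f) - overlap
    seq, quals = f[1:shift], fq[1:shift]
    for i in 1:overlap
        a, qa = f[shift + i], fq[shift + i]
        b, qb = r[i], rq[i]
        if a == b
            push!(seq, a); push!(quals, qa + qb)   # agreement: crude sum of scores
        elseif qa >= qb
            push!(seq, a); push!(quals, qa - qb)   # forward base wins
        else
            push!(seq, b); push!(quals, qb - qa)   # mate base wins
        end
    end
    append!(seq, r[overlap+1:end]); append!(quals, rq[overlap+1:end])
    return String(seq), quals
end

# Forward read has a low-quality 'T' (quality '#') inside the overlap.
merged = merge_pair("ACGTTGCATG", "IIIIIIII#I", "TAAGCCTTGC", "IIIIIIIIII";
                    max_mismatch=0.2)
println(merged)
```

In this example the low-quality T in the forward read is overruled by the mate's high-quality A. A real tool would also need a statistically sound overlap score, handling of adapters and read-through, and streaming over FASTQ files (e.g. with FASTX.jl).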

Oh neat - any chance you’d be willing to write up a tutorial or cookbook recipe for BioTutorials?

I could, although the difficulty is finding a realistic use case that isn’t too boring (otherwise it’s just this), along with public data to go with it.