Modify FASTQ file description with a new string using FASTX

Hi, I am a python programmer and new to Julia. One of the tasks I am working on needs modification of description lines in Fastq sequence files.
Doing this in python was pretty time-consuming. So wanted to try the same in Julia with the FASTX package.
Can anyone recommend a method or provide an example to loop through each description and substitute with a modified line and writing it back to Fastq in an efficient way?

Here is the pseudocode of my python script

for every_line in fastq:
if line startswith('@) # description line

Welcome! This should be pretty straightforward, and may be faster than with python, especially if you don’t actually have to store/process the sequences themselves. Can you share what you’ve tried and what issues you’re running into?

It may also be helpful to take a look a look at this post which has some tips on how to format stuff if you’re including code.

Did you read the documentation?
https://biojulia.net/FASTX.jl/stable/manual/fastq/

What is it you do not understand?

You should check the documentation.

Nonetheless, I couldn’t resist giving an implementation a go. Here’s a simple implementation:

import FASTX.FASTQ: identifier, description, sequence, quality, Reader, Writer, Record

function modify_descriptions(f, inp::IO, out::IO)
    reader, writer = Reader(inp), Writer(out)
    for rec in reader
        write(writer, Record(identifier(rec), f(rec), sequence(String, rec), quality(rec)))
    end
end

You can use it like so:

f(x) = description(x) * "_with_extra_stuff"
inp = open("/my/input.fastq")
out = open("/tmp/test", "w")
modify_descriptions(f, inp, out)
close(inp)
close(out)

It’s not optimized. It does around 65 MB/s on my computer - good enough for most use cases? A fast version would need to have the following changes:

  • Iterate over the FASTQ reader by overwriting a single FASTQ record until end of file
  • Modify the description of the record in-place, ideally without any heap allocations
  • Use a fork of FASTX with commit 18a160b merged

With these changes, it would probably be > 500 MB/s uncompressed, or at whatever speed your computer can gzip compress with.

4 Likes