Hi, I am a python programmer and new to Julia. One of the tasks I am working on needs modification of description lines in Fastq sequence files.
Doing this in python was pretty time-consuming. So wanted to try the same in Julia with the FASTX package.
Can anyone recommend a method or provide an example to loop through each description and substitute with a modified line and writing it back to Fastq in an efficient way?
Here is the pseudocode of my python script
for every_line in fastq:
if line startswith('@) # description line
Welcome! This should be pretty straightforward, and may be faster than with python, especially if you don’t actually have to store/process the sequences themselves. Can you share what you’ve tried and what issues you’re running into?
It may also be helpful to take a look a look at this post which has some tips on how to format stuff if you’re including code.
Did you read the documentation?
https://biojulia.net/FASTX.jl/stable/manual/fastq/
What is it you do not understand?
You should check the documentation.
Nonetheless, I couldn’t resist giving an implementation a go. Here’s a simple implementation:
import FASTX.FASTQ: identifier, description, sequence, quality, Reader, Writer, Record
function modify_descriptions(f, inp::IO, out::IO)
reader, writer = Reader(inp), Writer(out)
for rec in reader
write(writer, Record(identifier(rec), f(rec), sequence(String, rec), quality(rec)))
end
end
You can use it like so:
f(x) = description(x) * "_with_extra_stuff"
inp = open("/my/input.fastq")
out = open("/tmp/test", "w")
modify_descriptions(f, inp, out)
close(inp)
close(out)
It’s not optimized. It does around 65 MB/s on my computer - good enough for most use cases? A fast version would need to have the following changes:
- Iterate over the FASTQ reader by overwriting a single FASTQ record until end of file
- Modify the description of the record in-place, ideally without any heap allocations
- Use a fork of FASTX with commit 18a160b merged
With these changes, it would probably be > 500 MB/s uncompressed, or at whatever speed your computer can gzip compress with.
4 Likes