You should check the documentation.
Nonetheless, I couldn’t resist giving it a go. Here’s a simple implementation:
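```julia
import FASTX.FASTQ: identifier, description, sequence, quality, Reader, Writer, Record
# Copy every record from `inp` to `out`, replacing its description with `f(record)`.
function modify_descriptions(f, inp::IO, out::IO)
    reader, writer = Reader(inp), Writer(out)
    for rec in reader
        write(writer, Record(identifier(rec), f(rec), sequence(String, rec), quality(rec)))
    end
    close(reader)
    close(writer)  # flushes buffered output; closing also closes `inp` and `out`
end
```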
You can use it like so:
```julia
f(x) = description(x) * "_with_extra_stuff"
inp = open("/my/input.fastq")
out = open("/tmp/test", "w")
modify_descriptions(f, inp, out)
```
It’s not optimized. It runs at around 65 MB/s on my computer, which is probably good enough for most use cases. A fast version would need the following changes:
- Iterate over the FASTQ reader by overwriting a single FASTQ record until end of file
- Modify the description of the record in-place, ideally without any heap allocations
- Use a fork of FASTX with commit 18a160b merged
With these changes, it would probably run at > 500 MB/s on uncompressed data, or, with gzip, at whatever speed your computer can compress.
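For what it’s worth, here is a rough sketch of the first bullet only: reading into a single preallocated record with `read!` instead of iterating. It assumes the same FASTX version as the code above. The name `modify_descriptions_reuse` is made up for illustration. It still builds a fresh `Record` for every write, so it does not cover the in-place description change or the forked FASTX, and I have not benchmarked it.

```julia
import FASTX.FASTQ: identifier, description, sequence, quality, Reader, Writer, Record

# Hypothetical variant: same behavior as the simple version, but the reader
# overwrites one record on every iteration instead of allocating a new one.
function modify_descriptions_reuse(f, inp::IO, out::IO)
    reader, writer = Reader(inp), Writer(out)
    rec = Record()                 # empty record, refilled by each read!
    while !eof(reader)
        read!(reader, rec)         # overwrite `rec` with the next entry
        # Still allocates: a new output record is constructed here.
        write(writer, Record(identifier(rec), f(rec), sequence(String, rec), quality(rec)))
    end
    close(reader)
    close(writer)                  # flushes buffered output and closes `out`
end
```

It is called exactly like the simple version, e.g. `modify_descriptions_reuse(f, open("/my/input.fastq"), open("/tmp/test", "w"))`. One caveat of the `eof` loop: trailing blank lines at the end of the input may make the final `read!` fail, so treat this as a starting point rather than a drop-in replacement.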