ANN: Pseudoseq.jl - Simulating DNA sequencing experiments

Ward9250 · March 11, 2019, 12:49pm

Hi Everyone,

I’d like to announce a package I’ve been working on for my research group here at the Earlham Institute (the first full Julia project… I think I’m getting to them) that is currently close to release. There are some features left to implement, but it is usable now.

It is called Pseudoseq, and it is a package designed for the simulation of DNA sequencing experiments.
The idea is not to replicate in silico any specific machine or technology that exists, but instead to represent sequencing in the abstract, as a sampling process. This allows us to gain insight into the assumptions and intricacies of genome assembly algorithms, and test our abstract understanding against reality. Currently I’m working on the ability to create arbitrary chromosomes/genomes with certain features relevant to the genome assembly problem.
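To make the "sequencing as a sampling process" idea concrete, here is a minimal sketch in Python. It is not Pseudoseq's API: the function names `sample_reads` and `expected_coverage`, the uniform choice of start positions, and the fixed read length are all assumptions made for illustration.

```python
import random


def sample_reads(genome, read_length, n_reads, rng):
    """Draw n_reads fixed-length substrings from uniformly random start positions.

    Hypothetical helper: an abstract sampling model, not any real sequencer.
    """
    last_start = len(genome) - read_length
    if last_start < 0:
        raise ValueError("read_length is longer than the genome")
    reads = []
    for _ in range(n_reads):
        start = rng.randint(0, last_start)  # randint is inclusive at both ends
        reads.append((start, genome[start:start + read_length]))
    return reads


def expected_coverage(genome_length, read_length, n_reads):
    """Average number of reads covering each base: n_reads * read_length / genome_length."""
    return n_reads * read_length / genome_length


rng = random.Random(42)  # fixed seed so the sketch is reproducible
genome = "".join(rng.choice("ACGT") for _ in range(1000))  # toy random genome
reads = sample_reads(genome, read_length=50, n_reads=200, rng=rng)
coverage = expected_coverage(len(genome), 50, 200)
```

With 200 reads of 50 bases over a 1000-base genome, the expected coverage is 10x. Errors, paired ends, and fragment-size distributions are deliberately left out of this sketch.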

johnh · March 11, 2019, 1:28pm

Ben… do NOT let your computers use the Julia language to make synthetic DNA.
Next it’s “I need your clothes, your boots and your motorcycle”

Ward9250 · March 11, 2019, 1:56pm

I promise I won’t connect it to the institute’s “DNA Foundry” synthetic biology lab

kevbonham · March 11, 2019, 3:03pm

Very cool! It looks like this is designed to simulate isolate genomes. How difficult do you think it would be to extend it to simulate metagenomes? That is, many individual genomes that are found at different relative abundances?

Ward9250 · March 11, 2019, 3:11pm

I think that will be possible at some point. I’m currently adding the functionality to not only read in a genome from FASTA, but also to make a genome with desired characteristics. It’s going to be accessible at several levels, from something as simple as makegenome(args...) to a more fine-grained set of types and methods, where you can build something up chromosome by chromosome or haplotype by haplotype.

Once I have that, and I can see how it plugs into the rest of the sequencing (the Molecule Pool type, and so on), it will be clearer how something like metagenomes can be done. One way might be to simulate distinct genomes, and then mix the reads produced into a single sample at certain proportions. Another would be to mix the genomes at the start. I’m not sure which is the most elegant route right now.
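As a rough illustration of the "mix the genomes at the start" route, the sketch below (Python, not Pseudoseq code) picks the source genome for each read with a weight proportional to its relative abundance times its number of valid read start positions, then samples a start position uniformly. The name `sample_metagenome_reads`, the weighting scheme, and the toy genomes are assumptions for this example.

```python
import random
from collections import Counter


def sample_metagenome_reads(genomes, abundances, read_length, n_reads, rng):
    """Sample reads from a mixture of genomes.

    genomes:    dict mapping name -> sequence string
    abundances: dict mapping name -> relative number of genome copies
    Hypothetical helper: each genome's weight is abundance * valid start positions,
    so abundant and longer genomes contribute more reads.
    """
    names = list(genomes)
    for name in names:
        if len(genomes[name]) < read_length:
            raise ValueError(f"genome {name!r} is shorter than read_length")
    weights = [abundances[n] * (len(genomes[n]) - read_length + 1) for n in names]
    chosen = rng.choices(names, weights=weights, k=n_reads)
    reads = []
    for name in chosen:
        seq = genomes[name]
        start = rng.randint(0, len(seq) - read_length)  # inclusive bounds
        reads.append((name, seq[start:start + read_length]))
    return reads


rng = random.Random(7)  # fixed seed for reproducibility
genomes = {
    "org_a": "".join(rng.choice("ACGT") for _ in range(2000)),
    "org_b": "".join(rng.choice("ACGT") for _ in range(2000)),
}
abundances = {"org_a": 9, "org_b": 1}  # nine copies of org_a per copy of org_b
reads = sample_metagenome_reads(genomes, abundances, read_length=50, n_reads=1000, rng=rng)
counts = Counter(name for name, _ in reads)
```

The other route, simulating each genome separately and mixing the resulting reads at chosen proportions, could reuse the same weights to decide how many reads to take from each per-genome read set.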

kevbonham · March 11, 2019, 8:13pm

I think this makes sense - sort of assume X cells each of Y organisms, then throw them in a blender. Let me know if I can help once you get there.

Ward9250 · May 7, 2019, 3:00pm

A version 0.1.0 of Pseudoseq has been tagged and released to the BioJulia registry, enjoy. I’d love any feedback and use cases people have for this.
