Current state of PopGen.jl

pdimens · June 17, 2020, 5:40pm

Hello all,

For the past (I honestly don’t know how many) months, I’ve been writing and rewriting my fork of PopGen.jl (found here). The vast majority of the work involved was establishing a data structure and a million internal functions to facilitate common things that would need to be done for most kinds of analyses (e.g. allele frequency calculations).
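To make the "common things" above concrete, here is a minimal sketch of an allele frequency calculation in plain Julia. It is not PopGen.jl's implementation: the function name `allele_freqs` and the genotype encoding (tuples of integer alleles, with `missing` for missing data) are assumptions made purely for illustration.

```julia
# Hypothetical helper, not part of PopGen.jl: allele frequencies at a single locus.
# Assumed encoding: each genotype is a tuple of integer alleles; `missing` marks missing data.
function allele_freqs(genotypes)
    counts = Dict{Int,Int}()
    total = 0
    for geno in skipmissing(genotypes)
        for allele in geno
            counts[allele] = get(counts, allele, 0) + 1
            total += 1
        end
    end
    # Convert raw counts into proportions of all observed allele copies
    return Dict(allele => n / total for (allele, n) in counts)
end

locus = [(1, 1), (1, 2), missing, (2, 2), (1, 2)]
freqs = allele_freqs(locus)
```

Missing genotypes are skipped rather than counted, so the frequencies are proportions of the allele copies actually observed.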

Working on this package started as a passion project to improve my Julia chops along with my understanding of the mathematical principles of population genetics, which I admit I’m not great at and have recruited Jason Selwyn to help with (great at math, not great at Julia). I just recently finished porting basic.stats from the hierfstat package in R, and have been working on writing a series of helper functions to perform permutation tests for these F-statistics.

I’d hope to one day merge my fork of PopGen.jl into the BioJulia one, but I’d say it’s much too early for something like that. For the basic benchmarks that I’ve run with what there currently is, PopGen.jl blows hierfstat and adegenet out of the water (see the docs), which is kind of reassuring, although it’s the basics. It would be great to have some extra eyes to look at what’s been done so far and suggest changes/improvements, especially if it pertains to anything fundamental like the PopData type itself or how genotypes are encoded. The main and dev branches are dated, b/c fstat is the one that I’m actively working on. My process is to make a branch for a particular task, merge it to dev when it’s done, repeat, then make sure everything works in dev before merging with main.

Thanks!

Pavel

kevbonham · June 17, 2020, 7:09pm

Rad! Might also be of interest to EcoJulia folks. I’m not sure this is still happening, and if it does it will likely be virtual, but you would likely be welcome to join in this meetup.

Re: the permutation / fstats, is that the same thing as PERMANOVA? I would really love to stop relying on RCall and vegan, but haven’t found the time to understand how that works well enough to port it.

pdimens · June 17, 2020, 8:05pm

That meet up looks awesome, thanks!

Regarding the permutations stuff, I’m not super clear on the math yet (why I rely on Jason!), but for now it’s a matter of calculating P-vals and confidence intervals for F statistics. Our ultimate goal with the permutations is to get a functional AMOVA. That being said, if you check out the Permutations.jl file in src I’ve written some reasonably-performant (I think) permutation methods for long-format data frames.
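For readers wondering what a permutation P-value looks like in practice, here is a generic sketch using only the Julia standard library. It is not the code in Permutations.jl: the function `permutation_pvalue`, the stand-in statistic `mean_diff`, and the toy data are all assumptions for illustration, and a difference in group means is used in place of an actual F-statistic.

```julia
using Random, Statistics

# Generic permutation test sketch (not PopGen.jl's implementation).
# `stat` is any function of (values, labels); shuffling the labels builds the null distribution.
function permutation_pvalue(stat, values, labels; nperm = 999, rng = Random.default_rng())
    observed = stat(values, labels)
    exceed = 0
    for _ in 1:nperm
        permuted = stat(values, shuffle(rng, labels))
        exceed += permuted >= observed
    end
    # Add one to numerator and denominator so the observed value counts as one permutation
    return (exceed + 1) / (nperm + 1)
end

# Stand-in statistic: absolute difference between the two group means
mean_diff(v, l) = abs(mean(v[l .== 1]) - mean(v[l .== 2]))

values = [1.0, 1.2, 0.9, 1.1, 3.0, 3.2, 2.9, 3.1]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
p = permutation_pvalue(mean_diff, values, labels; rng = MersenneTwister(42))
```

The same shuffle-and-recompute loop applies to any statistic that depends on population labels; only the `stat` function would change.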

tomkXY · July 14, 2020, 12:16pm

I’m willing to help on this if this project is still active and you need help with anything.

pdimens · July 14, 2020, 12:28pm

Oh yay! Thanks for your interest and offer! This project is 100% active and we can chat about details on the PopGen.jl slack

pdimens · July 14, 2020, 12:34pm

Update:
I’m working to get the package into the General registry and have all the fancy bits other packages seem to have, like CompatHelper and CI, on the main and dev branches. Tests would be nice, but they aren’t a high priority atm. It’s nice to at least have a build test on dev bc I break stuff all the time (intentionally or not) and notices are nice.

Also, I’d like to invite anyone that’s interested to join the PopGen.jl slack group! Slack

kevbonham · July 14, 2020, 3:23pm

Please no! I can’t take another slack workspace… Can’t you just make a channel on the julia slack (or better yet, zulip )

pdimens · July 14, 2020, 3:31pm

@kevbonham but it has my github integrations

Also, there’s so much activity on the Julia slack vs the activity at PopGen that our content will be cleared ~weekly. It’s been reeeeally hard for me to adopt Zulip, probably bc the workspace is so big.

kevbonham · July 14, 2020, 3:35pm

Fair enough

tomkXY · July 15, 2020, 12:08am

Please no! I can’t take another slack workspace.

I have more than enough of these as well but I understand a dedicated one will be better. Feel free to poke me if I don’t respond in a while.

tomkXY · July 15, 2020, 12:49am

By the way, the link to the repo is broken: https://github.com/pdimens/PopGen.jl/

https://github.com/pdimens/PopGen.jl/projects

pdimens · July 15, 2020, 12:58am

Oh oops. What a rookie mistake! Thanks for pointing that out.

SergeantMike67 · September 21, 2020, 3:29pm

I am also interested in assisting but am completely unsure of how I can help. I am a newbie to Julia and am trying to figure out the quirks and syntax.

pdimens · September 21, 2020, 3:46pm

@SergeantMike67 that’s great to hear! One way to get started with Julia is to pick a basic thing to try to do and read the official docs to figure it out (my method). Another would be to have a read of Ben Lauwens’ excellent Think Julia book.
Once you get a bit more acquainted with the basics of navigating the language you can have a look at the PopGen docs and try to go through the basic tutorial of loading in example data and manipulating/viewing PopData objects. The cool thing is that the package is now registered in General, so you can now install it with ]add PopGen (all that info will be updated with the next release).

Once you get that far and are still interested, you can join the Slack group and chat with us directly, and we can figure out how your strengths can add to the package.

Hopefully by the end of October Jason and I will have a big new release ready for everyone to play around with. It’s a bit of a spoiler, but the next release (kinship branch) will tidy up some methods, speed up some others, parallelize a bunch of things, introduce a very competitive suite of relatedness measures and come bundled with a new release of PopGenSims.jl that will provide some awesome sibship simulations.

SergeantMike67 · September 21, 2020, 4:20pm

My dream is to come up with a RAD-seq package to use on my data. I am getting my feet wet with a Needleman-Wunsch algorithm to get the syntax down. I am not a virgin programmer, having a lot of experience using VBA (mostly) and developing R routines, so I am not coming at this from the “Object Oriented? How do I get the plate into the computer and why do I have to orient the computer towards the plate?” stage. I’m more struggling with the Julia-specific jargon.
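For reference, the alignment exercise mentioned above fits in a few lines of Julia. This is a minimal sketch of the global alignment score only (no traceback); the function name `nw_score` and the scoring values are illustrative assumptions, not taken from any package.

```julia
# Global alignment score by dynamic programming (Needleman-Wunsch).
# Scoring defaults are illustrative: +1 match, -1 mismatch, -1 per gap (linear gap penalty).
function nw_score(a::AbstractString, b::AbstractString;
                  match_score = 1, mismatch = -1, gap = -1)
    s, t = collect(a), collect(b)
    n, m = length(s), length(t)
    # F[i + 1, j + 1] is the best score for aligning the first i chars of a with the first j of b
    F = zeros(Int, n + 1, m + 1)
    F[:, 1] = gap .* (0:n)   # prefix of a aligned against gaps only
    F[1, :] = gap .* (0:m)   # prefix of b aligned against gaps only
    for i in 1:n, j in 1:m
        diagonal = F[i, j] + (s[i] == t[j] ? match_score : mismatch)
        F[i + 1, j + 1] = max(diagonal, F[i, j + 1] + gap, F[i + 1, j] + gap)
    end
    return F[n + 1, m + 1]
end

score = nw_score("GATTACA", "GCATGCU")
```

The one-based indexing, the `i in 1:n, j in 1:m` nested loop shorthand, and the broadcast `.*` are the kind of Julia-specific syntax this exercise tends to surface.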

pdimens · September 21, 2020, 4:42pm

Ah, that’s great. In that case Ben Lauwens’ book is a great read for better familiarity with Julia-specific things. The gulfsharks data provided in PopGen.jl is radseq SNP data, so the work Jason and I have been doing has been more or less intended for microsat/SNP things (more towards the latter).

SergeantMike67 · September 21, 2020, 7:25pm

I bought Ivo Balbaert and Adrian Salceanu’s book Julia 1.0 Programming Complete Reference Guide. I thought about the Think Julia book too, but being a grad student I didn’t have a ton of cash to drop on books. I may still pick up Ben’s book.

pdimens · September 21, 2020, 7:38pm

The book is also available for free using the link I provided above

SergeantMike67 · September 23, 2020, 12:41am

So I have joined the slack group. I have been going through the source code and get about 80% of it. Still getting hung up on some syntax. Now what I think I need is a tutorial on GitHub and all the little pockets that are there.
