Current state of PopGen.jl

Hello all,

For the past (I honestly don’t know how many) months, I’ve been writing and rewriting my fork of PopGen.jl (found here). The vast majority of the work involved was establishing a data structure and a million internal functions to facilitate common things that would need to be done for most kinds of analyses (e.g. allele frequency calculations).

Working on this package started as a passion-project to improve my Julia chops along with my understanding of the mathematical principles of population genetics, which I admit I’m not great at and have recruited Jason Selwyn to help with (great at math, not great at Julia). I just recently finished porting basic.stats from the hierfstat package in R, and have been working on writing a series of helper functions to perform permutation tests for these F-statistics.

I’d hope to one day merge my fork of PopGen.jl into the BioJulia one, but I’d say it’s much too early for something like that. For the basic benchmarks that I’ve run with what’s currently there, PopGen.jl blows hierfstat and adegenet out of the water (see the docs), which is kind of reassuring, although it’s only the basics. It would be great to have some extra eyes to look at what’s been done so far and suggest changes/improvements, especially if it pertains to anything fundamental like the PopData type itself or how genotypes are encoded. The main and dev branches are dated, b/c fstat is the one that I’m actively working on. My process is to make a branch for a particular task, merge it to dev when it’s done, repeat, then make sure everything works in dev before merging with main.

Thanks!

Pavel

Rad! Might also be of interest to EcoJulia folks. I’m not sure this is still happening, and if it does it will likely be virtual, but you would likely be welcome to join in this meetup.

Re: the permutation / fstats, is that the same thing as PERMANOVA? I would really love to stop relying on RCall and vegan, but haven’t found the time to understand how that works well enough to port it.

That meetup looks awesome, thanks!

Regarding the permutations stuff, I’m not super clear on the math yet (why I rely on Jason!), but for now it’s a matter of calculating P-vals and confidence intervals for F statistics. Our ultimate goal with the permutations is to get a functional AMOVA. That being said, if you check out the Permutations.jl file in src I’ve written some reasonably-performant (I think) permutation methods for long-format data frames.
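
For anyone unfamiliar with the idea, a permutation test for a P-value boils down to roughly the following. This is only a minimal sketch using the standard library, not the code in Permutations.jl: `between_group_variance` and `permutation_pvalue` are hypothetical names made up for illustration, the toy data is invented, and the variance of group means is just a stand-in for a real F-statistic.

```julia
using Random, Statistics

# Hypothetical stand-in statistic: variance of the per-group means.
# A real analysis would compute an F-statistic (e.g. FST) here instead.
function between_group_variance(values::AbstractVector{<:Real}, groups::AbstractVector)
    group_means = [mean(values[groups .== g]) for g in unique(groups)]
    return var(group_means)
end

# One-sided permutation test: shuffle the group labels, recompute the
# statistic, and count how often it is at least as large as the observed one.
function permutation_pvalue(statistic, values, groups;
                            n_perm::Int = 999, rng = Random.default_rng())
    observed = statistic(values, groups)
    n_extreme = 0
    for _ in 1:n_perm
        if statistic(values, shuffle(rng, groups)) >= observed
            n_extreme += 1
        end
    end
    # Adding 1 to numerator and denominator counts the observed labelling
    # as one of the permutations, so the P-value is never exactly zero.
    return (n_extreme + 1) / (n_perm + 1)
end

# Toy long-format data: one value per row, with a population label per row.
values = [0.10, 0.12, 0.11, 0.13, 0.52, 0.55, 0.50, 0.53]
groups = ["pop1", "pop1", "pop1", "pop1", "pop2", "pop2", "pop2", "pop2"]

p = permutation_pvalue(between_group_variance, values, groups;
                       rng = MersenneTwister(42))
```

Passing the statistic in as a function means the same shuffling loop can be reused for whatever statistic is being tested, and passing the `rng` keeps runs reproducible.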

I’m willing to help on this if this project is still active and you need help with anything.

Oh yay! Thanks for your interest and offer! This project is 100% active and we can chat about details on the PopGen.jl slack :grin:

Update:
I’m working to get the package into the General registry and have all the fancy bits other packages seem to have, like CompatHelper and CI on the main and dev branches. Tests would be nice, but they aren’t a high priority atm. It’s nice to at least have a build test on dev bc I break stuff all the time (intentionally or not) and notices are nice.

Also, I’d like to invite anyone that’s interested to join the PopGen.jl slack group! https://join.slack.com/t/popgenjl/shared_invite/zt-deam65n8-DuBs2z1oDtsbBuRplJW~Pg

Please no! I can’t take another slack workspace… Can’t you just make a channel on the julia slack (or better yet, zulip :wink:)

@kevbonham but it has my github integrations :frowning:

Also, there’s so much activity on the Julia slack vs the activity at PopGen that our content will be cleared ~weekly. It’s been reeeeally hard for me to adopt zulip, probably bc the workspace is so big.

Fair enough :laughing:

Please no! I can’t take another slack workspace.

I have more than enough of these as well but I understand a dedicated one will be better. Feel free to poke me if I don’t respond in a while.

By the way, the link to the repo is broken: https://github.com/pdimens/PopGen.jl/

Oh oops. What a rookie mistake! Thanks for pointing that out.