What are the popgen and bioinformatics weaknesses

Just wondering if any Julia users still call Python/R, or just don’t use Julia at all, for any area of bioinformatics or population genetics analyses? And why?


Where I am (a major national cancer research institute), Julia isn’t a thing yet. Python/R are still the only options (afaik). For my domain, I am slowly replacing R with Julia, but this doesn’t have any impact on other groups and will not be part of publications (because it’s just not new at all).


Makes total sense. I wouldn’t expect a large organisation with tools that work to suddenly switch over. Where you have transitioned to Julia, have you encountered tasks that were still quicker and/or easier in Python/R?

As in easier/faster because they have a great package for a given task in that language, not because you were still learning Julia.

No, Julia is much easier to use. But importantly, I am not relying on many packages. When I started (it was a microarray analysis pipeline, back in 2005), I realized that I had to be as independent as possible from any R packages, because having heavy dependencies would make updating over the years a real pain in the a… So my starting point is/was mainly my own code (a mix of R/Python/Perl/bash), apart from some basic packages I am still using. Going from there to Julia is/was a great relief.

There are setbacks. For example, I am failing to implement a package to read Affymetrix arrays, see GitHub - oheil/AffyCelFiles.jl: read Affymetrix .CEL files; it’s unfinished and will perhaps stay that way. Reading Illumina methylation arrays, on the other hand, works well: GitHub - oheil/IlluminaIdatFiles.jl: Read Illumina idat files
Another artefact along my way is GitHub - oheil/NormalizeQuantiles.jl: NormalizeQuantiles.jl implements quantile normalization, which is more or less mature.
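
For readers unfamiliar with quantile normalization, here is a minimal sketch of the basic idea in plain Python: every sample is forced onto the same distribution by replacing each value with the mean of the values that share its rank across samples. This is only an illustration and not the NormalizeQuantiles.jl implementation; the function name `quantile_normalize` is hypothetical, and the sketch assumes equal-length samples with no missing values and breaks ties by original position.

```python
def quantile_normalize(columns):
    """Quantile-normalize a list of equal-length samples (one list per array).

    Illustrative sketch only: ties are broken by original position and
    missing values are not handled.
    """
    n = len(columns[0])
    if any(len(col) != n for col in columns):
        raise ValueError("all samples must have the same length")

    # Sort each sample, then average the values found at each rank.
    sorted_cols = [sorted(col) for col in columns]
    rank_means = [
        sum(col[rank] for col in sorted_cols) / len(columns)
        for rank in range(n)
    ]

    normalized = []
    for col in columns:
        # order[rank] is the original index of the value holding that rank.
        order = sorted(range(n), key=lambda i: col[i])
        out = [0.0] * n
        for rank, idx in enumerate(order):
            out[idx] = rank_means[rank]
        normalized.append(out)
    return normalized


samples = [
    [5.0, 2.0, 3.0, 4.0],
    [4.0, 1.0, 3.0, 2.0],
    [3.0, 4.0, 6.0, 8.0],
]
normalized = quantile_normalize(samples)
```

After normalization every sample contains the same set of values, and the rank order within each sample is unchanged.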

I can only talk about using Julia in bioinformatics where packages aren’t that important. In my experience, it was much more fun, and the results were much more satisfying, to implement R packages from scratch in Julia than to use those packages directly. Not everything is worth a new Julia package, because only a minimal subset of an R package was needed for my purposes.

For me, I would always go for Julia, even for newer tasks and even if a publication is the ultimate goal. Just because I want to know what’s happening and I don’t want to be dependent on some R (or Python) package with some C library. My nightmare is presenting some publication somewhere and having to answer a question with “well, I have used package X, I hope they do it right”. I admit that never happened, because I know what’s happening in R package X.

Of course, machine learning (ML), let’s say, would be another beast. If the goal is to do something new using ML, I wouldn’t start implementing it from scratch. That’s just too big when the aim is to apply it. That’s just one example of where my approach is limited. As it happens, ML is an area where Julia really needs to be considered.

I am talking a lot about myself, because I/we do not know what the motivation for your question is. Perhaps you have something more concrete in mind?


Thanks for that. I think we have a similar approach to things in general.

I am a PhD student working with experimental microbial communities. In the past I have used phyloseq/dada2 to assign taxonomy from amplicon sequences and colony PCR. Going forward, I want to do some basic popgen stuff (FST including outlier detection, GWAS) as well as mapping genes to traits. I just wondered if people doing these kinds of things feel like they have a full toolkit in Julia or are still dependent on other languages.

I think my approach will be similar to yours in the end, as I also enjoy implementing from scratch, and learning along the way. However, I also like to be able to cite Julia packages in my work!

I’m not in these fields - I’m mostly doing what would be considered “Data Science” rather than Bioinformatics. But the data ecosystem is excellent, especially Makie for plotting is approaching best-in-class (there are still a couple of things that ggplot does better, but more of the things I care about Makie does better).

There are still a couple of things I use RCall for because I haven’t gotten around to implementing them in Julia (and no one else has either), but those things are dwindling rapidly, and RCall is painless enough that this isn’t a major barrier for me.

When I started my PhD three years ago, I was considering using R, Python or Julia.
In the end I used R with data.table instead of Julia because…

  • I find R a little bit easier.
  • Julia lacks many important libraries, such as multiple imputation, meta-analysis and some advanced survival models.
  • I’ve always had cross-incompatibility problems when installing libraries.

I don’t know why I didn’t use Python, maybe because most of my workmates and my boss use R.