Question about translating R code to Julia

I am working on improving a Bioconductor package for analyzing high-throughput genetic assays by porting it from R to Julia. Currently it works well on statistical benchmarks, but one of the main limitations is speed, which I suspect has limited its adoption.

I’m also looking for advice on:

  • Best practices for translating statistical packages from R to Julia.
  • Any gotchas or common pitfalls when porting code like this.

My other question: since this R package is currently part of the Bioconductor ecosystem, I don’t want to convert it purely to Julia. As I see it, to get the widest adoption among the bioinformaticians who would use it, I’d want to keep it in the Bioconductor ecosystem. I imagine it working like this:

A researcher has a workflow in R → This results in some R data structure that can pass as input to a function in the package → The user calls an R function from the package → Some interface package like JuliaCall would convert the R data structure behind the scenes into a Julia data structure → The function would then execute behind the scenes in Julia → The output would get converted into R → The researcher continues their work in R.

The step I’m unsure about is “The function would then execute behind the scenes in Julia.” Can a Julia package, along with all of its dependencies, be bundled easily and distributed in an R or Bioconductor package? Can anyone give me tips on how to accomplish this?

My short suggestions:

  • For an example of an R package that calls Julia, check out diffeqr (GitHub: SciML/diffeqr), which solves differential equations in R using DifferentialEquations.jl and the SciML Scientific Machine Learning ecosystem, to see how it works.
  • The apparent similarity of R and Julia code is sometimes nice for quick translation, but sometimes deceiving. Even when the R code uses S3 dispatch, its design tends to differ from something more “Julian” because of the limitations of S3. You may want to step back and analyze the R code as an abstract algorithm first, then worry about how to translate it to Julia. Or you may want to do a simple translation first and then figure out how the algorithm maps onto more natural Julia code. I just want to flag this as something you should be aware of.
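To make the S3 point concrete, here is a minimal sketch of how an R generic with S3 methods might be restated with Julia types and multiple dispatch. All names (`Assay`, `CountAssay`, `IntensityAssay`, `scale_assay`, `can_compare`) are hypothetical and not taken from any real package.

```julia
# Hypothetical types, for illustration only.
abstract type Assay end

struct CountAssay <: Assay
    counts::Vector{Int}
end

struct IntensityAssay <: Assay
    values::Vector{Float64}
end

# In R these might be scale_assay.CountAssay and scale_assay.IntensityAssay,
# typically selected from the class of the first argument.
scale_assay(a::CountAssay) = a.counts ./ sum(a.counts)

function scale_assay(a::IntensityAssay)
    lo, hi = extrema(a.values)
    return (a.values .- lo) ./ (hi - lo)
end

# Julia selects a method from the types of all arguments,
# so rules about pairs of assays can be stated directly.
can_compare(::CountAssay, ::CountAssay) = true
can_compare(::IntensityAssay, ::IntensityAssay) = true
can_compare(::Assay, ::Assay) = false   # fallback for mixed pairs
```

A direct line-by-line translation of S3 code would probably not end up with the two-argument methods at the bottom, which is the kind of design difference worth looking for.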

At first I read that as Julia being too slow, but you mean R. Julia would fix that, and you would call it from R with the JuliaCall package for R.

Note: not to be confused with JuliaCall for Python (Guide · PythonCall & JuliaCall).

Since I haven’t used R, I’m not completely sure about the dependency management. I think it just works, and the other answer says it’s possible. At least in theory it could work the way it does from Python.

Julia would be as fast as C++ or faster, if done right. It should at least not be slower than calling into a C++ package, assuming the Julia code is, for example, “type stable”.
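For readers coming from R, a minimal sketch of what “type stable” means may help. The two functions below are hypothetical illustrations; the only difference is how the accumulator is initialized.

```julia
# Type-unstable: `total` starts as an Int and becomes a Float64 inside the loop,
# so the return type depends on whether the loop ran at all.
function sum_unstable(xs::Vector{Float64})
    total = 0
    for x in xs
        total += x
    end
    return total
end

# Type-stable: the accumulator has the element type from the start,
# so the return type is always Float64.
function sum_stable(xs::Vector{Float64})
    total = zero(eltype(xs))
    for x in xs
        total += x
    end
    return total
end

# In the REPL, `@code_warntype sum_unstable([1.0, 2.0])` highlights the problem.
```

In a case this small the compiler often copes well with the instability. The cost tends to grow when the uncertainty involves abstract containers or propagates through many calls, which is easy to introduce when translating R code literally.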

Calling C++ from R, including taking care of dependencies, is a solved problem, and so is calling Python. I’m not really suggesting you need to call Python and go from there to Julia, but that would be one option if it has better dependency management.

There is a very small overhead per call, and somewhat more if you call through Python. In that case each call would need to do more work to amortize it. Are you thinking of a function call that works on just one (small) struct, i.e. O(1) work per call rather than O(n)?

I’m not sure whether JuliaCall for R has dependency management that is as good, or a compatible one. For Python, pyjuliapkg (GitHub: JuliaPy/pyjuliapkg, “Manage your Julia dependencies from Python”) is used. Most likely you could use that even when calling from R, and NOT have to call through Python, with just a bit of Python to set up dependencies(?).

I would really like to hear about your experience with this (and with the other direction). It should be as easy to call Julia from R as from Python, i.e. very easy, including dependency management. Calling R from Julia should likewise be as easy as calling it from Python, and there you might know more than I do about dependency management for R packages.


How certain are we that the Bioconductor package’s speed limitation is a language limitation, let alone one that Julia solves? If it already calls AOT-compiled code in other languages for performance-critical subroutines, e.g. C++ via Rcpp, then I wouldn’t expect a performance improvement from the same algorithms. That could even happen indirectly, i.e. through the package’s other R dependencies.

Before you dive into the interop porting pool, dip a toe in: identify a relatively small, performance-critical, frequently used function in the Bioconductor package, port it and its input types entirely to Julia (no interop), and compare the two using each language’s benchmarking library. If you can, try C++ as well. Generally speaking, C++ is far cheaper to embed than the full runtimes of R and Julia. I don’t remember how much memory the R interpreter takes after some base calls, but Julia is anecdotally on the order of 0.1 to 1 GB of RAM. diffeqr is an exception because it relies on JIT compilation of highly polymorphic routines and user-provided input functions, which can’t practically be compiled ahead of time.
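The Julia half of such a toe-dip could be sketched roughly as follows. `row_variances` is a hypothetical stand-in for whatever small, hot function you pick, and the matrix size is an arbitrary assumption. The timing uses `@elapsed` from Base to stay dependency-free, whereas a real comparison would more likely use BenchmarkTools.jl.

```julia
using Statistics  # standard library, provides `var`

# Hypothetical stand-in for a small, performance-critical function:
# per-row sample variance of an assay matrix.
function row_variances(m::Matrix{Float64})
    nrow = size(m, 1)
    out = Vector{Float64}(undef, nrow)
    for i in 1:nrow
        out[i] = var(@view m[i, :])   # a view avoids copying each row
    end
    return out
end

m = rand(2_000, 200)            # arbitrary size, for illustration only
row_variances(m)                # first call includes JIT compilation; discard it
t = @elapsed row_variances(m)   # time an already-compiled call
println("row_variances: ", t, " s")
```

Timing the same operation in R on the same input then gives a first indication of whether the language is the bottleneck at all.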

This is the opposite direction from what you want (R’s JuliaCall, linked earlier), but just to be comprehensive: RCall.jl handles running R from Julia.