I’ve been using R (and RStudio) for one year and now want to move to Julia (maybe with Jupyter or Beaker) because I need better speed. R was causing me a lot of problems when dealing with large datasets.
What R features will I miss in Julia?
In R I make heavy use of data.table, ggplot, lattice, lme4, Stan, knitr (to create Rmd files), regular expressions and Rcpp.
I would say that in the current state of Julia, you won’t miss Stan since Stan.jl does a great job.
You won’t miss knitr because Weave.jl is great.
You won’t miss regular expressions because Julia’s string macros are more powerful than anything you’d imagine. You definitely won’t miss Rcpp because that entire idea is eliminated: you just write Julia code and get the C++ speed for free. However, if you were using it to wrap a C++ package, you can use Cxx.jl.
And MixedModels.jl is written by the same guy who wrote the main R statistical packages.
As for the others, some who have come to Julia don’t think there’s quite a replacement for ggplot yet. Gadfly.jl uses a grammar of graphics syntax, but it’s limited. Plots.jl is a great plotting library, but it doesn’t use grammar of graphics except in an experimental add-on.
Also, DataTables/DataFrames are in flux right now.
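On the regular expression point, here is a minimal sketch of what the `r"..."` string macro looks like in practice. It assumes Julia 1.x syntax (`occursin`, and `replace` with a `pattern => replacement` pair), which is newer than the version current when this thread was written; the sample string and variable names are made up for illustration.

```julia
# r"..." is a string macro that builds a Regex object (PCRE-based).
date_re = r"(\d{4})-(\d{2})-(\d{2})"

# Hypothetical input line, for illustration only.
line = "sample taken on 2017-03-14 at site B"

# match returns a RegexMatch (or nothing when there is no match).
m = match(date_re, line)
year, month, day = parse.(Int, m.captures)

# occursin tests for a match; replace takes a `pattern => replacement` pair.
has_date = occursin(date_re, line)
masked = replace(line, date_re => "<date>")

println((year, month, day))
println(masked)
```

No escaping of backslashes is needed inside `r"..."`, which is most of what makes it pleasant compared with building patterns from ordinary strings.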
Hope that’s a good overview of what to expect at this current stage. Julia, the core language itself, is much more developed than you’d probably expect, and the package ecosystem has many packages which are already more expansive than what you’d find anywhere else (especially in scientific computing, like optimization, differential equations, and linear algebra). However, I think those two points, plotting and data frames (and maybe a few others), are where we are still lacking somewhat.
However, the point I’d like to end on is that, in Julia, I find it so easy to write efficient code that in many cases packages aren’t necessary in ways they are in other languages.
Lazy evaluation and dynamic environments are key features of R that are missing in Julia. I’m working on a package that will provide similar functionality.
The rich set of mature libraries R has. Julia is getting there, but since R has been around for a long time, you may need to implement functionality (or wait for someone to do it) which is just an install.packages away in R.
The dynamic nature of environments, which makes R a very sophisticated scripting language. This makes debugging and a lot of other things extremely convenient. At the same time, it is one of the language features that prevents (efficient) compilation. So if you are moving to Julia for speed, this is precisely one of the trade-offs.
very mature plotting libraries. Again, the Julia ecosystem is getting there, but that will take some time.
Now, to be fair, some of the advantages:
insane speed, compared to R. Even if you write suboptimal Julia code in fairly bad style, it will be 10-100x faster than R in my experience. Which is to be expected, as R is interpreted.
very rich and expressive type system which at the same time feels very natural. Only seasoned R users use S3 and S4 classes, while in Julia, you will be defining your own composite types and methods from day 1.
a community which is focused on iterating towards “the right solution” instead of some quick hack that gets the job done. You will see major revisions of libraries, which will require that you rewrite your code, but most of the time it will make your code much better.
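To make the "composite types and methods from day 1" point concrete, here is a small sketch. It assumes Julia 1.x syntax (`struct`, `abstract type`; older versions used different keywords), and every name in it (`Shape`, `Circle`, `Rectangle`, `area`, `total_area`) is hypothetical, chosen only for illustration.

```julia
# A small type hierarchy: one abstract supertype, two composite types.
abstract type Shape end

struct Circle <: Shape
    radius::Float64
end

struct Rectangle <: Shape
    width::Float64
    height::Float64
end

# One generic function with one method per type; dispatch picks the method
# from the runtime type of the argument.
area(c::Circle) = pi * c.radius^2
area(r::Rectangle) = r.width * r.height

# A plain loop over the collection, written the obvious way.
function total_area(shapes)
    total = 0.0
    for s in shapes
        total += area(s)
    end
    return total
end

shapes = [Rectangle(2.0, 3.0), Rectangle(1.0, 1.0), Circle(1.0)]
println(total_area(shapes))
```

The roughly equivalent R code would need an S3 or S4 class per shape plus a generic; here the type definitions and the methods are just ordinary top-level code.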
I would like to find a package that can plot large datasets quickly. I’ve read about GR and GLView, though maybe they are not as easy to use as ggplot or Gadfly.
What about working with dataframes with missing data?
I think in Julia I need to use the NullableArrays package.
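For reference, a minimal sketch of missing-value handling, assuming a later Julia version (1.0 or newer) where `missing` is built into Base. That is a newer mechanism than the NullableArrays approach mentioned here, and the data below is invented for illustration.

```julia
# `missing` is a built-in value in Julia 1.x; a vector that contains it gets
# an element type such as Union{Missing, Float64}.
heights = [1.72, missing, 1.65, missing, 1.80]

# Count the missing entries.
n_missing = count(ismissing, heights)

# skipmissing gives an iterator over the non-missing entries only.
observed_sum = sum(skipmissing(heights))

# coalesce replaces missing with a fallback value (broadcast over the vector).
filled = coalesce.(heights, 0.0)

# Most operations propagate missing rather than silently dropping it.
propagated = sum(heights)

println(n_missing)
println(filled)
println(propagated)
```

The behaviour is close to `NA` in R: arithmetic on a missing value yields a missing value unless you explicitly skip or replace it.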
I tried Julia two years ago, found it slow, and uninstalled it.
I think the problem wasn’t Julia but Juno, and the fact that it compiles every new thing you use.
I hope it works better now. And I’ve found that Jupyter and Beaker work faster and, IMO, better.
You will definitely not miss these (S3 and S4 classes)! Two separate versions of objects, each super slow… that was the worst part of developing in R (if you develop large software).
YMMV. I find ggplot and Gadfly unusable since Grammar of Graphics does not seem to match my workflow at all (scientific computing, not data science). On the other hand, you seem to be on the Grammar of Graphics side of things, which would make the transition to “the standard way of plotting” more difficult. Plots.jl is a good middle ground, and if it just finished up its alternative GoG API…
Two years ago packages didn’t precompile, I think? Now most packages precompile, which speeds up startup times. It’s still not an instant start, but it can make a huge difference. I would give Juno another try though; it’s completely different. Now it’s part of Atom and not LightTable.
GR is a very fast plotting library, and the InspectDR package also specializes in plotting large datasets quickly. Both can be used on their own or as a backend to Plots.
Base Julia is much faster than R, but…
What about its packages?
Do they have similar speed in both platforms?
For example is Stan as fast in R as it is in Julia?
Is Julia’s MixedModels faster than R’s lme4?
It depends. A (well-made) R package written in R will be much much slower than a (well-made) Julia package. But, most of the biggest R packages are actually written in C/C++, with a small interface to R. In that case, if most of the time is spent inside of the package’s functions, then it’s a performance comparison between C/C++ and Julia, which is pretty much 1x if both codes are well written.
So what I think you’ll find is that for the most popular R packages vs Julia packages, it’s all dependent on the implementation (well, the C/C++ code vs the Julia code). But a lot of the smaller packages on CRAN are simple R scripts some non-software engineer pooped out, and a small script or little Julia package will destroy that in performance (it’s pretty insane what pure R vs pure Julia can be like if the problem doesn’t vectorize extremely well).
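As an illustration of a problem that "doesn’t vectorize extremely well", consider a recurrence where each output depends on the previous output. The sketch below is a hypothetical example (the function name `ema` and the smoothing parameter `alpha` are not from any package discussed here); it is written as a plain loop, the form that tends to be slow in pure R and is the natural way to write it in Julia.

```julia
# Exponential moving average: out[i] depends on out[i - 1], so the loop
# cannot be replaced by a single whole-array operation.
function ema(x::Vector{Float64}, alpha::Float64)
    out = Vector{Float64}(undef, length(x))
    isempty(x) && return out
    out[1] = x[1]
    for i in 2:length(x)
        out[i] = alpha * x[i] + (1 - alpha) * out[i - 1]
    end
    return out
end

smoothed = ema([1.0, 2.0, 3.0, 4.0], 0.5)
println(smoothed)
```

No timings are shown because they depend heavily on the machine and the versions involved; wrapping the call in `@time` (after a first warm-up call, so compilation is not measured) is the usual way to check for yourself.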
tl;dr: it’s highly dependent on the package you’re talking about.
Essentially yes, because Stan doesn’t run in either language: it is effectively its own programming language. The R and Julia packages just allow users to define Stan scripts that run in Stan and return the results to R/Julia respectively. So the internals there are exactly the same.
I don’t know of a benchmark here. Maybe @dmbates is around.
Yes, MixedModels is faster, often much faster, than the lme4 package for R. Many factors are at work here, not just the fact that Julia code will run faster than pure R code. Most of the time in an lme4 fit is spent in compiled code whereas MixedModels does not use any purpose-built C/C++ code (the linear algebra does end up calling OpenBLAS or MKL).
The real advantage of Julia is that I can experiment with the algorithm without sacrificing performance or needing to rewrite C/C++ code and interface code. So MixedModels is faster partly because of Julia, partly because of tools like the optimizers available in Julia and partly because the algorithm is cleaner.
Yes, this is where I think opinions diverge. I would say this: as a language “for developers and expert users”, Julia is definitely the best, bar none. The reason is that once you get the hang of Julia, you can just use Julia without needing outside resources. Julia is developed really cleanly, employs very little/no magic, and the vast majority of Julia Base/packages is written in Julia. I find that the vast majority of the time when writing Julia, I can “guess” (or actually, just know) what the compiler is going to optimize and how it’s going to do it. I just check Base code and package sources to see how everything works instead of checking docs (and send PRs).
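A rough sketch of what "checking the source and what the compiler will do" can look like, assuming Julia 1.x. `@which` and `@code_typed` come from the InteractiveUtils standard library; `Base.return_types` is an internal helper rather than documented API, so treat its use here as an assumption that may change between versions. The function `halve` is hypothetical.

```julia
using InteractiveUtils  # stdlib; loaded automatically in the REPL

# Which method will run for this call, and where is it defined?
m = @which sum([1, 2, 3])
println(m.module, ": ", m.file, ":", m.line)

# A small hypothetical function to inspect.
halve(x) = x / 2

# What return type does the compiler infer for an Int argument?
inferred = Base.return_types(halve, (Int,))
println(inferred)

# @code_typed returns the type-inferred code for one specific call.
println(@code_typed halve(3))
```

The file and line printed by the first call point straight into the Julia source, which is the workflow described above: read the implementation rather than search for an answer.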
This style of using a language is something I had never experienced before (years of other languages, about 1 year of Julia). In MATLAB/Python/R I always had to use lots of documentation and search StackOverflow for answers. In Julia it’s usually unnecessary (the only time it really comes up is for actual Julia bugs, and usually I get a GitHub hit for what it is). Using C was too far in the other direction: isolated, re-inventing not just the wheel but also wood and stones, and too much time wasted.
So if there’s a language to get really good at, Julia is definitely the right choice. That said, it is still easier “to be a noob” with Python and R since there are more pre-packaged solutions and StackOverflow answers ready for you. But even in Python/R/MATLAB, if you dig past the basics say to S4 objects and investigating what the compiler is auto-optimizing in the background, you quickly enter an area that is beyond what’s documented and answered (some of it may not even be well understood…).
This should be negligible for most problems which are not games (games only because of graphics drivers).