Hi! I’m a relatively new Julia user, and I thought the contributors to DataFrames and the econometrics packages might find my feedback useful.
I’ve been using Julia primarily for DataFrames, a package I love. Recently, I started using it for econometrics. I originally used Stata and wanted to move to an open-source alternative. First I tried R and didn’t like it at all. So, given my good experience with Julia, I decided to give it a try for econometrics.
Since my experience so far has not been good, I wanted to share it with you. Please don’t take this post as criticism. On the contrary, I want to stick with Julia, since I love how things are evolving. I think the problem is more a matter of expectations. Julia is a young language, and it’s completely understandable that its packages are not yet mature. However, people sometimes oversell the current state of Julia in some areas. This can be detrimental: users can feel deceived and never come back. In fact, I think discussions about whether Julia is as fast as C, or the comparisons with Python, are more damaging than helpful.
Let me be more concrete. First, I explored GLM. It still lacks some basic functionality. For example, adding robust standard errors to a simple OLS requires adding CovarianceMatrices, which becomes a problem when you want to integrate the result with TexTables.
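To illustrate the friction I mean, here is a minimal sketch of the two-package workflow, on made-up data; it assumes the estimator-first API of recent CovarianceMatrices releases (`stderror(HC1(), model)`; older releases put the model first), so check the version you have installed:

```julia
using DataFrames, GLM, CovarianceMatrices

# Fake data, just to have something to fit
df = DataFrame(x = randn(100), y = randn(100))
ols = lm(@formula(y ~ x), df)

# GLM alone reports classical (IID) standard errors:
stderror(ols)

# Heteroskedasticity-robust (HC1) errors need a second package,
# and the result is a plain vector rather than a fitted-model
# object, which is what makes integration with TexTables awkward:
stderror(HC1(), ols)
```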
So I moved to FixedEffectModels. I was convinced by the documentation, which shows benchmarks where the package is faster than R and claims that “Performances are roughly similar to the newer R function feols. The main difference is that FixedEffectModels can also run the demeaning operation on a GPU (with method = :gpu).” It gave me the feeling that it performed the same as or better than R.
I tried FixedEffectModels and was surprised that, despite being at version 1.7, I ran into issues like this. Additionally, it couldn’t handle some computationally intensive regressions. Given the problems I was facing, I tried feols in R, prompted by the comment in the documentation quoted above. I was surprised to see that feols was way faster and could run these computationally intensive regressions, even though it’s only at version 0.10.4. I’m pretty sure the key here is that feols uses all the machine’s threads for these computations. However, the reason is unimportant. My point is that comparing Julia against alternatives can be dangerous: you can always choose some problem in which one program excels. In fact, the documentation of feols shows benchmarks claiming that it’s faster than FixedEffectModels!
So, to make this post a positive contribution, let me share some thoughts. Again, I don’t know whether all of these points are feasible (or even valid!), but maybe some of them are relevant.
- First, let me provide some feedback regarding FixedEffectModels vs feols in R. The following is an MWE that could help identify when the difference between the packages becomes most noticeable.
```julia
using DataFrames, FixedEffectModels, RCall

# FAKED DATA
nr = 10000
data(year) = DataFrame(firm = repeat(1:10, Int(nr/10)),
                       industry = repeat(1:10, Int(nr/10)),
                       sales = rand(nr),
                       year = repeat([year], nr))
dff = data(2010)
[append!(dff, data(yr)) for yr in 2011:2020]

# REGRESSIONS IN R
R"""
library(fixest)
rdff = $(dff)
model1_R = feols(sales ~ factor(firm)*factor(year) | industry, vcov = "hetero", data = rdff)
"""

# REGRESSIONS IN JULIA
model1_julia = reg(dff, @formula(sales ~ firm*year + fe(industry)),
                   contrasts = Dict([:firm, :year] .=> DummyCoding.()),
                   Vcov.robust());
```
You can play with `nr` to see that the differences grow. I think this arises because R multithreads the task: only when I set the number of threads to 1 in both programs does Julia perform better. I also tried using PooledArrays instead of contrasts (I don’t know how FixedEffectModels internally handles strings for dummy variables), but that didn’t solve it (Julia only performed better when I used CUDA with `method = :gpu, double_precision = false`).
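For anyone wanting to reproduce a single-threaded, apples-to-apples comparison: on the R side, `fixest::setFixest_nthreads(1)` pins feols to one thread, while Julia’s thread count is fixed at startup (e.g. `julia --threads=1`, or the `JULIA_NUM_THREADS` environment variable). A quick way to check what a Julia session is actually using:

```julia
using Base.Threads

# Number of threads this Julia session was started with;
# it cannot be changed after startup.
println(nthreads())
```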
I read that some contributors are worried about features of Julia that new users don’t like. Since I was one of them, let me tell you that I now appreciate some of these features. For instance, I appreciate that packages are more verbose than in R, since it makes them much clearer. And at first I hated how missing values are handled, but then I realized its importance. It taught me to make explicit choices about how to handle data, rather than letting a package silently decide what’s best (e.g., automatically skipping values).
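As a tiny illustration of that explicitness (made-up values, standard library only): a reduction over a vector containing `missing` propagates the `missing` instead of silently dropping it, and skipping has to be spelled out:

```julia
using Statistics

v = [1.0, missing, 3.0]

# The language refuses to guess: the result is `missing`
mean(v)               # missing

# Dropping missing values is an explicit choice
mean(skipmissing(v))  # 2.0
```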
So, I think the best approach here is to insist on why being more verbose, not automatically skipping missing values, etc., is important. Over-adapting methods to make packages behave like R or Stata will be detrimental in the long run. As an economist, and I guess this holds for non-programmers generally, we don’t have a solid background in programming. We end up learning the hard way to avoid certain habits. This is even worse when you come from programs like Stata or Matlab. It’s not until you have a bad experience (e.g., an undetected bug) that you understand this.
Sometimes I’m a little worried that Julia could run into the same issue as R, where there are twenty thousand packages each solving part of the same problem, so you need to learn a little bit of each. I don’t know if it’s possible, but maybe there could be some unification of packages (for example, Julia officially recommending a specific package for regressions?). That way, people would focus on improving one package rather than starting from scratch and proposing their own solution. For example, right now, if you start with GLM and suddenly want to run a simple OLS with robust errors, you have to add another package or move to FixedEffectModels. So it’d be ideal if a single package covered this (though I can imagine this is really hard, almost impossible, to achieve).
Again, I want to emphasize that this wasn’t a criticism directed at any of the packages’ contributors. Rather, this feedback as a user is my way to thank you!
Keep up this good work!