Julia stats, data, ML: expanding usability

This has been a fascinating thread and slide deck to read. I’m primarily an R user (applied econometrics and data science) who is also pretty jazzed about a lot of the features that Julia has to offer. So, I’d like to offer some thoughts coming from that background.

  • It’s already been mentioned above, but the documentation across many Julia packages remains quite poor. There are some important exceptions (e.g. DataFrames.jl is excellent), but the problem extends to some key packages in the DS/econometrics stack and is extremely off-putting for new users. I would focus on fixing documentation before addressing any of the more abstract issues (e.g. row vs column orientation). As an aside, documentation in R was also quite poor and esoteric until about five years ago. Stata users would always point that out to me as a reason for not switching, despite other obvious advantages. I personally think some of the tidyverse benefits are oversold (compared to, say, data.table), but the tidyverse and RStudio team definitely deserve plaudits for moving the needle forward here for the R ecosystem as a whole.

  • Missing values. I understand the technical barriers and conceptual breakthroughs that were needed to handle missing values in a general-purpose framework. I see a lot of Julia devs being quite pushy and pleased with themselves about this. But from a user perspective, missing values in Julia were a real PITA when I first started experimenting with its DS ecosystem. Code that worked fine in any of the other major DS languages would fail in Julia because of an obscure missing-values issue that needed to be handled explicitly. Maybe this has been sorted out since, but it ties in to my previous point about documentation. Missing values are the norm in any real-world dataset, and yet to find the necessary fix I had to consult the main Julia manual instead of (a) just having the package handle it for me, or (b) having an explicit example in the package README/docs.

  • R has been able to overcome a fairly fragmented ecosystem and multiple OO paradigms (indeed, arguably to actively exploit them) through a few key packages that provide standardized methods across model classes. To highlight two that make a big difference in my everyday workflow: 1) broom provides “tidiers” for extracting consistent model summaries and goodness-of-fit information in data.frame format. 2) sandwich provides variance-covariance matrix methods that make it easy to adjust standard errors for almost any model class (a big deal in econometrics). Packages like these lead to outsize downstream benefits, since they make it much easier to write packages that export regression tables and coefficient plots regardless of model object (which is what the also excellent modelsummary package does). I had hoped something similar could be done fairly easily in Julia because of multiple dispatch and would love to see it, regardless.

  • Earlier it was remarked that GLM.jl doesn’t offer anything beyond what equivalent routines in other languages can do. But for me this is a feature, not a bug! Wherever possible, I want exactly the same interface and results as I’ve come to expect in, say, R. I agree, however, that precompiling canned routines (which I thought was done by default in Julia 1.6?) is important to avoid sluggish time-to-first-plot (TTFP) and first-call performance. Speaking of which…

  • Personally, my immediate motivation for using Julia in a project is some bespoke computation (e.g. a structural estimation). If I’m being brutally honest, there’s no gain to be had from switching out my applied econometrics work, where the canned routines in R (via C and Fortran) are already at maximum performance and coverage. And… that’s fine. The interoperability between these languages is good enough that it’s no problem for me to switch between them for any one particular task. Taking a step back, I often find myself opening up Julia just to play around. It’s just an incredibly fun and performant language to work in. (Congratulations and thanks to everyone involved!) I’m particularly excited about the ease of GPU integration going forward. That stuff is much easier in Julia than in R or Python, and I think it could be a real source of comparative advantage in the years to come.
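
To make the missing-values point above concrete, here is a minimal sketch of the kind of thing that trips up newcomers. It uses only Base Julia and the Statistics standard library; the `income` vector and its numbers are made up for illustration, and individual packages may handle `missing` differently from what is shown here.

```julia
using Statistics  # standard library; provides `mean`

# Hypothetical income column with one missing observation.
income = [52_000.0, missing, 61_500.0, 48_250.0]

# `missing` propagates through reductions, so the naive call
# returns `missing` rather than a number.
naive_mean = mean(income)

# Option 1: skip the missing entries explicitly.
clean_mean = mean(skipmissing(income))

# Option 2: replace missing entries with a fallback value.
fill_zero = coalesce.(income, 0.0)

# Comparisons propagate `missing` too, so `income .> 50_000` contains a
# `missing` and cannot be used directly as an index mask. Treating
# "unknown" as `false` is one (assumed) way to resolve it.
high = coalesce.(income .> 50_000, false)
n_high = count(high)
```

Whether to skip, impute, or treat unknowns as `false` is a modelling decision, which is presumably why Base Julia makes it explicit. A short example like this in a package README would cover most of the cases described above.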
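
On the broom point above, a rough sketch of how a "tidier" could look with multiple dispatch. Everything here is hypothetical: `ToyOLS`, `ToyLogit`, `tidy` and `coef_table` are illustrative names, not an existing Julia API, and the two structs merely stand in for fitted-model objects from different packages that store their results under different field names.

```julia
# Hypothetical fitted-model types with deliberately different layouts.
struct ToyOLS
    names::Vector{String}
    coef::Vector{Float64}
    se::Vector{Float64}
end

struct ToyLogit
    terms::Vector{String}
    beta::Vector{Float64}
    se_beta::Vector{Float64}
end

# One generic function, one method per model type. Every method returns
# rows with the same fields, whatever the model stores internally.
tidy(m::ToyOLS) = [(term = n, estimate = b, std_error = s, statistic = b / s)
                   for (n, b, s) in zip(m.names, m.coef, m.se)]

tidy(m::ToyLogit) = [(term = n, estimate = b, std_error = s, statistic = b / s)
                     for (n, b, s) in zip(m.terms, m.beta, m.se_beta)]

# Downstream code only needs `tidy`, not the concrete model types.
function coef_table(models...)
    rows = String[]
    for m in models, r in tidy(m)
        push!(rows, string(rpad(r.term, 12), round(r.estimate; digits = 3),
                           " (", round(r.std_error; digits = 3), ")"))
    end
    return rows
end

ols = ToyOLS(["(Intercept)", "educ"], [1.2, 0.08], [0.3, 0.02])
logit = ToyLogit(["(Intercept)", "age"], [-0.5, 0.03], [0.25, 0.01])
foreach(println, coef_table(ols, logit))
```

A vector of named tuples is used so the sketch needs no packages; a real version would more likely return a DataFrame. As far as I know, the generic functions in StatsAPI.jl (`coef`, `stderror`, `vcov`) already cover part of this ground, so a tidier could in principle be written once against those rather than per model type.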
