Julia vs (R/STATA/Matlab/SAS/Python) for common data analysis tasks (2022 edition)

From a previous discussion

So I’m starting this thread with the idea that people could come here with some example data analysis problems and Julia users can reproduce something like them here to give people a sense of how easy or difficult it is to accomplish common tasks.

I don’t want this to become “do my research project for me” of course, but if you’ve worked on a project in another language and want to see how you’d accomplish something similar in Julia, propose a project, preferably with a link to a GitHub repo, a blog page, or similar containing the code from the other language. Let’s use this as a way to get some simple hand-holding-type how-tos for data analysis in Julia.

What’s in scope:

  1. Reading datasets in various common formats
  2. Munging data from multiple sources into a particular data set structure / calculating statistics
  3. Plotting various common 2D plots: scatter, line, histogram, density, small-multiples of each, etc.
  4. Fitting regression models
  5. Running simulations of some sort based on data (random number generation, dynamic processes etc)
  6. Optimization / decision making based on data.

Not in scope:

  1. Open research problems
  2. Proprietary datasets
  3. Do my homework for me
  4. Reproducing the output of proprietary software/algorithms etc where the method isn’t public
  5. Lots of precise tweaking of the output of graphs to get very precise visual results / exactly reproduce the output of another piece of software.

To make this really doable, the problems should be the kind of thing you’d expect a grad student to be able to do in an afternoon or two. Keep it reasonably scoped.


Would be interesting to hear from @JackStrauss what Julia is lacking to replace dynlm() in R as mentioned in the other thread.

I guess I’ll link to this repo, where Phil Price and I started comparing some tasks in R vs Julia. We never really finished this project. In particular, I wrote a fairly trivial looping simulation in Julia, and Phil tried to come up with something performant in R and eventually gave up, both because it’s not trivial and because he had lots of real-world things to do.

I think the Olympics example shows how trivial stuff in R is still trivial in Julia.

The simulation.jl example shows how stuff that’s trivial in Julia may well be nearly impossible in R.
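
To give a sense of the kind of loop meant here, a minimal sketch follows. It is not the actual simulation.jl from the repo: the function name `simulate_walk`, the update rule, and the parameters are all made up for illustration. The point is only that each step depends on the previous state, so the loop can't be replaced by a single vectorized call, and in Julia you just write the loop.

```julia
using Random, Statistics

# Hypothetical example, NOT the simulation.jl from the linked repo:
# a random walk with a lower bound where the step size depends on the current state.
function simulate_walk(rng::AbstractRNG, nsteps::Integer; lower::Float64 = 0.0)
    x = zeros(Float64, nsteps + 1)   # x[1] is the starting state (0.0)
    for t in 1:nsteps
        step = randn(rng)
        # State-dependent update: steps shrink as the walk moves away from zero,
        # and the walk is clamped so it never drops below `lower`.
        x[t + 1] = max(lower, x[t] + step / (1 + abs(x[t])))
    end
    return x
end

rng = MersenneTwister(2022)
paths = [simulate_walk(rng, 1_000) for _ in 1:200]   # 200 independent runs
final_values = [p[end] for p in paths]
println("mean final value: ", mean(final_values))
```

Only the standard library is used (`Random`, `Statistics`), so this should run in a plain Julia session without installing anything.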

The COVID example is unfinished, I think, but still of interest.


Looks like this package brings many Stata data cleaning commands to Julia:

I wish these things (including RegressionFormulae.jl) weren’t scattered around the ecosystem…

If you want to use Stata syntax from Julia there’s

I’m hoping Effects.jl and Vcov.jl become widely supported by the modeling packages.


I was referring specifically to these types of commands (from my previous discussion w @dlakelan):