That’s a fairly interesting topic.
To give you a little bit of perspective, here’s my starting point.
I’m a research engineer in French academia. I’ve been teaching quantitative methods applied to the social sciences for roughly 10 years.
The bulk of the work used to be done with SPSS. Back in the early 2000s I was trained on SPSS.
Then came Stata in 2009/2010, and all course materials were ported to Stata.
Then, since roughly 2013, it became increasingly easy to (1) use SPSS/Stata datasets in R and (2) produce reproducible research with R.
Yet R is not frequently used at master’s level, because most students are not skilled enough in CS to learn it in 24- or 48-hour courses.
Only PhD students and postdocs get to learn R.
For a move to Julia to happen in my field, and I guess in a lot of other fields where stats are needed but people are not interested in CS or technical issues, Julia needs to be very simple and lean to learn for beginners without any CS background, and it should give efficient shortcuts to implement mainstream analyses.
So here are my specifics:
- Data Management. Every research project involves a LOT of data management: wrangling labels, handling various codings of the same underlying data, merging tables. The simpler and faster it is, the better. In this field alone, I would suggest that Stata still dominates R (even including the tidyverse), and Julia lags quite far behind, partly because of all the issues around data frames, missing values, DataFramesMeta, Query.jl and DataStreams. All that is way too complex to bring to our students given the time slot we get to teach. And I think a lot of teachers would feel the same.
- Reproducibility. It’s now common ground that you should be able to produce clean reports and reproducible analyses for publications and reviewers. This used to be SAS’s strong point; now R is leading the way, and Stata and SPSS lag behind, even if StataCorp is trying to catch up.
- Flexibility. Graduate students need to learn quite a lot of different things: descriptive stats, modelling and plotting, but also geometric data analysis, network analysis, text mining and mapping. On that point R is taking the lead, because you can learn one framework to explore all those fields, whereas you used to have to learn (1) a GIS, (2) a statistical package and (3) specific software for specific fields, such as Gephi and Pajek in network analysis.
- Descriptive stats. Descriptive stats, including label management, are an absolute must in every social science project. Yet, as they are not very “interesting”, they are not given a lot of love in most statistical frameworks. Only recently did R get a good boost thanks to the SJ series of packages.
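To make the descriptive stats point a bit more concrete, here is a minimal sketch of the kind of one-line labelled frequency table I have in mind. It is written in plain Python purely for illustration; the function name `labelled_freq`, its arguments and the sample survey data are all hypothetical, and it does not reproduce the API of any existing package.

```python
from collections import Counter


def labelled_freq(codes, labels, missing_label="(missing)"):
    """Return (label, count, percent) rows for one coded variable.

    codes  : iterable of raw codes, with None marking a missing answer
    labels : dict mapping each code to its value label
    (Hypothetical helper, for illustration only.)
    """
    codes = list(codes)
    counts = Counter(codes)
    total = len(codes)

    def pct(n):
        # Percent of all observations, missing answers included.
        return round(100 * n / total, 1) if total else 0.0

    rows = []
    # Labelled codes first, in the order the labels were declared.
    for code, label in labels.items():
        n = counts.get(code, 0)
        rows.append((label, n, pct(n)))
    # Codes without a label are reported as-is so nothing silently disappears.
    unlabelled = [c for c in counts if c is not None and c not in labels]
    for code in sorted(unlabelled, key=str):
        rows.append((str(code), counts[code], pct(counts[code])))
    # Missing answers get their own row when there are any.
    if counts.get(None, 0):
        rows.append((missing_label, counts[None], pct(counts[None])))
    return rows


# Made-up answers to a three-point survey item, None = no answer.
answers = [1, 2, 2, 3, None, 2, 1, 3, 3, 2]
value_labels = {1: "Disagree", 2: "Neutral", 3: "Agree"}

for label, n, percent in labelled_freq(answers, value_labels):
    print(f"{label:<10} {n:>3} {percent:>6}")
```

The point is not the implementation but the calling convention: one short call, value labels applied automatically, and missing answers shown without any extra ceremony. That is what beginners need to get on day one.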
To sum it up, Julia holds huge advantages:
- the licence (good for academia & teaching AND good for businesses)
- it’s fast
- it’s not weird and awkward in the way that R is
- the sky is the limit
But those advantages are held back by these main problems:
- it’s not yet mature (ok that’s nearly there so this won’t remain a problem)
- data management is cryptic, given the state of the data frames ecosystem
- labelled classes?
- a descriptive stats framework (function v1 v2, options)
- fast shortcuts to common procedures (the work of Daniel Lüdecke, for example, is the gold standard for productivity)
If those points are tackled, I think the underlying qualities of the language will shine, so fingers crossed.
(I should say, I’m not complaining at all: I use and teach 4 different stats frameworks, and huge progress has been made over the last 10 years, so in any case the future looks bright!)