Statistics in Julia good enough for my course?

I am starting a Statistics course tomorrow, as an undergrad. While prepping for class, I met the following statement:

The faculty I am in at my school swears by MatLab for most things, and I have followed MatLab-heavy courses on Control Theory and DSP using Julia instead. This has not been a technical problem at all.

Now on the administrative level, I have asked permission to use Julia instead of MatLab for the exam, which is multiple choice. But on the technical level, I wanted to make sure that Julia is up for the task, because this time, it seems I will need it for my exam.

I also have a list of the topics we will learn about:

Given that list of topics:
Based on your experience, is statistical tooling developed enough in Julia, so that I can again forget about MatLab, and use Julia for solving problems on all the relevant topics?

Worth mentioning is that I am enthusiastic about Julia, really don’t like MatLab, and am used to having to fiddle around a bit more than my classmates to get things working.

EDIT:
Also, I would appreciate a pointer to the relevant Julia packages. I know about JuliaStats.org, but there are so many packages, and I don’t know which ones are relevant, nor whether the ecosystem has everything I need.

5 Likes

Should not be an issue. I think Distributions, GLM, and HypothesisTests should have you covered for what’s on this syllabus.

You might also be interested in this workshop from JuliaCon:

with accompanying book here:

https://statisticswithjulia.org/StatisticsWithJuliaDRAFT.pdf

15 Likes

This is very basic frequentist statistics, for which the packages have been mature for years.

You may want to pay attention to calling conventions though: some functions might work differently, e.g. take the standard deviation instead of the variance, and similar. Better make sure you understand the theory and check the docs. ANOVA may be a tricky point, since there are so many subtly different ways to do it.
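A minimal sketch of the calling-convention point, with made-up numbers: in Distributions.jl, `Normal(μ, σ)` takes the standard deviation as its second argument, and `var` from the Statistics standard library uses the n − 1 denominator unless told otherwise. Which convention a given MATLAB function uses is something to check in its own documentation.

```julia
using Statistics      # standard library: var, std
using Distributions   # Normal, plus var/std of a distribution

# Normal(μ, σ): the second argument is the standard deviation, not the variance.
d = Normal(0.0, 2.0)
sigma = std(d)     # standard deviation of the distribution
sigma2 = var(d)    # variance, i.e. σ^2

# Sample variance: the default divides by n - 1.
x = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
v_sample = var(x)                         # denominator n - 1
v_population = var(x; corrected=false)    # denominator n
```

If an exam answer is off by a factor of n/(n − 1) or by a square root, this is the first place to look.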

FWIW, as much as I like Julia, I would just stick to the language used by the course and use the time I save to do a fun project in Julia. It is unlikely that there will be a lot of programming involved at this level; the computer should just serve as a sophisticated calculator and plotting tool.

32 Likes

The issue I see here is that you said the exam is multiple choice. This means that it will probably be computer marked. So you’d be taking a bit of a risk here with respect to computational differences between answers provided by MatLab vs Julia. For example, eigenvectors are scaled completely differently by the two languages leading to totally different (but equally valid) outcomes. Now, what you’ve posted looks like pretty basic stuff, so I think you should be okay, but I certainly wouldn’t guarantee it…
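To illustrate the eigenvector point, here is a small sketch using only the LinearAlgebra standard library. The vector `w` is a hypothetical answer from another tool, not actual MATLAB output; the idea is simply that an eigenvector is defined only up to a nonzero scale factor, so two correct answers need to be compared after normalizing.

```julia
using LinearAlgebra   # standard library: eigen, Symmetric, normalize, dot

A = [2.0 1.0;
     1.0 2.0]

F = eigen(Symmetric(A))   # eigenvalues come back in ascending order here
v = F.vectors[:, 2]       # eigenvector belonging to the largest eigenvalue

# Hypothetical answer from another tool: same direction, other scale and sign.
w = [-5.0, -5.0]

# Both satisfy A*v = λ*v. Compare directions only, ignoring scale and sign.
same_direction = abs(dot(normalize(v), normalize(w))) ≈ 1.0
```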

7 Likes

Sometimes people just expect one specific output format with these statistical tools. At least when I switched to Julia, it was sometimes an annoyance that statistical packages were more “bare-bones”, meaning they had fewer convenience methods for checking effect sizes, comparing models, etc., and the printouts were not as detailed or nicely formatted.

Of course, if you really know what you’re doing, that might not matter much to you, as you might know how to use the available parts to build what you need. But as a beginner that can just add unnecessary friction.

1 Like

I do a decent amount of what I would describe as applied statistics at work and I mostly use StatsBase, Distributions, GLM, and HypothesisTests.

As Tamas pointed out, ANOVA may be a bit tricky, depending on what you need to do. There is SimpleANOVA.jl but I’ve never used it myself. I wrote my own package for single factor ANOVA but it’s not in the registry and wasn’t written with the intent that anyone besides me would use it (i.e. I do not recommend you use it for your class). At the time, I was having trouble finding Julia implementations of typical post-hoc tests that you perform if your ANOVA shows differences among means (ended up writing my own Tukey-Kramer implementation, included in that package).
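For reference, a single-factor ANOVA F test is short enough to write by hand. The sketch below is not the package mentioned above; `oneway_anova` is a hypothetical helper written for illustration, the data are made up, and it assumes Distributions.jl for the F distribution. It covers only the plain one-way case, with no post-hoc tests.

```julia
using Statistics      # mean
using Distributions   # FDist, ccdf

# One-way ANOVA F test; `groups` is a vector of numeric vectors, one per group.
# Hypothetical helper for illustration, not part of any package.
function oneway_anova(groups)
    k = length(groups)                  # number of groups
    N = sum(length, groups)             # total number of observations
    grand = sum(sum, groups) / N        # grand mean
    ss_between = sum(length(g) * (mean(g) - grand)^2 for g in groups)
    ss_within = sum(sum((xi - mean(g))^2 for xi in g) for g in groups)
    df_between = k - 1
    df_within = N - k
    F = (ss_between / df_between) / (ss_within / df_within)
    p = ccdf(FDist(df_between, df_within), F)   # upper-tail probability
    return (F = F, df = (df_between, df_within), p = p)
end

groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
result = oneway_anova(groups)
```

Checking a hand-written version like this against the course software on one or two textbook examples is a cheap way to confirm that both use the same conventions.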

I really don’t like R but I have found the need to “patch” Julia code on a few occasions with calls to R where I needed some statistical function that I couldn’t find in Julia and didn’t have the will to implement myself.

4 Likes

FWIW the syllabus says “analysis of data grouped by one or two categorical variables”, so it’s probably doable with GLM. But then some fields have very specific requirements on how to do ANOVA and which statistics to report, as Tamas hinted at, so the usual “statistical tests are just linear models” defense might sometimes not cut it.

I also think that MATLAB is an odd choice for this syllabus, when there are so many alternatives much more geared towards this (R, Stata, SPSS, SAS, EViews, LIMDEP, JMulTi…). I can’t remember anyone at the econ department where I did my PhD using MATLAB to do this kind of basic stats/econometrics.

5 Likes

The faculty I am in at my school swears at Matlab :wink:

13 Likes

It’s common for faculty to require one particular language/software, typically based on the instructor’s personal preference and/or the head of the department’s preference. A few years ago I taught a basic stats course where Excel and SPSS were required. Out of personal choice, I offered the code in Excel, in LibreOffice’s Calc, in the open source version of SPSS (PSPP) as well as R and Python. It was still Julia’s early days then: today I would definitely offer Julia code too.

It was a little more work, but I learned a lot. On the rare occasions where I couldn’t put together code for one particular task in one language, I had several other options to show. And there always was some last-minute problem.

So while I was an instructor, not a student, there is one lesson I would like to share with you: If the course pushes you a little bit, you’ll end up with some issue that may take a couple of days to resolve: ugly legend, overlapping axis labels, confidence bands that aren’t smoothed the way you expect, small differences in the decimals, you name it. And you may not be able to replicate exactly. Given time, considering the amazing help you will receive on discourse, you will be able to resolve the issues, but under the stress of examinations and deadlines, are you sure you want that? As Tamas suggested above, you could play by the rules and use MATLAB, while at the same time replicating everything in Julia. You could share your replication files on github/website and let your instructors/faculty know about it. Maybe they’ll like it and eventually switch to Julia too. Thus, instead of being the guy who doesn’t want to play by the rules, as it were, you’ll be the guy who makes the new rules.

8 Likes

I wish they had taught me stats in Julia (instead of STATA/Matlab/R).
It’s so much more intuitive & closer to the math.
I’m confident the Julia ecosystem has most of what you need to teach, & the few things it may not have will be super-easy & quick to code up.

I began writing a blog-post comparing the way random variables are handled in Julia/MATLAB/R/STATA/Mathematica/Python.

The only one that comes close in simplicity is Mathematica…

10 Likes

Thanks for all the replies and input <3

Regarding ANOVA:
I could simply call the MATLAB (urgh, all caps) function through the MATLAB.jl package, right?

Actually that would be ideal. The syllabus covers very standard and relatively straightforward topics, with similar calling conventions both ways. In Julia,

EqualVarianceTTest(x, y)

and in Matlab,

ttest2(x,y)

That way you can convince your instructor that you handled everything correctly. The only tricky thing is ANOVA, where there are different conventions for data input, and you have to read the documentation carefully. Ideally you can submit a nice Jupyter or Pluto notebook, where the last line shows how you get the same result in Matlab.
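A sketch of what such a cross-check could look like on the Julia side, assuming HypothesisTests.jl and Distributions.jl and using made-up samples. It runs `EqualVarianceTTest` and then recomputes the same pooled-variance t statistic and two-sided p-value from the textbook formula; those are the numbers you would then compare against what `ttest2` reports.

```julia
using Statistics        # mean, var
using Distributions     # TDist, ccdf
using HypothesisTests   # EqualVarianceTTest, pvalue

# Made-up samples, for illustration only.
x = [5.1, 4.9, 5.6, 5.8, 6.0]
y = [4.1, 4.4, 4.0, 4.8, 4.5]

test = EqualVarianceTTest(x, y)
p_pkg = pvalue(test)              # two-sided by default

# The same test from the textbook pooled-variance formula.
nx, ny = length(x), length(y)
df = nx + ny - 2
sp2 = ((nx - 1) * var(x) + (ny - 1) * var(y)) / df   # pooled variance
t = (mean(x) - mean(y)) / sqrt(sp2 * (1 / nx + 1 / ny))
p_manual = 2 * ccdf(TDist(df), abs(t))               # two-sided p-value
```

Agreement between `p_pkg` and `p_manual` shows that the package implements the test as taught; agreement with the course software then only depends on feeding both the same data and the same options (tails, equal or unequal variances).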

I also agree with previous replies about treating everything as a general linear model. But if your instructor is a traditionalist, they might insist that you do it their way. So you could use a GLM in Julia, and compare against the traditional test in Matlab. As others have said, sometimes there are assumptions built in when someone says ANOVA (usually for more complicated cases than expected here), so explicitly testing against Matlab would help make sure you are on the same page with the assumptions. That would protect your grade, and perhaps even convince your instructor to port to Julia and GLMs.

If the course specifies Matlab, then the sensible thing to do is to use Matlab.
Anything else is, frankly, foolish, even if you personally are convinced that Julia is superior.
Proselytising your professor is not a good idea.

1 Like

I really can’t agree with your premise here. I progressed through education to doctoral level, then worked as a researcher/teacher/software developer for several decades.

As a teacher, I attempted to encourage people to find solutions using tools that worked for them. If they took a different approach than I expected, I needed to be convinced that they understood what was going on, which needed additional commitment from the student and from me. Probably good for both of us :wink:

With the caveat to the OP that it will require extra work, and the acknowledgment that the post indicates willingness to do that, I think it is not at all foolish.

I should say that my education and work took place in Australia and Germany which means my opinion is not universal. Things may be different in other cultures.

3 Likes

I teach a course where the students need to solve dynamic models. I have examples using Modelica in lecture notes, but plan to upgrade them to include Modelica + Julia (because these languages allow simple handling of DAE models; + put code in GitHub). I also give them a group task. In the group task, I tell them that I recommend they use Modelica or Julia, but that they can choose any language they want (e.g., MATLAB or Python, or Excel for that matter). Since I haven’t used MATLAB in 10 years, and I’m not super fluent in Python, I tell them that if they do not choose the languages I recommend, they are on their own wrt. debugging, etc.

For most courses where the programming language itself is not the focus, I’d assume that an open-minded professor would say something similar.

3 Likes

Well, I’d share my little experience regarding this. For my last Data Analysis exam it was required to code and perform tasks in both Fortran 77 and ROOT (yeeesss, those seventies’ guys).

I was quite afraid to go against such an explicit diktat, especially recalling the many previous times I ended up literally pissing off professors with similar “independence moves” and having a hard time during the oral defense of my work.

But in the end I just said to myself “damn, we are in the 2010s, I should be allowed to use Python and R” (sorry, never heard of Julia at that time…) and hence I went for it. It was for sure a lot of additional effort, but also quite an interesting journey, and I ended up learning a lot more than I would have by just following and tweaking the course slides and examples. The professor eventually made a little fun of my essay, especially the introduction, where I was literally haranguing the reader about my choice, and referred to it as some sort of “captatio benevolentiae”… but all in a friendly way. He actually said he had long been interested in exploring these “new” interpreted languages, and that my move convinced him to work on putting some examples in the slides for the next iteration of the course.

So, you will

  • indeed work harder using Julia than with whichever language is preferred by the lecturer and extensively referenced in lectures, tutoring sessions and so on

  • not necessarily encounter hostile feedback and harsh reactions from the professor(s). They might instead appreciate your curiosity and autonomy :slight_smile:

7 Likes