ANOVA Tests in Julia?


#1

What is the status of ANOVA tests in Julia? I looked at various packages under the JuliaStats organization, including HypothesisTests.jl, but I can’t seem to find this functionality. The closest I was able to find was in the README.md of the GLM.jl package, but it was an example from R using anova on a glm, and there was no equivalent example provided in Julia.
https://github.com/JuliaStats/GLM.jl/blob/master/README.md

If indeed this functionality is currently lacking, what is the best route to contribute towards adding this functionality in Julia?


#2

Hi ibadr,

I’m one of the JuliaStats organization developers/maintainers. Providing a general (M)AN(C)OVA package to live in the organization has been on my to-do list for quite some time now; I just haven’t gotten around to it yet (it’s not exactly at the top of the list). Currently there’s an unregistered package for ANOVA (https://github.com/JOfTheAncientGermanSpear/SimpleAnova.jl), though I haven’t used it and don’t know much about it.

If you’d like to add this functionality yourself, that’s great! I would suggest starting with a package that lives in your GitHub account. That way you can build up functionality as you need it and register it when you think it’s production-ready.

Alternatively, if you think ANOVA would be a good fit for a package like GLM.jl, you could submit an issue on the repository asking whether the maintainers agree that it would be a good fit. If so, you’re more than welcome to submit a PR! I–and I imagine other maintainers–would be happy to provide any guidance you may need as you work through a PR.

Regards,
Alex


#3

Could you not in practice have all the functionality of an ANOVA by doing an lm with PooledDataArrays?
EDIT: sorry, hadn’t checked the context of the question


#4

@ibadr what exactly is the functionality you’d like? The example on the GLM.jl page is not what I would tend to think of as an ANOVA (i.e. an analysis of variance) – in R, somewhat confusingly, the function to do traditional analysis of variance is called aov, whereas anova refers to computing anova tables for one or more fitted glm objects. I am sure this functionality could be recreated quite easily from the GLM model object, so could you give an example of the kind of output you’re looking for?


#5

hi alex—did you settle on a good anova package in the end for JuliaStats?


#6

FWIW, GLM.jl now provides the ftest function to perform an F-test between nested models.
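
To make the link to ANOVA concrete: a classical one-way ANOVA is the F-test that compares an intercept-only model against a model with one mean per group. The following is a minimal hand-rolled sketch of that F statistic using only the Statistics standard library; it is not the GLM.jl API, and the function name `oneway_f` and the vector-of-vectors data layout are assumptions made for illustration.

```julia
using Statistics

# One-way ANOVA F statistic computed by hand (illustrative sketch, not GLM.jl code).
# `groups` holds one vector of observations per factor level (hypothetical layout).
function oneway_f(groups::AbstractVector{<:AbstractVector{<:Real}})
    k = length(groups)                   # number of groups
    n = sum(length, groups)              # total number of observations
    grand = mean(reduce(vcat, groups))   # grand mean (the intercept-only fit)

    # Between-group sum of squares: variation explained by the group means
    ssb = sum(length(g) * (mean(g) - grand)^2 for g in groups)
    # Within-group sum of squares: residuals of the group-means model
    ssw = sum(sum(abs2, g .- mean(g)) for g in groups)

    dfb, dfw = k - 1, n - k              # numerator and denominator degrees of freedom
    return (F = (ssb / dfb) / (ssw / dfw), dfb = dfb, dfw = dfw)
end

# Made-up data: three groups of three observations each
res = oneway_f([[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]])
println(res)
```

The p-value would come from the upper tail of an F distribution with `dfb` and `dfw` degrees of freedom, which this sketch leaves out to stay dependency-free.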


#7

This is worth knowing, thanks. But I suspect I’m not alone in not knowing that an F-test is related to ANOVA. I suspect a lot of biologists coming from R (or like me, from labs that use R) know the name of the function they want but not the underlying theory, so when things are named differently, it’s confusing.

To give another example, I wrote an implementation of PCoA, and went so far as to read the primary literature (I’ve never learned linear algebra), but never realized that it’s the same thing as classical MDS, which is how it’s named in the Stats module that implements it.

This is not an argument that we should copy R names - I’m not actually sure I even have a suggested solution to this problem - just wanted to bring it up and see if anyone had any ideas. Maybe once 0.7/1.0 is out people will start working on conversion guides or something :slight_smile:


#8

We probably just need better documentation. The docstring for ftest already mentions ANOVA, but since we don’t have an online manual there’s no intuitive way to search for it. Apart from a manual, blog posts and equivalents to CRAN task views explaining how to do this would definitely be useful.


#9

On a related topic, since I was about to start trying to write one of my own - has anyone developed the equivalent of the permanova function in the R package vegan? https://en.wikipedia.org/wiki/Permutational_analysis_of_variance