ANOVA Tests in Julia?

ibadr · January 11, 2017, 5:45pm

What is the status of ANOVA tests in Julia? I looked at various packages under the JuliaStats organization, including HypothesisTests.jl, but I can’t seem to find this functionality. The closest I was able to find was in the README.md of the GLM.jl package, but it was an example from R using anova on a glm, and there was no equivalent example provided in Julia.
https://github.com/JuliaStats/GLM.jl/blob/master/README.md

If indeed this functionality is currently lacking, what is the best route to contribute towards adding this functionality in Julia?

ararslan · January 11, 2017, 7:03pm

Hi ibadr,

I’m one of the JuliaStats organization developers/maintainers. Providing a general (M)AN(C)OVA package to live in the organization has been on my to-do list for quite some time now; I just haven’t gotten around to it yet (it’s not exactly at the top of the list). Currently there’s an unregistered package for ANOVA (https://github.com/JOfTheAncientGermanSpear/SimpleAnova.jl), though I haven’t used it and don’t know much about it.

If you’d like to add this functionality yourself, that’s great! I would suggest starting with a package that lives in your GitHub account. That way you can build up functionality as you need it and register it when you think it’s production-ready.

Alternatively, if you think ANOVA would be a good fit for a package like GLM.jl, you could open an issue on the repository asking whether the maintainers agree. If so, you’re more than welcome to submit a PR! I, and I imagine other maintainers, would be happy to provide any guidance you may need as you work through a PR.

Regards,
Alex

mkborregaard · January 18, 2017, 3:56pm

Could you not, in practice, get all the functionality of an ANOVA by doing an lm with PooledDataArrays?
EDIT: sorry, hadn’t checked the context of the question

mkborregaard · January 18, 2017, 7:00pm

@ibadr what exactly is the functionality you’d like? The example on the GLM.jl page is not what I would tend to think of as an ANOVA (i.e. an analysis of variance) – in R, somewhat confusingly, the function to do traditional analysis of variance is called aov, whereas anova refers to computing anova tables for one or more fitted glm objects. I am sure this functionality could be recreated quite easily from the GLM model object, so could you give an example of the kind of output you’re looking for?

iwelch · March 3, 2018, 12:08am

hi alex—did you settle on a good anova package in the end for JuliaStats?

nalimilan · March 3, 2018, 1:49pm

FWIW, GLM.jl now provides the ftest function to perform F-test between nested models.
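For anyone wondering what an F-test between nested models actually computes, here is a minimal stdlib-only sketch of the arithmetic. It is not GLM.jl’s `ftest`, which works on fitted model objects, and the names `rss` and `nested_ftest` are hypothetical. The sketch compares the residual sums of squares of a reduced and a full least-squares fit. With an intercept-only reduced model and a group indicator in the full model, this is exactly the one-way ANOVA F statistic.

```julia
using LinearAlgebra

# Residual sum of squares of an ordinary least-squares fit of y on X.
rss(X, y) = sum(abs2, y - X * (X \ y))

# Hypothetical helper: F-test of a reduced model (design Xr) nested inside a
# full model (design Xf). Assumes full column rank and that Xr's columns lie
# in the column span of Xf.
function nested_ftest(Xr::AbstractMatrix, Xf::AbstractMatrix, y::AbstractVector)
    n = length(y)
    df1 = size(Xf, 2) - size(Xr, 2)   # extra parameters in the full model
    df2 = n - size(Xf, 2)             # residual degrees of freedom of the full model
    rss_r, rss_f = rss(Xr, y), rss(Xf, y)
    F = ((rss_r - rss_f) / df1) / (rss_f / df2)
    return (F = F, df1 = df1, df2 = df2)
end

# Example: intercept-only model vs. intercept + indicator for the second group.
y  = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
Xr = ones(6, 1)
Xf = hcat(ones(6), [0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
ft = nested_ftest(Xr, Xf, y)   # F ≈ 13.5 on (1, 4) degrees of freedom
```

The p-value would then come from the upper tail of an F(df1, df2) distribution, for example `ccdf(FDist(ft.df1, ft.df2), ft.F)` with Distributions.jl.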

kevbonham · March 12, 2018, 12:22pm

This is worth knowing, thanks. But I suspect I’m not alone in not knowing that an F-test is related to ANOVA. I suspect a lot of biologists coming from R (or, like me, from labs that use R) know the name of the function they want but not the underlying theory, so when things are named differently, it’s confusing.

To give another example, I wrote an implementation of PCoA, and went so far as to read the primary literature (I’ve never learned linear algebra), but never realized that it’s the same thing as classical MDS, which is how it’s named in the Stats module that implements it.
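For reference, here is why the two are the same thing. Classical MDS, and hence PCoA, double-centres the squared distance matrix and takes its leading eigenvectors. Below is a minimal sketch using only LinearAlgebra. `pcoa_coords` is a hypothetical name, not the API of any package. It assumes `D` is a symmetric matrix of pairwise distances, and it clips small negative eigenvalues (which can appear with non-Euclidean distances) to zero.

```julia
using LinearAlgebra

# Hypothetical helper: principal coordinates analysis (= classical MDS).
# D is a symmetric n×n matrix of pairwise distances; returns n×k coordinates.
function pcoa_coords(D::AbstractMatrix, k::Integer)
    n = size(D, 1)
    J = I - fill(1.0 / n, n, n)                 # centring matrix
    B = -0.5 * J * (D .^ 2) * J                 # double-centred squared distances
    ev = eigen(Symmetric(B))                    # eigenvalues in ascending order
    idx = sortperm(ev.values; rev = true)[1:k]  # indices of the k largest
    λ = max.(ev.values[idx], 0.0)               # clip tiny negative eigenvalues
    return ev.vectors[:, idx] .* sqrt.(λ)'      # scale each axis by sqrt(eigenvalue)
end
```

For Euclidean input, the returned coordinates reproduce the original distances up to rotation and reflection.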

This is not an argument that we should copy R names (I’m not actually sure I even have a suggested solution to this problem); I just wanted to bring it up and see if anyone had any ideas. Maybe once 0.7/1.0 is out, people will start working on conversion guides or something.

nalimilan · March 12, 2018, 1:07pm

We probably just need better documentation. The docstring for ftest already mentions ANOVA, but since we don’t have an online manual there’s no intuitive way to search for it. Apart from a manual, blog posts and equivalents to CRAN task views explaining how to do this would definitely be useful.

kevbonham · March 12, 2018, 9:29pm

On a related topic, since I was about to start trying to write one of my own: has anyone developed the equivalent of the permanova function in the R package vegan? (See the Wikipedia article “Permutational analysis of variance”.)
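PERMANOVA replaces the sums of squares of a one-way ANOVA with sums of squared pairwise distances. It then gets its p-value by permuting the group labels. Below is a rough single-factor sketch of Anderson’s pseudo-F and the permutation test in plain Julia. The function names `pseudo_f` and `permanova` are hypothetical. The sketch handles only one factor, so treat it as a starting point rather than a port of vegan.

```julia
using Random

# Hypothetical helper: PERMANOVA pseudo-F from an n×n distance matrix D and
# a vector of group labels (single factor only).
function pseudo_f(D::AbstractMatrix, labels::AbstractVector)
    N = length(labels)
    groups = unique(labels)
    a = length(groups)
    sst = sum(D[i, j]^2 for i in 1:N for j in i+1:N) / N   # total SS
    ssw = 0.0                                               # within-group SS
    for g in groups
        idx = findall(==(g), labels)
        ssw += sum((D[i, j]^2 for i in idx for j in idx if i < j); init = 0.0) / length(idx)
    end
    ssa = sst - ssw                                         # among-group SS
    return (ssa / (a - 1)) / (ssw / (N - a))
end

# Permutation p-value: shuffle the labels and count how often the permuted
# pseudo-F is at least as large as the observed one.
function permanova(D, labels; nperm = 999, rng = Random.default_rng())
    fobs = pseudo_f(D, labels)
    hits = count(_ -> pseudo_f(D, shuffle(rng, labels)) >= fobs, 1:nperm)
    return (F = fobs, p = (hits + 1) / (nperm + 1))
end
```

With Euclidean distances on a single variable, the pseudo-F equals the ordinary one-way ANOVA F. This gives a handy sanity check.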

Zach_Christensen · November 13, 2018, 4:25pm

I know this is a bit of an old topic, but I was just looking for ANOVA in Julia and found this thread. I was literally just showing someone how an ANOVA is basically a GLM and couldn’t figure out how to demonstrate it in Julia (on the fly, that is).
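One way to demonstrate this on the fly is to build the dummy-coded design matrix by hand and fit it by least squares. The intercept comes out as the mean of the reference group, and each remaining coefficient is that group’s difference from it. Below is a minimal stdlib-only sketch. `dummy_design` is a hypothetical helper. In GLM.jl one would normally let the formula interface build this matrix, e.g. `lm(@formula(y ~ group), df)` with a categorical `group` column.

```julia
using LinearAlgebra

# Hypothetical helper: treatment (dummy) coding with an intercept column plus
# one 0/1 indicator column per non-reference level; lvls[1] is the reference.
function dummy_design(x::AbstractVector, lvls::AbstractVector)
    X = zeros(length(x), length(lvls))
    X[:, 1] .= 1.0                          # intercept
    for (j, lev) in enumerate(lvls[2:end])
        X[:, j + 1] .= (x .== lev)          # indicator for level `lev`
    end
    return X
end

group = ["a", "a", "a", "b", "b", "b", "c", "c", "c"]
y = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 10.0, 11.0, 12.0]
X = dummy_design(group, ["a", "b", "c"])
beta = X \ y   # least-squares coefficients: mean of "a", then "b" - "a", "c" - "a"
```

Here the group means are 2, 5 and 11, so `beta` comes out as approximately `[2.0, 3.0, 9.0]`.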

I don’t think something like GLM should cater to every syntactic whim but I do think a separate package for variations on ANOVA would be a good place to provide that sort of standard interface. I’d love a permanova function and something that incorporates a lot of the multcomp package from R.

Nosferican · November 16, 2018, 1:58pm

There’s ANOVA.jl, but it only works with GLM models, and it breaks with allowrankdeficient.

Zach_Christensen · November 16, 2018, 2:27pm

Thank you! That looks promising, and the source code looks simple enough that any errors I ran into would be easy to fix. I know that StatsModels.jl provides support for contrast matrices, but it seems to be very much in line with the built-in functions in R. Is there any way to use custom contrast matrices?

Nosferican · November 16, 2018, 2:35pm

Aye. The ANOVA package could definitely benefit from a couple of PRs to improve it.
Which custom contrast matrices do you need besides the ones provided (i.e., DummyCoding, EffectsCoding, HelmertCoding, ContrastsCoding, FullDummyCoding)?
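For readers unfamiliar with these schemes, each coding is just a small k×(k-1) matrix. The hypothetical helpers below are plain Julia, not StatsModels.jl’s API. They build the treatment/dummy and Helmert versions, with the first level as the reference. For k = 4, the Helmert one reproduces the HelmertCoding matrix quoted later in this thread.

```julia
# Hypothetical helper: treatment/dummy contrasts, level 1 as the reference.
function dummy_contrasts(k::Integer)
    C = zeros(k, k - 1)
    for j in 1:k-1
        C[j + 1, j] = 1.0       # column j compares level j+1 with level 1
    end
    return C
end

# Hypothetical helper: Helmert contrasts (each level vs. all earlier levels).
function helmert_contrasts(k::Integer)
    C = zeros(k, k - 1)
    for j in 1:k-1
        C[1:j, j] .= -1.0       # all earlier levels get weight -1
        C[j + 1, j] = j         # level j+1 gets weight j, so the column sums to 0
    end
    return C
end
```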

non-Jedi · November 16, 2018, 4:50pm

I think what might be more helpful than an ANOVA package (for me at least) would be a tutorial or blog post walking through how to do ANOVA with GLM.jl. If it takes only a trivial amount of code to run an ANOVA with GLM, I’d much rather learn how the concepts on which GLMs are built can be used for that sort of analysis than just mindlessly call an anova function.

Does anything like that exist already, i.e. practical guides to doing different sorts of analysis using general linear models?

alejandromerchan · November 16, 2018, 5:10pm

What I’ve done is take R tutorials, or data I’ve analyzed in the past, and try to run them in Julia, translating the code. There’s a “Non-linear regression with R” book that I’m also “translating” into Julia. I don’t have a blog, so nothing is public, but I think that’s an effective way of doing things. ANOVA is, in general, fairly easy to compute (that’s why it’s so popular), so if you take any document that explains the process step by step, I’m sure you can do it yourself. Obviously, people with more experience with statistics in Julia could also give you a function in no time.
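To illustrate how little is involved, a classical one-way ANOVA needs only the between-group and within-group sums of squares. Below is a minimal sketch in plain Julia. `oneway_anova` is a hypothetical function name. It returns the F statistic and degrees of freedom; a p-value would additionally need the F distribution, e.g. from Distributions.jl.

```julia
using Statistics

# Hypothetical helper: classical one-way ANOVA from a vector of groups,
# each group being a vector of observations.
function oneway_anova(groups)
    k = length(groups)                        # number of groups
    n = sum(length, groups)                   # total number of observations
    grand = mean(reduce(vcat, groups))        # grand mean
    ssb = sum(length(g) * (mean(g) - grand)^2 for g in groups)  # between-group SS
    ssw = sum(sum(abs2, g .- mean(g)) for g in groups)           # within-group SS
    dfb, dfw = k - 1, n - k
    F = (ssb / dfb) / (ssw / dfw)
    return (F = F, df_between = dfb, df_within = dfw, ssb = ssb, ssw = ssw)
end

aov_res = oneway_anova([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

For these two groups the between-group SS is 13.5 and the within-group SS is 4, giving F = 13.5 on (1, 4) degrees of freedom.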

mkborregaard · November 17, 2018, 10:39am

If someone chose to give the ANOVA package some love, would it make sense to include it among the JuliaStats packages? It really is a missing element in the stats ecosystem.

Zach_Christensen · November 17, 2018, 7:12pm

To be honest, I completely skimmed over ContrastsCoding when reading the documentation (sorry!). That being said, it still restricts contrasts to a matrix with k-1 columns. There are cases where I’d like to look at a single contrast such as

-1.0
-1.0
-1.0
3.0

instead of the full HelmertCoding matrix

 -1.0  -1.0  -1.0
  1.0  -1.0  -1.0
  0.0   2.0  -1.0
  0.0   0.0   3.0

I can only think of 2 reasons why this would actually be an issue:

1. My data is so big I don’t want to waste the time computing the extra contrasts.
2. Post-hoc analysis where I want to look at very specific interactions.

I may be wrong, but reading the documentation gave me the impression that contrasts are applied before fitting to the DataFrame. If this is the case then I’d need to either refit my data for every contrast or dig into the model and reweight the coefficients appropriately to achieve the same effect as refitting with a new contrast (which is what I’d do for situation 1).
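If the model has already been fitted, a single contrast like the one above can be evaluated from the coefficients and their covariance without refitting. The estimate is c'β and its standard error is sqrt(σ² · c'(X'X)^-1 c). Below is a minimal OLS sketch. It assumes a cell-means design (one indicator column per level, no intercept), so the weights apply directly to the group means. `contrast_test` is a hypothetical name. With a fitted GLM.jl model, the same ingredients should be available through `coef` and `vcov`.

```julia
using LinearAlgebra

# Hypothetical helper: estimate and t-test a single contrast c'β of an OLS fit,
# reusing the fitted coefficients instead of refitting with a new coding.
function contrast_test(X::AbstractMatrix, y::AbstractVector, c::AbstractVector)
    n, p = size(X)
    β = X \ y                                   # least-squares coefficients
    σ2 = sum(abs2, y - X * β) / (n - p)         # residual variance estimate
    est = dot(c, β)                             # contrast estimate c'β
    se = sqrt(σ2 * dot(c, inv(X' * X) * c))     # standard error of c'β
    return (estimate = est, se = se, t = est / se, df = n - p)
end

# Cell-means design: 4 levels, 2 observations each, one indicator column per level.
grp = [1, 1, 2, 2, 3, 3, 4, 4]
X = Float64[g == j for g in grp, j in 1:4]
y = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0, 10.0, 12.0]
ct = contrast_test(X, y, [-1.0, -1.0, -1.0, 3.0])   # level 4 vs. levels 1-3
```

Trying several contrasts this way only costs a few dot products per contrast. The caveat about fishing for results applies all the more.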

I admit that my use case may be rather unusual, as most people probably only need to specify a very small number of contrasts for their model. Furthermore, it’s also very possible (and even probable) that this sort of ability could facilitate fishing for results.

It may even be worth having a package like CategoricalStats that grabs coefficients from other methods to apply the appropriate statistical inference to categorical data.

Nosferican · November 17, 2018, 7:29pm

I don’t follow what you are trying to describe. What’s the actual contrast rule you need? The contrasts are applied to construct the model matrix which can then be passed to a RegressionModel. Depending on what you are trying to perform, there may be a more efficient data structure you could use.
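To make the model-matrix step concrete for a single categorical variable: the contrast matrix maps a one-hot encoding of the levels onto the model-matrix columns. The sketch below uses a hypothetical helper, not StatsModels.jl internals. It multiplies the n×k indicator matrix by a k×(k-1) contrast matrix and prepends an intercept.

```julia
# Hypothetical helper (not StatsModels.jl internals): turn one categorical
# column into model-matrix columns using a k×(k-1) contrast matrix C.
function coded_columns(x::AbstractVector, lvls::AbstractVector, C::AbstractMatrix)
    onehot = Float64[xi == l for xi in x, l in lvls]   # n×k indicator matrix
    return hcat(ones(length(x)), onehot * C)          # intercept + coded columns
end

x = ["b", "a", "c"]
dummy = [0.0 0.0; 1.0 0.0; 0.0 1.0]        # treatment coding, "a" as reference
helmert = [-1.0 -1.0; 1.0 -1.0; 0.0 2.0]   # Helmert coding for 3 levels
Xd = coded_columns(x, ["a", "b", "c"], dummy)
Xh = coded_columns(x, ["a", "b", "c"], helmert)
```

Changing the coding changes only the matrix `C`, and hence the columns the regression sees. This is why a different coding normally means building a new model matrix.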

Zach_Christensen · November 17, 2018, 9:38pm

I guess it’s not really a specific contrast rule but a different application of contrasts entirely. In fact, now that I think of it I had to actually write my own code using the core lm.fit function in R when I did this ~4 years ago. My apologies for the digression.

I probably need to do as @alejandromerchan suggested and go through my past analyses, translating them into Julia. I would still like to see some sort of built-in support for this in Julia (mostly because I’m lazy, but also because I’m trying to sell some hard-core R users on Julia).

Nosferican · November 17, 2018, 10:19pm

If you share the general idea along with some R code, I could give you some comments on how to proceed with translating it to Julia.
