What is the status of ANOVA tests in Julia? I looked at various packages under the JuliaStats organization, including HypothesisTests.jl, but I can’t seem to find this functionality. The closest I was able to find was in the README.md of the GLM.jl package, but it was an example from R using anova on a glm, and there was no equivalent example provided in Julia. https://github.com/JuliaStats/GLM.jl/blob/master/README.md
If indeed this functionality is currently lacking, what is the best route to contribute towards adding this functionality in Julia?
I’m one of the JuliaStats organization developers/maintainers. Providing a general (M)AN(C)OVA package to live in the organization has been on my to-do list for quite some time now; I just haven’t gotten around to it yet (it’s not exactly at the top of the list). Currently there’s an unregistered package for ANOVA (https://github.com/JOfTheAncientGermanSpear/SimpleAnova.jl), though I haven’t used it, nor do I know much about it.
If you’d like to add this functionality yourself, that’s great! I would suggest starting with a package that lives in your GitHub account. That way you can build up functionality as you need it and register it when you think it’s production-ready.
Alternatively, if you think ANOVA would be a good fit for a package like GLM.jl, you could submit an issue on the repository asking whether the maintainers agree that it would be a good fit. If so, you’re more than welcome to submit a PR! I, and I imagine other maintainers, would be happy to provide any guidance you may need as you work through a PR.
Could you not in practice have all the functionality of an ANOVA by doing an lm with PooledDataArrays?
EDIT: sorry, hadn’t checked the context of the question
@ibadr what exactly is the functionality you’d like? The example on the GLM.jl page is not what I would tend to think of as an ANOVA (i.e. an analysis of variance) – in R, somewhat confusingly, the function to do traditional analysis of variance is called aov, whereas anova refers to computing anova tables for one or more fitted glm objects. I am sure this functionality could be recreated quite easily from the GLM model object, so could you give an example of the kind of output you’re looking for?
This is worth knowing, thanks. But I suspect I’m not alone in not knowing that an F-test is related to ANOVA. I suspect a lot of biologists coming from R (or, like me, from labs that use R) know the name of the function they want but not the underlying theory, so when things are named differently, it’s confusing.
To give another example, I wrote an implementation of PCoA, and went so far as to read the primary literature (I’ve never learned linear algebra), but never realized that it’s the same thing as classical MDS, which is how it’s named in the Stats module that implements it.
This is not an argument that we should copy R names - I’m not actually sure I even have a suggested solution to this problem - I just wanted to bring it up and see if anyone had any ideas. Maybe once 0.7/1.0 is out people will start working on conversion guides or something.
We probably just need better documentation. The docstring for ftest already mentions ANOVA, but since we don’t have an online manual there’s no intuitive way to search for it. Apart from a manual, blog posts and equivalents to CRAN task views explaining how to do this would definitely be useful.
On a related topic, since I was about to start trying to write one of my own - has anyone developed the equivalent of the permanova function in the R package vegan? (See the Wikipedia article “Permutational analysis of variance”.)
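For readers unfamiliar with the idea, here is a minimal sketch of a permutation test on a distance-based pseudo-F statistic, written in plain Python so that it does not depend on any particular Julia package. It is not a port of vegan's function; the names `pseudo_f` and `permutation_test` and the toy data are made up for illustration, and it only handles a single grouping factor.

```python
import random
from itertools import combinations

def pseudo_f(dist, labels):
    """Pseudo-F statistic from a square distance matrix and group labels."""
    n = len(labels)
    levels = sorted(set(labels))
    # Total sum of squares from all pairwise squared distances.
    ss_total = sum(dist[i][j] ** 2 for i, j in combinations(range(n), 2)) / n
    # Within-group sum of squares: same formula, applied group by group.
    ss_within = 0.0
    for level in levels:
        members = [i for i in range(n) if labels[i] == level]
        ss_within += sum(dist[i][j] ** 2 for i, j in combinations(members, 2)) / len(members)
    ss_among = ss_total - ss_within
    a = len(levels)
    return (ss_among / (a - 1)) / (ss_within / (n - a))

def permutation_test(dist, labels, n_perm=999, seed=0):
    """Observed pseudo-F and a permutation p-value from shuffled labels."""
    rng = random.Random(seed)
    observed = pseudo_f(dist, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if pseudo_f(dist, shuffled) >= observed:
            hits += 1
    # Count the observed labelling itself as one of the permutations.
    return observed, (hits + 1) / (n_perm + 1)

# Toy data (hypothetical): one measured variable, three groups of three.
points = [4.0, 5.0, 6.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0]
labels = ["a"] * 3 + ["b"] * 3 + ["c"] * 3
dist = [[abs(p - q) for q in points] for p in points]
f_obs, p_value = permutation_test(dist, labels)
```

With Euclidean distances on a single variable, as in the toy data, the pseudo-F coincides with the ordinary one-way ANOVA F statistic; the distance-matrix form only becomes interesting with other dissimilarity measures.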
I know this is a bit of an old topic but I was just looking for ANOVA in Julia and found this thread. I was literally just showing someone how an ANOVA is basically a GLM and couldn’t figure out how to demonstrate it in Julia (on the fly, that is).
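The "ANOVA is basically a linear model" point can be shown with very little code. The sketch below is plain Python rather than GLM.jl, so it makes no claims about that package's API. It computes the F statistic twice: once by comparing the residual sums of squares of two nested models (intercept only versus one categorical factor), and once with the textbook between/within formula. It relies on the fact that, for a single categorical predictor, the least-squares fitted value of each observation is its group mean. The function names and data are made up for illustration.

```python
from statistics import mean

def rss(values, fitted):
    """Residual sum of squares for observed values and their fitted values."""
    return sum((v - f) ** 2 for v, f in zip(values, fitted))

def f_from_model_comparison(groups):
    """F statistic from comparing y ~ 1 against y ~ 1 + group."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    rss_null = rss(values, [mean(values)] * n)                    # intercept only
    rss_full = rss(values, [mean(g) for g in groups for _ in g])  # group means
    return ((rss_null - rss_full) / (k - 1)) / (rss_full / (n - k))

def f_classical(groups):
    """Textbook one-way ANOVA: mean square between over mean square within."""
    values = [v for g in groups for v in g]
    n, k = len(values), len(groups)
    grand = mean(values)
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups)
    ss_within = sum((v - mean(g)) ** 2 for g in groups for v in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))

# Toy data (hypothetical): three groups of three observations.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
f_model = f_from_model_comparison(groups)
f_classic = f_classical(groups)
```

Both routes give the same number for this data (F = 19 on 2 and 6 degrees of freedom), which is the sense in which an F-test between nested linear models reproduces a one-way ANOVA.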
I don’t think something like GLM should cater to every syntactic whim but I do think a separate package for variations on ANOVA would be a good place to provide that sort of standard interface. I’d love a permanova function and something that incorporates a lot of the multcomp package from R.
Thank you! That looks promising, and the source code looks like it would be easy enough to fix if I ran into any errors. I know that StatsModels.jl provides support for contrast matrices, but it seems to be very much in line with the built-in functions in R. Is there any way to use custom contrast matrices?
Aye. The ANOVA package can definitely benefit from a couple of PRs to improve it.
Which custom contrast matrices do you need besides the ones provided (i.e., DummyCoding, EffectsCoding, HelmertCoding, ContrastsCoding, FullDummyCoding)?
I think what might be more helpful than an ANOVA package (for me at least) would be a tutorial or blog post walking through how to do ANOVA with GLM.jl. If it’s a trivial amount of code to run an ANOVA with GLM, I’d much prefer to learn how the concepts around which GLM is built can be used to do that sort of analysis than just mindlessly call an anova function.
Does anything like that exist already–practical guides to doing different sorts of analysis using general linear models?
What I’ve done with that is take R tutorials, or data I’ve analyzed in the past and try to run them in Julia, translating the code. There’s a “Non-linear regression with R” book that I’m also “translating” into Julia. I don’t have a blog, so nothing is public, but I think that’s an effective way of doing things. ANOVA is, in general, fairly easy to compute, that’s why it’s so popular, so if you take any document that explains the process step-by-step, I’m sure you can do it yourself. Obviously, I’m also sure people with more experience with statistics in Julia can give you a function in no time.
Would it make sense if someone chose to give the Anova package some love to include it among the JuliaStats packages? It really is a missing element in the stats ecosystem.
To be honest I completely skimmed over ContrastsCoding when reading the documentation (sorry!). That being said, it still restricts contrasts to a matrix with k-1 columns. There are cases where I’d like to look at
I can only think of 2 reasons why this would actually be an issue:
1. My data is so big I don’t want to waste the time computing the extra contrasts
2. Post-hoc analysis where I want to look at very specific interactions
I may be wrong, but reading the documentation gave me the impression that contrasts are applied before fitting to the DataFrame. If this is the case then I’d need to either refit my data for every contrast or dig into the model and reweight the coefficients appropriately to achieve the same effect as refitting with a new contrast (which is what I’d do for situation 1).
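To make the "no refit per contrast" idea concrete, here is a minimal sketch for the simplest case, a one-way layout, in plain Python. Once the group means and the pooled residual variance are known from a single fit, any linear contrast of the means can be evaluated directly. The function name `contrast_estimate`, the weights, and the data are hypothetical, and no multiplicity correction is applied.

```python
from math import sqrt
from statistics import mean

def contrast_estimate(groups, weights):
    """Estimate, standard error and t statistic for sum(w_i * mean_i)."""
    means = [mean(g) for g in groups]
    n = sum(len(g) for g in groups)
    k = len(groups)
    # Pooled residual variance (mean square error) from the one-way fit.
    mse = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g) / (n - k)
    estimate = sum(w * m for w, m in zip(weights, means))
    std_error = sqrt(mse * sum(w ** 2 / len(g) for w, g in zip(weights, groups)))
    return estimate, std_error, estimate / std_error

# Toy data (hypothetical): three groups of three observations.
groups = [[4.0, 5.0, 6.0], [6.0, 7.0, 8.0], [9.0, 10.0, 11.0]]
# Hypothetical contrast: third group versus the average of the first two.
est, se, t_stat = contrast_estimate(groups, [-0.5, -0.5, 1.0])
```

The same call can be repeated with as many weight vectors as needed, which is also why this kind of post-hoc exploration needs a correction for multiple comparisons before drawing conclusions.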
I admit that my use case may be very unique, as most people probably only need to specify a very small number of contrasts for their model. Furthermore, it’s also very possible (and even probable) that this sort of ability could facilitate fishing for results.
It may even be worth having a package like CategoricalStats that grabs coefficients from other methods to apply the appropriate statistical inference to categorical data.
I don’t follow what you are trying to describe. What’s the actual contrast rule you need? The contrasts are applied to construct the model matrix which can then be passed to a RegressionModel. Depending on what you are trying to perform, there may be a more efficient data structure you could use.
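As a rough illustration of what "the contrasts are applied to construct the model matrix" means, the sketch below maps each observation's level to a row of a k × (k-1) contrast matrix and prepends an intercept. It is plain Python, not StatsModels.jl; the function names only mirror the idea of dummy and effects coding, and the choice of the first level as the reference is an assumption of this example.

```python
def dummy_coding(k):
    """k x (k-1) treatment contrasts: level 0 is the reference (all zeros)."""
    return [[1.0 if row == col + 1 else 0.0 for col in range(k - 1)]
            for row in range(k)]

def effects_coding(k):
    """k x (k-1) sum-to-zero contrasts: level 0 is coded -1 in every column."""
    rows = dummy_coding(k)
    rows[0] = [-1.0] * (k - 1)
    return rows

def model_matrix(observations, levels, contrasts):
    """One row per observation: an intercept, then that level's contrast row."""
    index = {level: i for i, level in enumerate(levels)}
    return [[1.0] + contrasts[index[obs]] for obs in observations]

# Toy factor (hypothetical) with three levels and four observations.
levels = ["a", "b", "c"]
observations = ["a", "c", "b", "c"]
X_dummy = model_matrix(observations, levels, dummy_coding(3))
X_effects = model_matrix(observations, levels, effects_coding(3))
```

The two matrices span the same column space, so the fitted values are identical; only the meaning of the individual coefficients changes with the coding.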
I guess it’s not really a specific contrast rule but a different application of contrasts entirely. In fact, now that I think of it I had to actually write my own code using the core lm.fit function in R when I did this ~4 years ago. My apologies for the digression.
I probably need to do as @alejandromerchan suggested and go through my past analyses, translating them into Julia. I would still like to see some sort of built-in support for this in Julia (mostly because I’m lazy, but also because I’m trying to sell Julia to some people who are hard-core R users).