I want to do this because Julia is awesome and I depend on it almost exclusively for everything from signal processing to basic stats. It works better than just about anything else. However, I believe that Julia could be made so much more awesome if we had a quick, easy, and Julian ANOVA test available. What do you think?

I will even go so far as to stick my neck out and say that I am willing to get my hands “dirty” coding if necessary to make this ANOVA dream a reality.

I recently implemented ANOVA on top of Distributions, using Student and Fisher tests, as part of a multiple regression. I have yet to write the Readme.md, but the code can be found here if you want to use it: https://github.com/QuelqunQui/TIFFtoRegression.jl
The ANOVA part is in the StatRegs function.

I would respectfully request anyone wanting to implement ANOVA to first read Bill Venables’ (of Venables and Ripley, Modern Applied Statistics with S fame) famous unpublished paper Exegeses on Linear Models.

I am probably too hard-line about always defining anova as a comparison of the fits of two nested linear models (nested in the sense that one is a special case of the other), but that is because I have spent so much of my life explaining why p-values from certain anova tables don’t make sense.

@dmbates I already skimmed it, and read your comments on https://github.com/JuliaStats/GLM.jl/pull/65. Then I went to Wikipedia and read up on the F-test, and went back to the derivation of ANOVA to see how it is a special case of this test.
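For reference, the statistic behind that special case can be written as follows. The notation is an assumption for illustration and does not come from the thread: the smaller model has $p_1$ parameters and residual sum of squares $\mathrm{RSS}_1$, the larger model has $p_2$ parameters and $\mathrm{RSS}_2$, and there are $n$ observations.

$$
F = \frac{(\mathrm{RSS}_1 - \mathrm{RSS}_2)/(p_2 - p_1)}{\mathrm{RSS}_2/(n - p_2)}
$$

Under the null hypothesis that the smaller model is adequate, and with the usual normal-error assumptions, $F$ follows an F distribution with $(p_2 - p_1,\; n - p_2)$ degrees of freedom. One-way ANOVA is the case where the smaller model has only an intercept and the larger model adds one indicator per group beyond the first.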

After revising my world view on ANOVAs, I’m not bringing ANOVA per se to GLM.jl. After tossing some ideas around over at GitHub, I’ve been working on a function ftest(mod1::LinPredModel, mod2::LinPredModel). I plan on putting a clear explanation in the docs for how to use this test to do an ANOVA.
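To make the idea concrete, here is a minimal sketch of the comparison such a test performs. It deliberately uses plain least squares instead of GLM.jl, so it assumes nothing about the API in the pull request. The helper names `rss` and `nested_ftest` and the toy data are made up for illustration; the p-value relies on `FDist` and `ccdf` from Distributions.jl.

```julia
using LinearAlgebra   # stdlib: least-squares solve via `\`, `rank`, `dot`
using Distributions   # FDist and ccdf for the p-value

# Residual sum of squares of the ordinary least-squares fit of y on X.
function rss(X::AbstractMatrix, y::AbstractVector)
    β = X \ y
    r = y - X * β
    return dot(r, r)
end

# F-test comparing two nested linear models (hypothetical helper).
# Assumes the columns of Xsmall span a subspace of the columns of Xlarge.
function nested_ftest(Xsmall, Xlarge, y)
    n = length(y)
    p1, p2 = rank(Xsmall), rank(Xlarge)
    rss1, rss2 = rss(Xsmall, y), rss(Xlarge, y)
    df1, df2 = p2 - p1, n - p2
    F = ((rss1 - rss2) / df1) / (rss2 / df2)
    return (F = F, df1 = df1, df2 = df2, pvalue = ccdf(FDist(df1, df2), F))
end

# Toy one-way layout: three groups of three observations (made-up numbers).
y = [4.1, 3.9, 4.3, 5.0, 5.2, 4.8, 6.1, 5.9, 6.3]
group = repeat(1:3, inner = 3)

X0 = ones(length(y), 1)                   # smaller model: intercept only
X1 = hcat(X0, group .== 2, group .== 3)   # larger model: intercept + group indicators

result = nested_ftest(X0, X1, y)
println(result)
```

For this toy data the within-group sum of squares is 0.24 and the between-group sum of squares is 6.02, so the statistic works out to F = 75.25 on (2, 6) degrees of freedom, which is exactly the one-way ANOVA F for the grouping.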

The work on my PR for this is ongoing. If anyone (say, @dmbates) wants to go over to https://github.com/JuliaStats/GLM.jl/pull/182 and discuss what I might, could, or should have done differently, it will only make the final product better.

@dmbates Can you give me an example of a p-value from an ANOVA table that doesn’t make sense? I think I have an idea, but I want to check my understanding.

Thank you for mentioning this. Back at the time of that thread, I looked at the two pull requests and tried to figure out how to implement a basic ANOVA table à la R’s anova, with the intent of implementing aov next. I started from pull requests https://github.com/JuliaStats/GLM.jl/pull/70 and https://github.com/JuliaStats/GLM.jl/pull/65, and the result thus far is here

Disclaimer: it’s buggy and experimental, and still can’t handle categorical variables, but it was a good start, I guess! I was planning to keep on improving it and to add support for categorical variables, but I got swamped with other stuff to do at work. Maybe this is a good time to revisit it.

After closely inspecting the GLM.jl code at that time, I came to the conclusion that ANOVA-related code should live in its own package, with a direct dependence on GLM.jl. Basically, all the heavy lifting is, or will be, done inside GLM.jl, with the ANOVA package merely adding a convenient layer to fit the different GLMs and present the ANOVA results.

It’s never a bad time to revisit a languishing Julia package (said the busy guy who had five or six languishing Julia packages already).

I like the looks of your ANOVA package. It is perhaps a better approach than the one I took; mine was to go charging into GLM.jl (with a pull request) like a bull in a china shop. I don’t think I broke too much china, but it’s only day 1 on the pull request, so we’ll see.

Agreed! I have retouched the package. It’s now up to date with GLM v0.7.0, with proper handling of categorical variables. Check the new iris data example for one-way ANOVA.

Next, I need to add support for multiple (nested) models, and for aov. Pull requests welcome!

I have always seen ANOVA as more of an experimental design tool.

It would be nice to see a really good experimental design package. Such a package would include ANOVA and maybe support for setting up randomized blocking, Latin squares, multi-factor designs, etc.

Please note there is also a quite clean ANOVA implementation here: https://github.com/marcpabst/ANOVA.jl. It provides 3 types of tables, with tests for each type. Explanatory variables can be (in fact, should be) wrapped as effects.