Poll: Do we Julians want ANOVAs?


#1

Hi everyone,

I’m a scientist on a crusade. My crusade is to bring a really good ANOVA system to Julia, probably by merging https://github.com/JuliaStats/GLM.jl/pull/70.

I want to do this because Julia is awesome and I depend on it almost exclusively for everything from signal processing to basic stats. It works better than just about anything else. However, I believe that Julia could be made so much more awesome if we had a quick, easy, and Julian ANOVA test available. What do you think?

  • Yes, I’d love to bring ANOVA to GLM.jl
  • No, I wouldn’t use this feature that much

I will even go so far as to stick my neck out and say that I am willing to get my hands “dirty” coding if necessary to make this ANOVA dream a reality.

Lewis


#2

I think ANOVA is useful not only as a test, but also as a way of summarizing the fitted model. If you did something like this, it would be very useful.


#3

I recently implemented ANOVA based on Distribution, using Student and Fisher tests, as part of a multiple regression. I have yet to write the Readme.md, but the code can be found there if you want to use it :wink: https://github.com/QuelqunQui/TIFFtoRegression.jl
The ANOVA part is in the StatRegs function.


#4

Great, keep the results coming in.

@QuelqunQui: Thanks for alerting me to your ANOVA implementation. I may well be checking it out while things work out over at GLM.jl.


#5

@LewisHein glad it might help you :slight_smile:


#6

See also the thread “ANOVA Tests in Julia?”


#7

I would respectfully request anyone wanting to implement ANOVA to first read Bill Venables’ (of Venables and Ripley, Modern Applied Statistics with S fame) famous unpublished paper Exegeses on Linear Models.

I am probably too hard-line about always defining anova as a comparison of the fits of two nested (in the sense that one is a special case of the other) linear models, but that is because I have spent so much of my life explaining why p-values from certain anova tables don’t make sense.


#8

@dmbates I already skimmed it, and read your comments on https://github.com/JuliaStats/GLM.jl/pull/65. Then I went to Wikipedia and read up on the F-test, and went back to the derivation of ANOVA to see how it is a special case of this test.
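For anyone following along, here is that special case written out. This is the standard textbook F-test for two nested linear models, not anything taken from the pull requests. The symbols are introduced here for illustration: RSS_0 and p_0 are the residual sum of squares and number of coefficients of the smaller model, RSS_1 and p_1 those of the larger model, and n is the number of observations.

```latex
F = \frac{(\mathrm{RSS}_0 - \mathrm{RSS}_1)/(p_1 - p_0)}{\mathrm{RSS}_1/(n - p_1)}
\;\sim\; F_{\,p_1 - p_0,\; n - p_1}
\quad \text{if the smaller model holds (normal errors assumed)}

% One-way ANOVA with k groups: the smaller model is intercept-only (p_0 = 1),
% the larger model has one mean per group (p_1 = k), so
% RSS_0 = SS_total and RSS_1 = SS_within, which gives
F = \frac{\mathrm{SS}_{\text{between}}/(k - 1)}{\mathrm{SS}_{\text{within}}/(n - k)},
\qquad \mathrm{SS}_{\text{between}} = \mathrm{SS}_{\text{total}} - \mathrm{SS}_{\text{within}}
```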

After revising my world view on ANOVAs, I’m no longer bringing ANOVA per se to GLM.jl. After tossing some ideas around over at GitHub, I’ve been working on a function ftest(mod1::LinPredModel, mod2::LinPredModel). I plan to put a clear explanation in the docs of how to use this test to do an ANOVA.

The work on my PR for this is ongoing. If anyone (say, @dmbates) wants to go over to https://github.com/JuliaStats/GLM.jl/pull/182 and discuss how many things I might, could, or should have done differently, it will only make the end product better.
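To make the idea concrete, here is a rough sketch of a nested-model F statistic in plain Julia. It is not the code from the pull request and does not use GLM.jl at all. The helper names `rss` and `nested_f` are made up for this illustration, the data are toy numbers, and the models are passed as raw design matrices rather than fitted model objects. A p-value is left out because it would need the upper tail of the F distribution from a statistics package.

```julia
using LinearAlgebra

# Hypothetical helper: residual sum of squares from the
# least-squares fit of y on the columns of X.
rss(X, y) = sum(abs2, y - X * (X \ y))

# Hypothetical helper: F statistic and degrees of freedom for comparing a
# smaller model (X0) with a larger model (X1) whose column space contains
# that of X0. Assumes both design matrices have full column rank.
function nested_f(X0, X1, y)
    n = length(y)
    p0, p1 = size(X0, 2), size(X1, 2)
    rss0, rss1 = rss(X0, y), rss(X1, y)
    F = ((rss0 - rss1) / (p1 - p0)) / (rss1 / (n - p1))
    return (F = F, df1 = p1 - p0, df2 = n - p1)
end

# Toy one-way layout: three groups of three observations (made-up numbers).
y = [1.0, 2.0, 3.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0]
group = repeat(1:3, inner = 3)

X0 = ones(length(y), 1)                          # intercept-only model
X1 = Float64[g == j for g in group, j in 1:3]    # one indicator column per group

result = nested_f(X0, X1, y)
```

For these toy numbers the between-group sum of squares is 26 and the within-group sum of squares is 6, so `result.F` comes out at 13 (up to floating-point rounding) on 2 and 6 degrees of freedom, which is the usual one-way ANOVA F statistic.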


#9

@dmbates Can you give me an example of a p-value from an ANOVA table that doesn’t make sense? I think I have an idea, but I want to check my understanding.

Thanks


#10

Thank you for mentioning this. Back at the time of that thread, I looked at the two pull requests and tried to figure out how to implement a basic ANOVA table à la R’s anova, with the intent of implementing aov next. I started from pull requests https://github.com/JuliaStats/GLM.jl/pull/70 and https://github.com/JuliaStats/GLM.jl/pull/65, and the result thus far is here.

Disclaimer: it’s buggy and experimental, and still can’t handle categorical variables, but it was a good start, I guess! I was planning to keep on improving it and to add support for categorical variables, but I got swamped with other stuff to do at work. Maybe this is a good time to revisit it.

After closely inspecting the GLM.jl code at that time, I came to the conclusion that ANOVA-related code should live in its own package, with a direct dependence on GLM.jl. Basically, all the heavy lifting is, or will be, done inside GLM.jl, with the ANOVA package merely adding a convenient layer to fit the different GLMs and present the ANOVA results.


#11

Prof. Bates, thank you for citing this very valuable resource!


#12

It’s never a bad time to revisit a languishing Julia package (said the busy guy who had five or six languishing Julia packages already).

I like the looks of your ANOVA package. It is perhaps a better approach than the one I took: mine was to go charging into GLM.jl like a bull in a china shop (with a pull request). I don’t think I broke too much china, but it’s only day 1 on the pull request, so we’ll see.


#13

Agreed! I have retouched the package. It’s now up to date with GLM v0.7.0, with proper handling of categorical variables. Check the new iris data example for one-way ANOVA.

Next, I need to add support for multiple (nested) models, and for aov. Pull requests welcome!


#14

I have always seen ANOVA as more of an experimental design tool.

It would be nice to see a really good experimental design package. Such a package would include ANOVA and maybe support for setting up randomized blocking, Latin squares, multi-factor designs, etc.