Since people keep discovering this post, I think the only two packages worth talking about here are now:
marcpabst / ANOVA.jl requires DataFrames plus external setup and fitting of a linear model (GLM.jl), and I think it handles only fixed factors, with the "type" argument selecting among the variant ways to calculate the sums of squares. I don't know if there's any way to do nested factors with linear models. I'm assuming it's robust to unbalanced data due to its reliance on regression? Updated 8 months ago.
and mine:
BioTurboNick / SimpleANOVA.jl, which does the calculation without requiring a GLM, handles random effects, nesting, and repeated measures, and can take DataFrames, vectors, or multidimensional arrays, but requires balanced data and only does one type of sum of squares. It can also compute contrasts and the omega-squared effect size (a rough usage sketch follows below).
I think these both cover only basic cases, however, and I will leave it to those who know more about statistics to speak to the availability of more general tools.
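For anyone landing here, a rough usage sketch for SimpleANOVA.jl from memory (double-check the README; the multidimensional-array convention and the exact factor-type ordering here are my assumptions):

using SimpleANOVA

# Hypothetical balanced design: 5 replicates of a 3×2 two-factor layout,
# with replicates along the first dimension (assumed convention).
observations = rand(5, 3, 2)
result = anova(observations, [fixed, random])  # assumed signature; see the package docs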
@BioTurboNick having just stumbled on this thread when it popped up due to recent comments, I see you implemented this, but your background is in biology rather than mathematics. My biggest concern is that calculations of sums of squares are numerically unstable, and it's easy to get wrong answers due to floating-point issues. How much were you aware of numerical stability issues when coding your package?
I must admit that at the time I wrote it, I had nearly zero awareness of numerical stability.
I have recently had a headlong crash-course in dealing with numerical stability in another context, so it would be good for me to revisit the package with that in mind.
Hi! Good point! But what about var from StatsBase? It is really just a sum of squares… Do we need to adjust StatsBase? What kind of algorithm would be preferred? (Welford? Weighted incremental?)
##### General central moment
function _moment2(v::RealArray, m::Real; corrected=false)
    n = length(v)
    s = 0.0
    for i = 1:n
        @inbounds z = v[i] - m
        s += z * z
    end
    varcorrection(n, corrected) * s
end
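Note that _moment2 is the two-pass form: it receives the precomputed mean m, which already avoids the worst of the cancellation. Where instability really bites is the naive one-pass textbook formula. A minimal sketch of the cancellation (my own example, not StatsBase code):

using Statistics

# Textbook one-pass formula sum(x^2) - n*mean^2: the two terms nearly cancel,
# so almost all significant digits are lost when the data ride on a large offset.
naive_var(v) = (sum(abs2, v) - length(v) * mean(v)^2) / (length(v) - 1)

x = 1e9 .+ rand(1000)  # variance ≈ 1/12, riding on a huge offset
naive_var(x)           # garbage: rounding error in the ~1e21-sized sums swamps the signal
var(x)                 # ≈ 1/12, as expected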
It seems var works… Can you kindly provide a more robust test?
using StatsBase

v = rand(1000000)
v2 = v .+ 1_000_000_000    # same spread, shifted by a huge offset
v3 = append!(copy(v2), v2) # the shifted data duplicated, doubling the length
tv = rand(8)               # small vector for a sanity check

# Welford's online algorithm
function var2(v)
    M = v[1]
    S = 0.0
    N = length(v)
    for k = 2:N
        x = v[k]
        Mn = M
        M += (x - M) / k
        S += (x - M) * (x - Mn)
    end
    return S / (N - 1)
end

# Welford variant that keeps a running mean of the squared-deviation
# increments instead of their sum, rescaling at the end
function var3(v)
    M = v[1]
    S = 0.0
    N = length(v)
    for k = 2:N
        x = v[k]
        Mn = M
        M = Mn + (x - Mn) / k
        S = S + ((x - Mn) * (x - M) - S) / k
    end
    return S * N / (N - 1)
end

println("check: ", isapprox(var(tv), var2(tv)), " ", isapprox(var(tv), var3(tv)))
println("var: ", var(v) - var(v2), " ", var(v) - var(v3))
println("var2: ", var(v) - var2(v2), " ", var(v) - var2(v3))
println("var3: ", var(v) - var3(v2), " ", var(v) - var3(v3))
Try sampling uniformly on 0 to the max float value. Try sampling from a t distribution with 2 degrees of freedom. Try sampling from a t distribution with 2 dof centered at half the max float. Try sampling from a distribution whose variance is something like 4x machine epsilon, centered at 1/16 of the max float…
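For concreteness, here is a sketch of those cases as I read them (the exact parameters are my guesses at what was meant; assumes Distributions.jl and the var2/var3 definitions above):

using Distributions, StatsBase

cases = [
    "uniform on 0 to floatmax"   => rand(Uniform(0.0, floatmax(Float64)), 10^6),
    "t dist, 2 dof"              => rand(TDist(2), 10^6),
    "t dist, 2 dof, huge center" => rand(TDist(2), 10^6) .+ floatmax(Float64) / 2,
    # std chosen so the variance is ~4x machine epsilon
    "tiny variance, huge center" => floatmax(Float64) / 16 .+ sqrt(4 * eps()) .* randn(10^6),
]

for (name, x) in cases
    println(name, ": var=", var(x), " var2=", var2(x), " var3=", var3(x))
end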
I ran some experiments, and in the worst case, when the variable is exp.(rand(Uniform(1, log(prevfloat(typemax(Float64)))), 10000000)), we get Inf for var and var2, and NaN for var3.
Try sampling from a distribution where variance is something like 4x machine epsilon for center of 1/16 max float
For this, I get Inf for var and 0.0 for var2 and var3.
So var2 and var3 give appropriate results here, but the Inf from var is debatable.
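If I read the implementation right, that Inf comes from the mean, not from the squared deviations: var sums the data before dividing by n, and 10^7 values near floatmax/16 overflow that sum, so the mean is Inf and every squared deviation from it is Inf too. Welford's running mean never forms such a huge intermediate. A minimal sketch:

using Statistics

# 100 copies of floatmax/16: the true total exceeds floatmax, so the
# accumulator overflows; the mean and hence the variance come out Inf
# even though the true variance is exactly 0.
x = fill(floatmax(Float64) / 16, 100)
sum(x)   # Inf
mean(x)  # Inf
var(x)   # Inf
# Welford's update M += (x[k] - M) / k keeps M near floatmax/16 throughout,
# which is why var2 and var3 return a finite (and here correct) 0.0.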