Comparing functions with their alias in GLM.jl

yordiak · October 1, 2020, 11:30am

I’m not sure how clear the title is, or whether this is the right place to ask this question, but anyway…

I am testing the GLM package, mostly for educational purposes. In the package documentation I saw that the lm function is an alias for fit(LinearModel, ...) and glm is an alias for fit(GeneralizedLinearModel, ...). However, when I compare the results of the two calls, the comparison returns false. For instance,

using DataFrames, GLM

data = DataFrame(X=[1,2,3], Y=[1,0,1])

probit1 = glm(@formula(Y~X), data, Binomial(), ProbitLink())
probit2 = fit(GeneralizedLinearModel, @formula(Y~X), data, Binomial(), ProbitLink())

probit1 == probit2       # returns false

I don’t understand why this returns false.

pdeffebach · October 1, 2020, 12:06pm

Try isapprox, I guess? Maybe there are numerical differences that don’t matter qualitatively but get detected by ==

yordiak · October 1, 2020, 12:15pm

Also, isequal(probit1, probit2) returns false.

danielw2904 · October 1, 2020, 12:17pm

I’m not exactly sure what this comparison does, but GLMs are maximum likelihood estimations, so you would have to set a seed, I think. Is this the same for lm? Of course, numerical differences may still persist.

yordiak · October 1, 2020, 12:19pm

I was just playing around. However, I expected it to return true, since glm is an alias of fit(GeneralizedLinearModel, ...).

pdeffebach · October 1, 2020, 12:20pm

Sorry, I meant isapprox, and I edited my post.

yordiak · October 1, 2020, 12:21pm

isapprox(probit1, probit2) throws an error:

MethodError: no method matching isapprox(::StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}})

I am using Julia 1.5.2 on 64-bit Windows 10, with GLM 1.3.10.

danielw2904 · October 1, 2020, 12:22pm

yordiak:

using DataFrames, GLM

data = DataFrame(X=[1,2,3], Y=[1,0,1])

probit1 = glm(@formula(Y~X), data, Binomial(), ProbitLink())
probit2 = fit(GeneralizedLinearModel, @formula(Y~X), data, Binomial(), ProbitLink())

probit1 == probit2

Note that

julia> glm(@formula(Y~X), data, Binomial(), ProbitLink())  == glm(@formula(Y~X), data, Binomial(), ProbitLink()) 
false

danielw2904 · October 1, 2020, 12:24pm

I think the problem is with the comparison itself.
The following works for me, so the fitted models are equivalent:

julia> coef(probit1) == coef(probit2)
true

julia> stderror(probit1) == stderror(probit2)
true

yordiak · October 1, 2020, 12:24pm

Why does that happen? I am asking just out of curiosity.

yordiak · October 1, 2020, 12:26pm

I am getting the same result. Only when I compare probit1 == probit2 do I get false.

danielw2904 · October 1, 2020, 12:29pm

I am not the right person to answer that question with authority, but basically == is implemented via multiple dispatch for different types.

e.g.

julia> typeof(probit1)
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

So it is not clear what == does exactly for this type. One would have to check the source of the GLM package to see whether it defines a specific == method. I think there is a fallback.

From the docs:

help?> isequal
search: isequal issetequal InverseSquareLink

  isequal(x, y)

  Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN values as equal to each other, treats -0.0 as
  unequal to 0.0, and missing as equal to missing. Always returns a Bool value.

  Implementation
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  The default implementation of isequal calls ==, so a type that does not involve floating-point values generally only needs to define ==.

  isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x) == hash(y).

  This typically means that types for which a custom == or isequal method exists must implement a corresponding hash method (and vice versa). Collections typically implement isequal
  by calling isequal recursively on all contents.

  Scalar types generally do not need to implement isequal separate from ==, unless they represent floating-point numbers amenable to a more efficient implementation than that
  provided as a generic fallback (based on isnan, signbit, and ==).

EDIT:

It looks like the generic fallback is called, which I guess would not compare the contents of the models:

julia> @which isequal(probit1, probit2)
isequal(x, y) in Base at operators.jl:123

nalimilan · October 3, 2020, 9:50pm

By default == falls back to ===, and I don’t think anybody cared about implementing == for GLMs. So two different fits will always be considered different. PR welcome to implement a smarter behavior (by calling == recursively on relevant fields).
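
A minimal sketch of that fallback, using a made-up type rather than anything from GLM.jl. ToyFit and fieldwise_equal are hypothetical names chosen for illustration; GLM.jl defines nothing like them. The struct holds an array, so two separately built values are not identical under ===, and the default == reports false even though every field matches.

```julia
# ToyFit is a hypothetical stand-in for a fitted model; it is NOT a GLM.jl type.
struct ToyFit
    coefs::Vector{Float64}
    link::Symbol
end

a = ToyFit([0.5, -0.1], :probit)
b = ToyFit([0.5, -0.1], :probit)

# No == method exists for ToyFit, so the generic fallback (===) is used.
# a and b hold two distinct arrays, so they are not identical objects.
default_equal = (a == b)

# Hypothetical helper: compare two values of the same type field by field with ==.
fieldwise_equal(x::T, y::T) where {T} =
    all(getfield(x, f) == getfield(y, f) for f in fieldnames(T))

fields_equal = fieldwise_equal(a, b)

println("default ==      : ", default_equal)
println("fieldwise equal : ", fields_equal)
```

A package that wanted this behavior for its own types would presumably define a Base.:(==) method instead of a separate helper, together with a matching hash method, as the isequal docstring quoted above requires.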

Tamas_Papp · October 4, 2020, 8:34am

I am not sure that == is a very useful operator for types like this to expose in the API. For any nontrivial data, you only get exact == if you run the exact same exercise.
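
To illustrate the point, here is a small sketch comparing estimates with a tolerance instead of exact equality. The vectors c1 and c2 are made-up numbers standing in for the output of coef on two fits, and the rtol value is an arbitrary choice, not a recommendation.

```julia
using LinearAlgebra  # stdlib; isapprox on arrays compares norms

# Made-up coefficient vectors standing in for coef(model1) and coef(model2).
# The second differs from the first only by a tiny perturbation.
c1 = [0.25, -1.5]
c2 = c1 .+ 1e-12

# Exact comparison picks up the perturbation.
exact_match = (c1 == c2)

# Tolerance-based comparison ignores it; rtol here is an arbitrary assumption.
close_match = isapprox(c1, c2; rtol = 1e-8)

println("exact match : ", exact_match)
println("close match : ", close_match)
```

With real models, the same pattern would apply to the vectors returned by coef or stderror rather than to the model objects themselves, since isapprox has no method for those.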
