Comparing functions with their alias in GLM.jl

Not sure how clear the title is or whether this is the right place to ask the following question, but anyway… :slight_smile:

I am testing the GLM package, mostly for educational purposes. In the package documentation I saw that the lm function is an alias for fit(LinearModel, ...) and glm for fit(GeneralizedLinearModel, ...). However, when I compare the results of the two calls, I get false. For instance,

using DataFrames, GLM

data = DataFrame(X=[1,2,3], Y=[1,0,1])

probit1 = glm(@formula(Y~X), data, Binomial(), ProbitLink())
probit2 = fit(GeneralizedLinearModel, @formula(Y~X), data, Binomial(), ProbitLink())

probit1 == probit2       # returns false

I don’t understand why this happens (the false return).

Try isapprox, I guess? Maybe there are numerical differences that don’t matter qualitatively but get detected by ==

Also, isequal(probit1, probit2) returns false.

I’m not exactly sure what this comparison does, but GLMs are maximum likelihood estimations, so you would have to set a seed, I think. Is this the same for lm? Of course, numerical differences may still persist.

I was just playing around. However, I expected a true return, since glm is an alias for fit(GeneralizedLinearModel, ...).

Sorry, I meant isapprox and edited my post.

isapprox(probit1, probit2) returns an error:

MethodError: no method matching isapprox(::StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}, ::StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}})

I am using Julia 1.5.2 on Windows 10 64-bit, GLM 1.3.10.
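(Side note: isapprox is not defined for the fitted-model wrapper objects, but it does work on the numeric quantities you can extract from them, e.g. the coefficient vectors. A minimal sketch, using the probit1/probit2 objects from above and assuming both fits produced the same estimates, as confirmed further down in the thread:)

isapprox(coef(probit1), coef(probit2))   # elementwise approximate comparison of the coefficients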

Note that

julia> glm(@formula(Y~X), data, Binomial(), ProbitLink())  == glm(@formula(Y~X), data, Binomial(), ProbitLink()) 
false

I think the problem is with the comparison.
This works for me, so the models are equivalent:

julia> coef(probit1) == coef(probit2)
true

julia> stderror(probit1) == stderror(probit2)
true

Why does that happen? I am asking just out of curiosity.

I am getting the same result. Only when I compare probit1 == probit2 do I get a false return.

I am not the right person to answer that question with authority, but basically == is implemented via multiple dispatch for different types.

e.g.

julia> typeof(probit1)
StatsModels.TableRegressionModel{GeneralizedLinearModel{GLM.GlmResp{Array{Float64,1},Binomial{Float64},ProbitLink},GLM.DensePredChol{Float64,Cholesky{Float64,Array{Float64,2}}}},Array{Float64,2}}

so it is not clear what it does exactly for this type. One would have to check the source of the GLM package to see whether it defines a specific ==. There is a fallback, I think:

from the docs:

help?> isequal
search: isequal issetequal InverseSquareLink

  isequal(x, y)

  Similar to ==, except for the treatment of floating point numbers and of missing values. isequal treats all floating-point NaN values as equal to each other, treats -0.0 as
  unequal to 0.0, and missing as equal to missing. Always returns a Bool value.

  Implementation
  ≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡≡

  The default implementation of isequal calls ==, so a type that does not involve floating-point values generally only needs to define ==.

  isequal is the comparison function used by hash tables (Dict). isequal(x,y) must imply that hash(x) == hash(y).

  This typically means that types for which a custom == or isequal method exists must implement a corresponding hash method (and vice versa). Collections typically implement isequal
  by calling isequal recursively on all contents.

  Scalar types generally do not need to implement isequal separate from ==, unless they represent floating-point numbers amenable to a more efficient implementation than that
  provided as a generic fallback (based on isnan, signbit, and ==).
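(A quick illustration of that difference between == and isequal for floating-point values, unrelated to GLM itself:)

NaN == NaN           # false
isequal(NaN, NaN)    # true
-0.0 == 0.0          # true
isequal(-0.0, 0.0)   # false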

EDIT:

Looks like the fallback is called, which would not work, I guess:

julia> @which isequal(probit1, probit2)
isequal(x, y) in Base at operators.jl:123
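The isequal fallback just calls ==, and for composite types without a custom method == in turn falls back to ===, i.e. object identity for mutable objects. A tiny sketch with a made-up struct (Foo is hypothetical, not from GLM):

# Hypothetical mutable struct, just to show the default behaviour of ==
mutable struct Foo
    x::Float64
end

Foo(1.0) == Foo(1.0)    # false: the fallback == is ===, which for mutable objects is identity
Foo(1.0) === Foo(1.0)   # false for the same reason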

By default == falls back to ===, and I don’t think anybody cared about implementing == for GLMs. So two different fits will always be considered different. PR welcome to implement smarter behavior (by calling == recursively on relevant fields).
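A minimal sketch of what such a method could look like (not part of GLM.jl; it only compares the fitted coefficients and standard errors, whereas a real PR would have to compare the formula and the rest of the model state as well):

using GLM, StatsModels

# Hypothetical == for the wrapper type returned by glm/fit.
# A real implementation belongs inside the package (defining it outside
# would be type piracy) and should recurse over all relevant fields.
function Base.:(==)(a::StatsModels.TableRegressionModel, b::StatsModels.TableRegressionModel)
    return coef(a) == coef(b) && stderror(a) == stderror(b)
end

With such a method in place, probit1 == probit2 from the example above would return true, since their coefficients and standard errors compare equal.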

I am not sure that == is a very useful operator for types like this to expose in the API. For any nontrivial data, you only get exact == if you run the exact same exercise.