Auxiliary GLM Statistics

Julia 1.0. GLM 1.0.0.

My Basic Example

julia> using GLM, StatsBase

julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); lm1= GLM.lm(X, y);

Basic Questions

  1. is the sigma (i.e., sqrt(deviance(lm)/dof_residual(lm)) already programmed into julia?

  2. is the f-statistic of the model (i.e., whether the model beats a constant) already available somewhere ?

My motivation is to try to write a function that prints regression output in a different format that I prefer. At first, I thought I would write a show function, but then I figured that the spirit of julia is for the user not to replace the basics, but to write different functions…or maybe add a particular kind of keyword that dispatches to my function?

Less Important Question on ftest

  1. how does ftest work? The example from the docs does not work:
julia> using DataFrames, GLM, StatsBase

julia> dat = DataFrame(Treatment=[1, 1, 1, 2, 2, 2, 1, 1, 1, 2, 2, 2.],
                                Result=[1.1, 1.2, 1, 2.2, 1.9, 2, .9, 1, 1, 2.2, 2, 2],
                                Other=[1, 1, 2, 1, 2, 1, 3, 1, 1, 2, 2, 1]);

julia> using StatsModels

julia> mod = lm(@formula(Result ~ 1 + Treatment), dat);
ERROR: cannot assign variable Base.mod from module Main
Stacktrace:
 [1] top-level scope at none:0

julia> mod = lm(@formula(Result ~ 1 + Treatment), dat);
ERROR: cannot assign variable Base.mod from module Main
Stacktrace:
 [1] top-level scope at none:0

so, I tried to make up my own simpler example and without DataFrame:

julia> using GLM, StatsBase

julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 );

julia> y=[1:10;] ; XX= hcat( fill(1,10), y.^2, y.^3, y.^4 );

julia> lm1= GLM.lm(X, y); lm2= GLM.lm(XX, y);

julia> ftest(lm2, lm1)
Error showing value of type GLM.FTestResult{2}:
ERROR: MethodError: no method matching Array{String,2}(::Int64, ::Int64)
Closest candidates are:
  Array{String,2}(::UndefInitializer, ::Int64, ::Int64) where T at boot.jl:396
  Array{String,2}(::UndefInitializer, ::Int64...) where {T, N} at boot.jl:400
  Array{String,2}(::UndefInitializer, ::Integer, ::Integer) where T at sysimg.jl:143
  ...
Stacktrace:
 [1] show(::IOContext{REPL.Terminals.TTYTerminal}, ::GLM.FTestResult{2}) at /Users/ivo/.julia/packages/GLM/8TwVM/src/ftest.jl:110

Prediction?

in R, one gets a predicted value like this. How do I do it in this simple example?

julia> using DataFrames, GLM

julia> y=[1:10;] ; X= hcat( fill(1,10), y.^2, y.^3 ); lm1= GLM.lm(X, y);

julia> predict( lm1, [ 1.0, 2.0, 3.0 ])
ERROR: MethodError: no method matching predict(::LinearModel{LmResp{Array{Float64,1}},DensePredChol{Float64,LinearAlgebra.Cholesky{Float64,Array{Float64,2}}}}, ::Array{Float64,1})

pointers appreciated.

1 Like

To me this is illustrative of the paradox currently existing in Julia: you can do cutting-edge super crazy complex stuff but there is currently no straightforward function to perform a Pearson’s correlation test :sweat_smile: I believe the implementation of these “basic” level statistics (including GLMs fitting) will be necessary for Julia’s adoption to really explode (there are fields with millions of researchers almost exclusively (but intensively) using LMs or ANOVAs).

With that being said, after trying to implement some helper functions for GLM’s, I similarly found it fairly uneasy to deal with the structures and extract things from the models :frowning_face:

I believe this is gonna change with time as they are refactoring things (such as the formula system) but we might have to be a bit more patient.

Sorry for not answering your question!

1 Like

This is a particularly unfortunate example, since there are at least 5 ways to perform frequentist inference on Pearson’s \rho, depending on the assumptions you want to make.

If you believe that there is a “straightforward” way to do something just because a particular language/framework that you have been using has that implemented as the default, you may expect to be disappointed with the Julia package ecosystem, as it is common to implement things in a modular way, giving the user a lot of choices. If, however, you understand what you are doing, you may find the latter very convenient.

It’s predict(lm1, [1 2 3]). The input you supply must be the same type as that used to fit the model. In this case, X is an array of integers, so you need to use integers for the values you supply for the out of sample fit.

More info, from ?predict
Form the predicted response of model obj. An object with new covariate values newX can be supplied, which should have the same type and structure as that used to fit obj; e.g. for a GLM it would generally be a DataFrame with the same variable names as the original predictors.

Regarding methods and functions, I suspect that the statistics/data science environment will become more complete and perhaps somewhat more homogeneous now that Julia 1.0 is here. I will be looking to use DataFrames, StatsModels, etc. more in my own work.

1 Like

You’re absolutely right! By “straightforward” I meant R-like with R-like defaults (the language which OP was referring to), which clearly might not be the only / best / recommended way of doing things. However, a “default” implementation is usually how people (at least in my field) start with statistics and programming, by running cor.test(x, y) and then (hopefully) wondering: “wait a sec, what does this parameter / function actually do?”

1 Like

Good catch. See Fix ftest printing by nalimilan ¡ Pull Request #252 ¡ JuliaStats/GLM.jl ¡ GitHub.

1 Like

thx, nalimilan. now I get the same error in both cases.

julia> ft = ftest(model.model, nullmodel.model)
Error showing value of type GLM.FTestResult{2}:
ERROR: MethodError: no method matching Array{String,2}(::Int64, ::Int64)
Closest candidates are:
  Array{String,2}(::UndefInitializer, ::Int64, ::Int64) where T at boot.jl:396
  Array{String,2}(::UndefInitializer, ::Int64...) where {T, N} at boot.jl:400
  Array{String,2}(::UndefInitializer, ::Integer, ::Integer) where T at sysimg.jl:143
  ...

regards, /iaw

PS: (thanks mcreel for the predict info. and is the regression stderr of the resids and overall F-test really not provided? I am not asking inference. I am asking about basic stats.)

Hi All, I am a latecomer to this discussion (maybe it is all obsolete now), but I found a handy workaround for ftest, given that the ‘r2’ function is there:

ftest(lmodel)=(1/(1-r2(lmodel))-1)*(length(predict(lmodel))- 
    length(coef(lmodel)))/(length(coef(lmodel))-1)

It would be cool if this could be implemented, as I use this interface for my introductory stats classes.

Feel free to make a pull request to implement a single-argument ftest method. But it should probably return the same kind of table as the multiple argument method (not a single number).

Thanks. I tried to, but unfortunately I don’t know how to do this (I got bogged down with ‘branch comparisons’). I take your point about the name … maybe call it ‘fstat’ because it is a standard diagnostic, or else make it an allowable method for ftest)