MWE for Abstraction object for Statistical Models needed?

Hello!

I have Julia code that implements a suite of statistical models and computes diagnostics such as log-likelihood, test statistics and of course reports regression coefficients.

I would like to plug into the abstraction for statistical models here:

Abstraction

Can someone please provide me with a minimal working example that I can modify that creates a RegressionModelAbstraction object so I can feed my results into this package here RegressionTables

The MWE would have something like

loglikelihood = -3.
param = 2
se = 0.5 

and creates a RegressionModel object that can feed into RegressionTables.

Thank you!

I am not sure I fully understand what you are asking for, it might help to see some of your code to understand what it currently looks like so it can fit into the RegressionModel type.

I think what you are asking for is a basic regression model. So first, you need to create a type for your regression model:

using StatsAPI, StatsModels, RegressionTables, RDatasets, Statistics
struct MyStatsModel <: RegressionModel
    coef::Vector{Float64}
    vcov::Matrix{Float64}
    dof::Int
    dof_residual::Int
    nobs::Int
    rss::Float64
    tss::Float64
    coefnames::Vector{String}
    responsename::String
    formula::FormulaTerm
    formula_schema::FormulaTerm
end

For your specific use case, there might be other things that you need to add to that.

You then need to match the StatsAPI elements to your new type:

StatsAPI.coef(m::MyStatsModel) = m.coef
StatsAPI.coefnames(m::MyStatsModel) = m.coefnames
StatsAPI.vcov(m::MyStatsModel) = m.vcov
StatsAPI.dof(m::MyStatsModel) = m.dof
StatsAPI.dof_residual(m::MyStatsModel) = m.dof_residual
StatsAPI.nobs(m::MyStatsModel) = m.nobs
StatsAPI.rss(m::MyStatsModel) = m.rss
StatsAPI.nulldeviance(m::MyStatsModel) = m.tss
StatsAPI.islinear(m::MyStatsModel) = true
StatsAPI.deviance(m::MyStatsModel) = StatsAPI.rss(m)
StatsAPI.mss(m::MyStatsModel) = nulldeviance(m) - StatsAPI.rss(m)
StatsModels.formula(m::MyStatsModel) = m.formula_schema

#edit add:
StatsAPI.r2(m::MyStatsModel) = StatsAPI.r2(m, :devianceratio)

(I find looking at FixedEffectModels.jl/src/FixedEffectModel.jl at master · FixedEffects/FixedEffectModels.jl (github.com) a reasonably straightforward set of how to implement this).

Then you would need a function that creates that type:

function StatsAPI.fit(::Type{MyStatsModel}, f::FormulaTerm, df::DataFrame)
    df = dropmissing(df)
    f_schema = apply_schema(f, schema(f, df))
    y, X = modelcols(f_schema, df)
    response_name, coefnames_exo = coefnames(f_schema)
    n, p = size(X)
    β = X \ y
    ŷ = X * β
    res = y - ŷ
    rss = sum(abs2, res)
    tss = sum(abs2, y .- mean(y))
    dof = p
    dof_residual = n - p
    vcov = inv(X'X) * rss / dof_residual
    MyStatsModel(β, vcov, dof, dof_residual, n, rss, tss, coefnames_exo, response_name, f, f_schema)
end

This will work with RegressionTables

rr1 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price + NDI), df)
rr2 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price), df)
rr3 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + NDI), df)
regtable(rr1, rr2, rr3)
--------------------------------------------------
                              Sales
              ------------------------------------
                     (1)          (2)          (3)
--------------------------------------------------
(Intercept)   138.480***   139.734***   132.981***
                 (1.427)      (1.521)      (1.538)
Price          -0.938***    -0.230***
                 (0.054)      (0.019)
NDI             0.007***                 -0.001***
                 (0.000)                   (0.000)
--------------------------------------------------
N                  1,380        1,380        1,380
R2                 0.209        0.097        0.034
--------------------------------------------------
3 Likes

Thank you @junder873 that is exactly what I was looking for.

Do you know by any chance how to add custom coefficients, for example, if some parameters are held fixed (and hence have no standard error) or would that go too deep into the RegressionTables API?

It would definitely be more into the RegressionTables API. There are a few options:

One option would be to adjust how AbstractUnderStatistic is displayed. So if one of the standard errors is 0, instead of displaying (0.000) have it return something else:

function Base.repr(render::AbstractRenderType, x::RegressionTables.AbstractUnderStatistic; digits=RegressionTables.default_digits(render, x), args...)
    if RegressionTables.value(x) == 0
        repr(render, "Was 0")# fill in with missing or anything else, missing will display a blank by default
    else
        RegressionTables.below_decoration(render, repr(render, RegressionTables.value(x); digits, commas=false, args...))
    end
end
rr1.vcov[1, 1] = 0 #just to force the 0 value
tab = regtable(rr1, rr2, rr3)
--------------------------------------------------
                              Sales
              ------------------------------------
                     (1)          (2)          (3)
--------------------------------------------------
(Intercept)   138.480***   139.734***   132.981***
                   Was 0      (1.521)      (1.538)
Price          -0.938***    -0.230***
                 (0.054)      (0.019)
NDI             0.007***                 -0.001***
                 (0.000)                   (0.000)
--------------------------------------------------
N                  1,380        1,380        1,380
R2                 0.209        0.097        0.034
--------------------------------------------------

The second option, which is very after the fact, would be to edit the table:

tab[4, 2] = "New"
tab
--------------------------------------------------
                              Sales
              ------------------------------------
                     (1)          (2)          (3)
--------------------------------------------------
(Intercept)   138.480***   139.734***   132.981***
                     New      (1.521)      (1.538)
Price          -0.938***    -0.230***
                 (0.054)      (0.019)
NDI             0.007***                 -0.001***
                 (0.000)                   (0.000)
--------------------------------------------------
N                  1,380        1,380        1,380
R2                 0.209        0.097        0.034
--------------------------------------------------

The final option I can think of is to change how an entire row is displayed, but this only works if the fixed parameters are always on the same row. You can edit the StdError function:

function RegressionTables.StdError(rr::MyStatsModel, k::Int; standardize::Bool=false, vargs...)
    if k == 1
        missing
    elseif standardize
        StdError(standardize_coef_values(rr, _stderror(rr)[k], k))
    else
        StdError(RegressionTables._stderror(rr)[k])
    end
end

rr1 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price + NDI), df)
rr2 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price), df)
rr3 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + NDI), df)
tab = regtable(rr1, rr2, rr3)
--------------------------------------------------
                              Sales
              ------------------------------------
                     (1)          (2)          (3)
--------------------------------------------------
(Intercept)   138.480***   139.734***   132.981***

Price          -0.938***    -0.230***
                 (0.054)      (0.019)
NDI             0.007***                 -0.001***
                 (0.000)                   (0.000)
--------------------------------------------------
N                  1,380        1,380        1,380
R2                 0.209        0.097        0.034
--------------------------------------------------

If you want to get rid of the empty row above, you can modify the RegressionTables object:

tab.data = tab.data[Not(4)]
tab
--------------------------------------------------
                              Sales
              ------------------------------------
                     (1)          (2)          (3)
--------------------------------------------------
(Intercept)   138.480***   139.734***   132.981***
Price          -0.938***    -0.230***
                 (0.054)      (0.019)
NDI             0.007***                 -0.001***
                 (0.000)                   (0.000)
N                  1,380        1,380        1,380
--------------------------------------------------
R2                 0.209        0.097        0.034
--------------------------------------------------

Thank you @junder873 that is wonderful!

Now, regarding the code in the original solution further above, how would I run it? I executed it all in a notebook cell and added this line:

StatsAPI.responsename(m::MyStatsModel) = m.responsename

However, how can I carry through the R^2 value? Mine is empty for some reason.

And re: the extra bit about restricted parameters, that is absolutely fantastic! Thank you so much!

Realized I forgot to add that line back in after I did a small test. Add:

StatsAPI.r2(m::MyStatsModel) = StatsAPI.r2(m, :devianceratio)

And it should work for R^2

In short, the underlying StatsAPI function needs to be defined for the regression statistic to be displayed. So, as in https://github.com/FixedEffects/FixedEffectModels.jl/blob/master/src/FixedEffectModel.jl:

StatsAPI.loglikelihood(m::MyStatsModel) = # something

Will make the log likelihood ratio work in RegressionTables

1 Like

@junder873

Thank you again for your generous responses. However, I am seeing some unexpected behaviour when I run your code on my installation (Julia 10.2). Namely, I see Estimator OLS and the RegressionTable package insists I supply the variable islinear.

Do you know a way to control the output, re: the estimator name (which I don’t need for our paper) and the ability to suppress all regression statistics? Mine would be handled by extralines anyway.

Once I am done writing this up, I’d be happy to contribute this as a MWE for people using this using their custom methods.

Thank you!

islinear and RegressionType

islinear is part of the StatsAPI, which is why it is implemented. You can actually avoid it by instead defining RegressionType for your type:

RegressionTables.RegressionType(rr::MyStatsModel) = RegressionTables.RegressionType("My type")

The reason you are seeing the current behavior is line 142 of RegressionTables.jl/src/regressionResults.jl at master · jmboehm/RegressionTables.jl (github.com):

RegressionType(x::RegressionModel) = islinear(x) ? RegressionType(Normal()) : RegressionType("NL")

So you would basically override that. You can also suppress that line of the result by setting regtable(...; print_estimator=false or for your entire session with RegressionTables.default_print_estimator(render::AbstractRenderType, rrs) = false.

Regression Summary Statistics

In general, it might be worthwhile to look through the extensions on the RegressionTables package. Each of those is written to modify the default table results. For example, if you do not want any regression statistics for your data, you can run:

RegressionTables.default_regression_statistics(rr::MyStatsModel) = Symbol[]

However, note that if you are running multiple types of regressions in one table (e.g., GLM + MyStatsModel), then the default GLM statistics will force the printing of those statistics anyway.

Another alternative is you can remove (or change the order) of sections as you want by changing the section_order argument: regtable(...; section_order=...) or the default:

RegressionTables.default_section_order(render::AbstractRenderType) = [:groups, :depvar, :number_regressions, :break, :coef, :break, :fe, :break, :randomeffects, :break, :clusters, :break, :regtype, :break, :controls, :break, :stats, :extralines]

So if you do not want the RegressionType or statistics, remove :regtype and :stats from that list.

A Suggestion Based on What I think you are doing

The underlying design of RegressionTables is that extralines is more of a last resort. So, in addition to changing the RegressionType and default_regression_statistics to fit your needs, there are two other ways to implement custom statistics: 1) define other statistics and 2) use other_stats

Just to modify what I did before a little, add to the struct a special_value:

    formula_schema::FormulaTerm
    special_value::String
end

and to the function something to that:

function StatsAPI.fit(::Type{MyStatsModel}, f::FormulaTerm, df::DataFrame; special_value::String="ABC")
...
    MyStatsModel(β, vcov, dof, dof_residual, n, rss, tss, coefnames_exo, response_name, f, f_schema, special_value)
end

And just to give an example of defining your own RegressionType:

RegressionTables.RegressionType(rr::MyStatsModel) = RegressionType("MyModel")

Defining New Statistics

It is possible to define new RegressionStatistics in RegressionTables, these will appear alongside other statistics:

struct MyStatistic <: RegressionTables.AbstractRegressionStatistic
    val::Union{String, Nothing}
end
MyStatistic(rr::RegressionModel) = MyStatistic(nothing)
MyStatistic(rr::MyStatsModel) = MyStatistic(rr.special_value)
RegressionTables.label(render::AbstractRenderType, x::Type{MyStatistic}) = "My Statistic"
RegressionTables.default_regression_statistics(rr::MyStatsModel) = [Nobs, MyStatistic]

using FixedEffectModels
rr1 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price + NDI), df)
rr2 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price), df; special_value="123")
rr3 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + NDI), df; special_value="ABC")
rr4 = reg(df, @formula(Sales ~ Price + NDI))
tab = regtable(rr1, rr2, rr3, rr4)

----------------------------------------------------------------
                                     Sales
               -------------------------------------------------
                      (1)          (2)          (3)          (4)
----------------------------------------------------------------
(Intercept)    138.480***   139.734***   132.981***   138.480***
                  (1.427)      (1.521)      (1.538)      (1.427)
Price           -0.938***    -0.230***                 -0.938***
                  (0.054)      (0.019)                   (0.054)
NDI              0.007***                 -0.001***     0.007***
                  (0.000)                   (0.000)      (0.000)
----------------------------------------------------------------
Estimator         MyModel      MyModel      MyModel          OLS
----------------------------------------------------------------
N                   1,380        1,380        1,380        1,380
My Statistic          ABC          123          ABC
R2                  0.209        0.097        0.034        0.209
----------------------------------------------------------------

Using other_stats

The other_stats function is useful for defining a new section. It is how FixedEffectModels.jl and MixedModels.jl implement necessary pieces. You need to create an other_stats function to fit your needs and then add whatever symbol you created to default_section_order:

function RegressionTables.other_stats(rr::MyStatsModel, s::Symbol)
    if s == :my_stat
        return ["Some Descriptor" => MyStatistic(rr)]
    else
        return nothing
    end
end
RegressionTables.default_section_order(render::AbstractRenderType) = [:groups, :depvar, :number_regressions, :break, :coef, :break, :fe, :break, :randomeffects, :break, :clusters, :break, :regtype, :break, :controls, :break, :stats, :break, :my_stat, :extralines]

rr1 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price + NDI), df)
rr2 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + Price), df; special_value="123")
rr3 = StatsAPI.fit(MyStatsModel, @formula(Sales ~ 1 + NDI), df; special_value="ABC")
rr4 = reg(df, @formula(Sales ~ Price + NDI))
tab = regtable(rr1, rr2, rr3, rr4)

-------------------------------------------------------------------
                                        Sales
                  -------------------------------------------------
                         (1)          (2)          (3)          (4)
-------------------------------------------------------------------
(Intercept)       138.480***   139.734***   132.981***   138.480***
                     (1.427)      (1.521)      (1.538)      (1.427)
Price              -0.938***    -0.230***                 -0.938***
                     (0.054)      (0.019)                   (0.054)
NDI                 0.007***                 -0.001***     0.007***
                     (0.000)                   (0.000)      (0.000)
-------------------------------------------------------------------
Estimator            MyModel      MyModel      MyModel          OLS
-------------------------------------------------------------------
N                      1,380        1,380        1,380        1,380
R2                     0.209        0.097        0.034        0.209
-------------------------------------------------------------------
Some Descriptor          ABC          123          ABC
-------------------------------------------------------------------

Conveniently, other_stats can be used to create as many new sections as you want, so if you need those broken out, that is possible.

1 Like

@junder873 That is a brilliant suggestion regarding the custom statistics.

Unfortunately, I am getting this error

MethodError: no method matching display_val(::typeof(MyStatistic))

Closest candidates are:
  display_val(!Matched::Type)
   @ RegressionTables ~/.julia/packages/RegressionTables/VhnTe/src/regtable.jl:785
  display_val(!Matched::Pair)
   @ RegressionTables ~/.julia/packages/RegressionTables/VhnTe/src/regtable.jl:784

Is there perhaps a line missing that defines display_val ?