Hi
I thought I would give this a try with the CRRao package that we developed recently. CRRao can handle both classical linear regression and Bayesian linear regression with a Gaussian (Ridge) prior, a Laplace (Lasso) prior, a Cauchy prior, or a TDist prior. The results came out nicely, as expected. Below I describe what I have done so far.
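(For anyone following along, here is a minimal setup sketch; I am assuming CRRao is installed straight from its GitHub repository, so adjust the URL if the package lives elsewhere for you.)

```julia
using Pkg
Pkg.add(["DataFrames", "CSV", "GLM"])
Pkg.add(url = "https://github.com/xKDR/CRRao.jl")  # assumed repository location
```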
using DataFrames, CSV
using StatsBase
using Statistics
using GLM
using CRRao
df = DataFrame(CSV.File("beauty.csv"));
first(df, 5)
5×8 DataFrame
│ Row │ eval │ beauty │ female │ age │ minority │ nonenglish │ lower │ course_id │
│ │ Float64 │ Float64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │ Int64 │
├─────┼─────────┼───────────┼────────┼───────┼──────────┼────────────┼───────┼───────────┤
│ 1 │ 4.3 │ 0.201567 │ 1 │ 36 │ 1 │ 0 │ 0 │ 3 │
│ 2 │ 4.5 │ -0.826081 │ 0 │ 59 │ 0 │ 0 │ 0 │ 0 │
│ 3 │ 3.7 │ -0.660333 │ 0 │ 51 │ 0 │ 0 │ 0 │ 4 │
│ 4 │ 4.3 │ -0.766312 │ 1 │ 40 │ 0 │ 0 │ 0 │ 2 │
│ 5 │ 4.4 │ 1.42145 │ 1 │ 31 │ 0 │ 0 │ 0 │ 0 │
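Before fitting, `describe(df)` from DataFrames gives a quick per-column summary, which is a handy sanity check on the data:

```julia
describe(df)  # per-column min/median/mean/max, element type, and missing count
```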
## Fit the linear regression by OLS
fit = @fitmodel((eval ~ beauty + female + age + minority + nonenglish + lower + course_id),
                df, LinearRegression())
Model Class: Linear Regression
Likelihood Mode: Gauss
Link Function: Identity
Computing Method: Optimization
────────────────────────────────────────────────────────────────────────────────
Coef. Std. Error t Pr(>|t|) Lower 95% Upper 95%
────────────────────────────────────────────────────────────────────────────────
(Intercept) 4.19361 0.146437 28.64 <1e-99 3.90583 4.48138
beauty 0.140902 0.0333132 4.23 <1e-04 0.0754356 0.206369
female -0.19745 0.0528073 -3.74 0.0002 -0.301226 -0.0936735
age -0.00217404 0.00280125 -0.78 0.4381 -0.00767904 0.00333096
minority -0.0688714 0.0785761 -0.88 0.3812 -0.223288 0.0855457
nonenglish -0.274837 0.110696 -2.48 0.0134 -0.492376 -0.0572989
lower 0.098818 0.054249 1.82 0.0692 -0.00779174 0.205428
course_id -0.000390066 0.00298497 -0.13 0.8961 -0.0062561 0.00547596
────────────────────────────────────────────────────────────────────────────────
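Since CRRao calls GLM.jl for the frequentist fit (see the note at the end of this post), the same table can be reproduced with GLM directly; a minimal cross-check sketch:

```julia
using GLM

# Ordinary least squares fit of the same formula;
# coeftable(ols) should match the CRRao output above.
ols = lm(@formula(eval ~ beauty + female + age + minority +
                  nonenglish + lower + course_id), df)
coeftable(ols)
```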
Based on the above analysis, we can say that `beauty`, `female`, and `nonenglish` are the three variables that have a significant effect on `eval`. Since I was able to reproduce the results above, I next tried Bayesian linear regression with a Ridge (i.e., Gaussian) prior using the CRRao package.
fit_ridge_prior = @fitmodel((eval ~ beauty + female + age + minority + nonenglish + lower + course_id),
                            df, LinearRegression(), Prior_Ridge())
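Under the hood this goes through Turing.jl (again, see the note at the end). For intuition, here is a rough sketch of what a Gaussian/Ridge-prior linear model looks like in raw Turing; the specific hyper-priors below are my assumptions for illustration, not necessarily CRRao's internal choices:

```julia
using Turing, LinearAlgebra

@model function linreg_ridge(X, y)
    p = size(X, 2)
    v ~ InverseGamma(2, 3)             # hyper-prior on the coefficient scale (assumed form)
    σ ~ InverseGamma(2, 3)             # prior on the residual std. dev. (assumed form)
    α ~ Normal(0, v)                   # intercept
    β ~ MvNormal(zeros(p), v^2 * I)    # Gaussian (ridge) prior on the coefficients
    y ~ MvNormal(α .+ X * β, σ^2 * I)  # Gaussian likelihood
end

X = Matrix{Float64}(df[:, [:beauty, :female, :age, :minority,
                           :nonenglish, :lower, :course_id]])
chain = sample(linreg_ridge(X, df.eval), NUTS(), 10_000)
```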
First, I looked at the MCMC chain summary statistics:
Chains MCMC chain (10000×22×1 Array{Float64, 3}):
Iterations = 1001:1:11000
Number of chains = 1
Samples per chain = 10000
Wall duration = 16.08 seconds
Compute duration = 16.08 seconds
parameters = v, σ, α, β[1], β[2], β[3], β[4], β[5], β[6], β[7]
internals = lp, n_steps, is_accept, acceptance_rate, log_density, hamiltonian_energy, hamiltonian_energy_error, max_hamiltonian_energy_error, tree_depth, numerical_error, step_size, nom_step_size
Summary Statistics
parameters mean std naive_se mcse ess rhat ess_per_sec
Symbol Float64 Float64 Float64 Float64 Float64 Float64 Float64
v 3.0633 0.9052 0.0091 0.0110 7687.9312 1.0000 478.1646
σ 0.5327 0.0178 0.0002 0.0001 12334.4027 1.0000 767.1603
α 4.1559 0.1478 0.0015 0.0018 6478.9031 0.9999 402.9670
β[1] 0.1430 0.0335 0.0003 0.0003 10478.2933 0.9999 651.7162
β[2] -0.1924 0.0523 0.0005 0.0006 9189.6967 0.9999 571.5696
β[3] -0.0015 0.0028 0.0000 0.0000 6912.4739 0.9999 429.9337
β[4] -0.0673 0.0780 0.0008 0.0007 10673.4607 0.9999 663.8550
β[5] -0.2722 0.1126 0.0011 0.0011 11336.8591 0.9999 705.1163
β[6] 0.1013 0.0549 0.0005 0.0005 10938.8474 0.9999 680.3612
β[7] -0.0005 0.0030 0.0000 0.0000 14629.2527 0.9999 909.8926
I found that the `rhat` statistics for all parameters are very close to 1, which means MCMC had no problem with convergence. Then I turned to the parameter estimates. The 95% posterior credible interval of each parameter is presented below:
Quantiles
parameters 2.5% 25.0% 50.0% 75.0% 97.5%
Symbol Float64 Float64 Float64 Float64 Float64
v 1.8556 2.4446 2.8946 3.4768 5.3237
σ 0.4991 0.5205 0.5323 0.5444 0.5684
α 3.8624 4.0571 4.1567 4.2567 4.4421
β[1] 0.0779 0.1202 0.1432 0.1653 0.2078
β[2] -0.2938 -0.2280 -0.1923 -0.1569 -0.0883
β[3] -0.0069 -0.0034 -0.0015 0.0004 0.0041
β[4] -0.2193 -0.1208 -0.0666 -0.0147 0.0847
β[5] -0.4950 -0.3472 -0.2733 -0.1955 -0.0533
β[6] -0.0065 0.0643 0.1012 0.1382 0.2086
β[7] -0.0063 -0.0025 -0.0005 0.0015 0.0055
The first parameter, `v`, is the hyper-parameter, and `σ` is the residual standard deviation. `α` is the intercept parameter, and `β[1]` through `β[7]` are the regression coefficients. Based on the 95% Bayesian credible intervals, the three coefficients `β[1]`, `β[2]`, and `β[5]` are statistically significant; that is, the corresponding predictors `beauty`, `female`, and `nonenglish` have a statistically significant effect on `eval`.
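One way to automate this interval check, assuming you have a raw `MCMCChains.Chains` object (for example, `chain` from the raw-Turing sketch above; I have not checked whether the CRRao fit object exposes its chain directly):

```julia
using MCMCChains, Statistics

# Flag parameters whose 95% posterior credible interval excludes zero.
for p in names(chain, :parameters)
    lo, hi = quantile(vec(chain[p]), [0.025, 0.975])
    significant = (lo > 0) || (hi < 0)
    println(rpad(string(p), 6), " [", round(lo, digits = 4), ", ",
            round(hi, digits = 4), "]", significant ? "  *" : "")
end
```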
I made the following table to compare the OLS and Bayesian estimates under the Gaussian prior.
| Predictor   | OLS      | Bayes (Gaussian/Ridge prior) |
|-------------|----------|------------------------------|
| (Intercept) | 4.19361  | 4.1559  |
| beauty      | 0.140902 | 0.1430  |
| female      | -0.19745 | -0.1924 |
| age         | -0.0022  | -0.0015 |
| minority    | -0.0689  | -0.0673 |
| nonenglish  | -0.2748  | -0.2722 |
| lower       | 0.098818 | 0.1013  |
| course_id   | -0.00039 | -0.0005 |
Visual inspection indicates that the regression coefficient estimates from the two methods are very close to each other.
Next, I tried the Laplace prior as follows:
fit_Laplace_prior = @fitmodel((eval ~ beauty + female + age + minority + nonenglish + lower + course_id),
                              df, LinearRegression(), Prior_Laplace())
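In the raw-Turing sketch above, the corresponding change would be just the coefficient prior line (again an assumption about the general idea; a Laplace prior on the coefficients corresponds to the Bayesian Lasso):

```julia
β ~ filldist(Laplace(0, v), p)  # i.i.d. Laplace prior on each coefficient
```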
The summary statistics and `rhat` values were:
Summary Statistics
parameters mean rhat ess_per_sec
v 1.3229 1.0000 658.6089
σ 0.5323 1.0000 1018.0162
α 4.1576 1.0000 574.6666
β[1] 0.1405 1.0000 877.7872
β[2] -0.1883 1.0000 765.7493
β[3] -0.0016 1.0000 634.6961
β[4] -0.0661 0.9999 980.9478
β[5] -0.2590 0.9999 773.6782
β[6] 0.0986 1.0004 805.5721
β[7] -0.0004 0.9999 1524.5410
The results were very similar. I then tried the Cauchy prior on the regression coefficients.
fit_Cauchy_prior = @fitmodel((eval ~ beauty + female + age + minority + nonenglish + lower + course_id),
                             df, LinearRegression(), Prior_Cauchy())
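In the same sketch, a Cauchy prior would swap that line once more; note the summary below has no `v` hyper-parameter, so presumably a fixed scale is used (an assumption on my part):

```julia
β ~ filldist(Cauchy(0, 1), p)  # heavy-tailed Cauchy prior with fixed unit scale
```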
Summary Statistics
parameters mean rhat ess_per_sec
σ 0.5299 0.9999 830.7017
α 4.1817 1.0000 692.7191
β[1] 0.1403 1.0000 814.6251
β[2] -0.1930 0.9999 871.6434
β[3] -0.0020 1.0001 726.4616
β[4] -0.0691 1.0003 897.4862
β[5] -0.2608 1.0001 958.6221
β[6] 0.0997 0.9999 961.6356
β[7] -0.0004 1.0001 1298.7876
I combined all the results below:
| Predictor   | OLS      | Bayes (Gaussian prior) | Bayes (Laplace prior) | Bayes (Cauchy prior) |
|-------------|----------|------------------------|-----------------------|----------------------|
| (Intercept) | 4.19361  | 4.1559  | 4.1576  | 4.1817  |
| beauty      | 0.140902 | 0.1430  | 0.1405  | 0.1403  |
| female      | -0.19745 | -0.1924 | -0.1883 | -0.1930 |
| age         | -0.0022  | -0.0015 | -0.0016 | -0.0020 |
| minority    | -0.0689  | -0.0673 | -0.0661 | -0.0691 |
| nonenglish  | -0.2748  | -0.2722 | -0.2590 | -0.2608 |
| lower       | 0.098818 | 0.1013  | 0.0986  | 0.0997  |
| course_id   | -0.00039 | -0.0005 | -0.0004 | -0.0004 |
Note: for classical linear regression, CRRao.jl calls GLM.jl, and for Bayesian linear regression it calls Turing.jl.
Thanks and regards,
Sourish