Multicollinearity and GLM

Below, I show some generated data for a linear regression. With these features (also known as variables, covariates or predictors) A, B, C, D and E, I aim to predict an outcome Y. Would the data shown below be considered multicollinear in the sense that it could become problematic for linear regressions?

I would say yes, and I read that Bayesian models can handle collinear data well, so I expected a huge difference between a Bayesian and a Frequentist model. However, when I compared the two, they gave the same outcomes (see the figures below). Therefore, I concluded that the Frequentist model did not have any issues with the collinearity.

Might this be because `lm` from GLM.jl uses QR decomposition? Or is my data not correlated enough? When would `GLM.lm` start showing the huge variances mentioned in Wasserman’s lecture notes?

Your data is not correlated enough. Multicollinearity only becomes a problem for estimation when it is very, very high, i.e. when the predictors are so close to linearly dependent that the QR decomposition runs into its numerical precision limits. A correlation of 0.82 is nowhere near that.
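To illustrate (a sketch with simulated data, not the original `df`; all names here are made up): even with a pairwise correlation around 0.8, least squares via QR, which is what Julia’s `\` uses for tall dense matrices, recovers the coefficients essentially exactly:

```julia
using LinearAlgebra, Random, Statistics

Random.seed!(42)
n = 10_000

# Two predictors with a population correlation of 0.8
x1 = randn(n)
x2 = 0.8 .* x1 .+ 0.6 .* randn(n)

# True model: y = 1 + 2*x1 + 3*x2 + noise
y = 1 .+ 2 .* x1 .+ 3 .* x2 .+ 0.1 .* randn(n)

X = [ones(n) x1 x2]
β = X \ y   # backslash solves least squares via a QR factorization

println(cor(x1, x2))   # ≈ 0.8
println(β)             # ≈ [1.0, 2.0, 3.0]
```

So at r ≈ 0.8 the coefficient estimates are still tight; the blow-up only happens much closer to exact linear dependence.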

You mean between the variables? Dormann et al. (2012) talk about degraded performance from correlation coefficients between variables of |r| > 0.7. Maybe Statistics.cor is very different from r. I’ll look into that now.

EDIT: Nope, that’s not it. Dormann et al. use Pearson’s r, and `Statistics.cor` calculates the Pearson correlation too.
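A quick sanity check that `Statistics.cor` really is Pearson’s r (a sketch with made-up vectors), comparing it against the textbook definition r = cov(x, y) / (σ_x σ_y):

```julia
using Statistics

x = [1.0, 2.0, 4.0, 7.0]
y = [2.0, 1.0, 5.0, 6.0]

# Pearson's r computed directly from its definition
r_manual = sum((x .- mean(x)) .* (y .- mean(y))) /
           sqrt(sum(abs2, x .- mean(x)) * sum(abs2, y .- mean(y)))

r_manual ≈ cor(x, y)   # true
```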

EDIT2: The correlations between the variables are definitely above 0.7:

```julia
julia> cor(df.D, df.E)
```

I’m not familiar with the issues studied in the paper linked, but I’ve never heard of anyone in econometrics discuss |r| > .7 being a problem.


The LinearRegression.jl package (GitHub - ericqu/LinearRegression.jl) has a test for collinearity built in. Collinearity in linear regression models means that the coefficients will be estimated imprecisely. If the priors counter that particular imprecision, then Bayesian methods will help. But if the priors don’t add information in the dimensions where it’s lacking, they won’t help much.
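To illustrate that last point (a sketch with simulated data, not from the thread): with two nearly identical predictors, OLS can only pin down the *sum* of the two coefficients, and the individual estimates become wildly unstable. A ridge penalty, which corresponds to a zero-mean Gaussian prior on the coefficients, adds information in exactly the missing dimension and stabilizes them:

```julia
using LinearAlgebra, Random

Random.seed!(1)
n = 200

x1 = randn(n)
x2 = x1 .+ 1e-8 .* randn(n)        # near-perfect collinearity
y  = x1 .+ x2 .+ 0.1 .* randn(n)   # true coefficients: [1, 1]

X = [x1 x2]

β_ols   = X \ y                    # individual entries are huge and offsetting
λ       = 1.0
β_ridge = (X'X + λ * I) \ (X'y)    # ridge = Gaussian-prior MAP estimate

println(β_ols)                     # unstable, but β_ols[1] + β_ols[2] ≈ 2
println(β_ridge)                   # ≈ [1.0, 1.0]
```

This matches the thread’s conclusion: the prior helps precisely because it constrains the direction (here, the difference between the two coefficients) where the data carry almost no information.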


That makes sense. Thanks a lot!