[ANN] Linear Regression v0.7-alpha

I would like to announce that a new package for linear regression is now available.
I needed some extra statistics and eventually coded them for my own purposes. I thought this could benefit others as well, so I cleaned up the code. In the end, the result takes a slightly different route than the official GLM package (more on that in the readme). Let me know whether things are useful or not working, or share any other feedback about the package, through GitHub issues or directly in the comments below.

Cheers,


One of the things that I would like to see in a linear regression package is the option to use different robust estimators for the covariance of the parameter estimates, depending on possible heteroskedasticity or autocorrelation of the errors (e.g., White or Newey-West standard errors, which are widely used in the econometrics literature; there are many variants). I have a function that does this, if you might be interested in taking bits of it: https://github.com/mcreel/Econometrics/blob/master/src/LinearRegression/ols.jl
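For readers less familiar with these, here is a minimal sketch of the White (HC0) sandwich estimator, assuming `X` is the n×k design matrix and `u` the OLS residuals (an illustration, not the linked function):

```julia
using LinearAlgebra

# Minimal sketch of White's (HC0) heteroskedasticity-robust covariance.
function white_vcov(X::AbstractMatrix, u::AbstractVector)
    bread = inv(X' * X)
    meat = X' * Diagonal(u .^ 2) * X   # per-observation squared residuals
    return bread * meat * bread        # the "sandwich"
end
```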


FixedEffectModels uses

https://github.com/FixedEffects/Vcov.jl

and there’s also

https://github.com/gragusa/CovarianceMatrices.jl


Hi @mcreel, thanks for your suggestion.
Are the packages suggested by @nilshg already sufficient for your needs, or would you prefer to have these features incorporated into a single package?

Hi, @Eric. Yes, those packages offer what I am suggesting. It would be nice if your package could use one or perhaps both of them to offer the user some choices about how to compute standard errors.

Sounds good to me so far.
Would you mind opening an issue on GitHub about it (that would help me keep track of it)? Ideally, if you could write down how you would like the API updated to include these features, that would be helpful. Also, if you have a test case I could use, that would be great.

GREAT! :+1:

Is removing the intercept that difficult?

Hi Eric, I filed an issue.


Thank you @mcreel! I tagged it as an enhancement and will look into it in the coming days; I will get back to you through the open issue if I have a question.

Thank you @xinchin!

Well, the formula processing was not trivial for me :sweat_smile:. But if you know of, or could point me to, a resource that explains how to do it, I would consider it.
Updating the formula manually is rather straightforward, though (adding a +1 makes the intercept explicit), and it clarifies the model’s intention. Is there a particular use case in which this would be a blocking issue?
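For reference, a minimal sketch of the StatsModels formula conventions that GLM follows (whether the no-intercept form is honored depends on the package and version):

```julia
using StatsModels

# The intercept is implicit: these two formulas describe the same model.
f_implicit = @formula(y ~ x)
f_explicit = @formula(y ~ 1 + x)     # the +1 spells the intercept out

# GLM-style convention for dropping the intercept: a 0 (or -1) term.
f_nointercept = @formula(y ~ 0 + x)
```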

I don’t have the smallest clue about that :sweat_smile:, but your approach is very interesting, and I hope you can add more features to your package. BTW, those plots in the readme are fantastic :+1:

Thank you for your kind words! I would like to let you know that I released a new version that makes it possible to have no intercept. As in GLM, an intercept is implicitly added if the user does not specify one.
If you need some other features, please let me know.


@mcreel, I wanted to let you know that I added the features in version 0.71. For now, I did not use the packages mentioned by @nilshg because I could not interface with CovarianceMatrices.jl (I raised an issue; however, I am not sure how long it will take to be resolved).
So the HC0, HC1, and HC2 estimators, as well as the HAC Newey-West estimator, are now available.
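For the curious, here is a minimal sketch of the Newey-West estimator with Bartlett weights, assuming `X` is the design matrix, `u` the OLS residuals, and `L` the lag length (an illustration, not the package’s implementation):

```julia
using LinearAlgebra

# Minimal sketch of the Newey-West (HAC) covariance with Bartlett weights.
function newey_west_vcov(X::AbstractMatrix, u::AbstractVector, L::Integer)
    n, k = size(X)
    bread = inv(X' * X)
    meat = X' * Diagonal(u .^ 2) * X   # lag-0 term (as in HC0)
    for j in 1:L
        w = 1 - j / (L + 1)            # Bartlett kernel weight
        G = zeros(k, k)
        for t in (j + 1):n
            G += u[t] * u[t - j] * (X[t, :] * X[t - j, :]')
        end
        meat += w * (G + G')
    end
    return bread * meat * bread
end
```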


I wanted to inform you that version 0.72 has been released.
It adds the following features:

  • You can now easily create basic plots, such as the fit plot, residual plots, the QQ plot, the scale-and-location plot, etc.
  • Weighted regression is now possible; add the weights to the dataframe and indicate which column contains them.
  • Contrasts can now be added (in the same way as GLM; see the sketch after this list).
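As a reminder of the GLM-style contrasts pattern this mirrors, here is a minimal sketch using GLM itself (the data and column names are made up):

```julia
using DataFrames, GLM, StatsModels

df = DataFrame(y = randn(9), g = repeat(["a", "b", "c"], 3))

# Contrasts are passed as a Dict mapping column names to coding schemes;
# presumably the same pattern applies here.
m = lm(@formula(y ~ g), df, contrasts = Dict(:g => DummyCoding()))
```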

Unfortunately, documentation is still a work in progress; I hope the example in the readme can get you started.
As before, if you have any questions or spot a bug, please let me know.


I wanted to let you know that version 0.73 is now released.

  • There is now documentation.
  • Cross-validation features have been added: K-fold cross-validation and the PRESS statistic at the model level (see the sketch after this list).
  • Several model statistics have been added: Type 1 and Type 2 SS, squared partial correlation coefficients based on Type 1 and Type 2 SS, and squared semi-partial correlation coefficients based on Type 1 and Type 2 SS.
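For reference, the PRESS statistic can be computed directly from the ordinary residuals and the leverages; a minimal sketch, assuming `X` is the design matrix and `e` the residuals (an illustration, not the package’s code):

```julia
using LinearAlgebra

# Minimal sketch of the PRESS statistic: for OLS, the leave-one-out
# prediction error is the ordinary residual inflated by its leverage.
function press(X::AbstractMatrix, e::AbstractVector)
    h = diag(X * inv(X' * X) * X')   # leverages h_ii from the hat matrix
    return sum((e ./ (1 .- h)) .^ 2)
end
```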

Let me know if you have any questions or find a bug.


I was asked to register the package to make it easier to integrate. However, there is already another LinearRegression package (https://github.com/st--/LinearRegression.jl), which focuses on a more barebones linear regression.
I am guessing that the people reading this thread are also the users, hence I would like to ask for your opinion on the following options. If you would like something else, please comment below.

  • LinearRegressionAnalysis
  • OrdinaryLeastSquares
  • LinearRegressionKit
  • Something else (please comment)


For the first option, I added “Analysis” because the purpose of the multiple statistics and plots is to help with the analysis being conducted.
The second one is a reference to the method used to estimate the coefficients.

For what it’s worth, I’m fairly sure the General registry allows registering multiple packages with the same name.

My suggestion is to combine the two packages. The package linked to appears to focus on providing several methods to compute regression coefficients when models are possibly ill-conditioned. Your package could incorporate that, if it doesn’t already have it.


@non-Jedi, can you share how to do that? I already asked the project, but the information I obtained was that it is not possible.

@mcreel, yes, the QR decomposition is better for an ill-conditioned matrix. However, integrating the QR decomposition is not straightforward, as the code relies on some by-products of the sweep operator (the Sum of Squared Errors) to compute the other statistics. So if I use different methods, I will have to calculate the statistics differently.
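For what it’s worth, the SSE also falls out naturally of a QR-based solve; a minimal sketch of what that route could look like (an illustration, not the package’s code):

```julia
using LinearAlgebra

# Minimal sketch: OLS via QR, recovering the SSE that the sweep operator
# otherwise yields as a by-product.
function ols_qr(X::AbstractMatrix, y::AbstractVector)
    beta = qr(X) \ y           # least-squares solve via QR
    r = y - X * beta           # residuals
    return beta, dot(r, r)     # coefficients and Sum of Squared Errors
end
```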

Nevertheless, Cholesky and QR decompositions appear to handle ill-conditioned problems only up to a certain severity.
At the moment, I do not know several things:

  • Whether there is a class of problems that cannot be solved by, say, Cholesky but can be solved by QR.
  • Which problems the Moore-Penrose pseudoinverse can solve, and how it compares to Cholesky or QR.
  • In which situations it is necessary to use regularization techniques (such as Tikhonov regularization).

Hence, for now, I am thinking of adding the condition number (from LinearAlgebra) so that it is easier for the user to see whether the problem is indeed ill-conditioned and how severely.
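A minimal sketch of that check (the matrix here is made up):

```julia
using LinearAlgebra

# A near-dependent column produces a huge condition number, flagging an
# ill-conditioned design matrix.
X = [1.0 1.0 2.0;
     1.0 2.0 3.0;
     1.0 3.0 4.000001]   # third column ≈ sum of the first two
cond(X)                  # very large, signalling severe ill-conditioning
```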

I will also ask the community for their views on this topic in a different post, as I am not an expert here.
