For OLS,
using LinearAlgebra # provides qr, svd, pinv, etc.
y = Vector{Float64}(1:10)
X = hcat(y.^2, y.^3)
β = X \ y # Least-squares solution via QR
For a rectangular X, the \ operator is essentially shorthand for
Q, R = qr(X)
β = UpperTriangular(R) \ (Matrix(Q)' * y) # thin Q, then back-substitution with the triangular R
For better numerical robustness (e.g., when X is ill-conditioned or rank-deficient), you may use the singular value decomposition,
β = pinv(X) * y
or its long-form
U, S, V = svd(X)
β = V * Diagonal(1 ./ S) * U' * y
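As a quick sanity check, the following sketch compares the three approaches on the same toy data. It assumes Julia 1.x with only the LinearAlgebra standard library; the variable names (β_backslash, β_qr, β_svd) are illustrative.

```julia
using LinearAlgebra

y = Vector{Float64}(1:10)
X = hcat(y.^2, y.^3)

# Built-in least-squares solve
β_backslash = X \ y

# Thin QR: Matrix(Q) materializes the first size(X, 2) columns of Q
Q, R = qr(X)
β_qr = UpperTriangular(R) \ (Matrix(Q)' * y)

# SVD-based pseudoinverse, written out by hand
U, S, V = svd(X)
β_svd = V * (Diagonal(1 ./ S) * (U' * y))

# All of them should agree up to floating-point error
@assert β_backslash ≈ β_qr
@assert β_backslash ≈ β_svd
@assert β_svd ≈ pinv(X) * y
```

On a well-conditioned problem like this one the choice hardly matters; the differences show up when X is close to rank-deficient.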
Other alternatives using the LinearAlgebra stdlib include solving the normal equations through a factorization of the normal matrix X'X, such as Cholesky, LDLt, Bunch-Kaufman, etc. Less commonly, and for certain applications, a few packages provide sparse iterative solvers such as LSQR and LSMR.
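A minimal sketch of the normal-equations route, assuming the same toy data as above and only the LinearAlgebra stdlib. Note that forming X'X squares the condition number of the problem, so this is usually faster but less accurate than QR or SVD.

```julia
using LinearAlgebra

y = Vector{Float64}(1:10)
X = hcat(y.^2, y.^3)

# Normal equations: (X'X) β = X'y
XtX = Symmetric(X' * X)
Xty = X' * y

# Cholesky requires X'X to be positive definite (X has full column rank)
β_chol = cholesky(XtX) \ Xty

# Bunch-Kaufman only requires symmetry
β_bk = bunchkaufman(XtX) \ Xty

@assert β_chol ≈ X \ y
@assert β_bk ≈ X \ y
```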
Just because something is not in Base doesn’t mean it is not “blessed”. Base aims to include only the foundation and basic building blocks (e.g., LinearAlgebra is not in Base but is a standard library). Standard libraries hold code that is deemed essential for the language or its development, or that is meant to be shared across the whole ecosystem (Missings.jl is probably going to be added as a standard library).
GLM doesn’t use DataFrames per se but uses StatsModels, meaning that if StatsModels eventually plays well with other tabular data packages, there is no reason for GLM not to support them as well. The same goes for all other regression packages that rely on StatsModels. The usual pipeline is StatsBase / StatsModels / some regression package (with most having a dependency on DataFrames and Distributions as well).
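For illustration, here is a sketch of that pipeline on the same toy data, assuming the DataFrames and GLM packages are installed. The column names x1 and x2 are made up for this example; the 0 in the formula drops the intercept so the fit matches the plain X \ y regression.

```julia
using DataFrames, GLM

df = DataFrame(y = Float64.(1:10))
df.x1 = df.y .^ 2
df.x2 = df.y .^ 3

# @formula comes from StatsModels and is re-exported by GLM
model = lm(@formula(y ~ 0 + x1 + x2), df)

coef(model)       # estimated coefficients
coefnames(model)  # matching variable names
stderror(model)   # standard errors
```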
The idea of using regression packages rather than just linear algebra is that they handle many cases efficiently and correctly. For example: contrasts and interactions, weights, linearly dependent variables, different estimators (fixed effects, first-difference, between, 2SLS, random effects, regularization, generalized method of moments, etc.), displaying the variable names, computing various variance-covariance estimates, etc. Regression packages can also benefit from outsourcing calculations such as covariance matrices, or from sharing a nice standard API (StatsBase.RegressionModel).
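To give a sense of the bookkeeping a regression package takes care of, this sketch computes the classical (homoskedastic) variance-covariance estimate by hand for the toy data above. It assumes only the LinearAlgebra stdlib; the names resid, σ², and vcov_β are illustrative and not part of any package API.

```julia
using LinearAlgebra

y = Vector{Float64}(1:10)
X = hcat(y.^2, y.^3)
n, k = size(X)

β = X \ y
resid = y - X * β

# Classical estimate: σ̂² (X'X)⁻¹ with n - k degrees of freedom
σ² = sum(abs2, resid) / (n - k)
vcov_β = σ² * inv(cholesky(Symmetric(X' * X)))
se = sqrt.(diag(vcov_β))
```

Robust or clustered estimators, weights, and collinearity checks each add more code of this kind, which is where a shared API pays off.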
Package development in Julia aims to provide lightweight packages. Rather than one all-encompassing package, there is usually a collection of smaller packages. For example, DataFrames used to have I/O, tabular data representation, missing data support, categorical variables support, and statistical model transformations. Nowadays, there are separate packages for each of those functionalities, and one can pick the ones one needs.