GLM is slow on large datasets. Using OnlineStats for regressions? MixedModels?

You can pass the model matrix directly to fit instead of using a Formula and DataFrame. That will save you the costly computation of the model matrix. I.e.

julia> @time ft = fit(LinearModel, @formula(y ~ x1+x2+x3+x1x2), data);
  1.254742 seconds (540 allocations: 1.745 GiB, 37.68% gc time)

julia> @time fit(LinearModel, ft.mm.m, data.y);
  0.551531 seconds (26 allocations: 823.976 MiB, 41.03% gc time)

We are already using the Cholesky.

1 Like