OnlineStats: StatLearn vs LinReg vs other options for linear regression on large datasets

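  • LinReg is exact regression: it gives the same least-squares coefficients you would get from fitting the full dataset at once.

  • Everything StatLearn does is approximate. It uses stochastic approximation, so the estimates depend on the algorithm, the learning rate, and the order of the data.

  • LinRegBuilder is a more general version of LinReg: it lets you regress any variable on any subset of the others.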

  • StatLearn has many algorithm options: SGD, ADAGRAD, ADAM, ADAMAX, RMSPROP, MSPI, …

  • StatLearn will be faster, but LinReg gives the exact answer. For linear regression I would use LinReg. For logistic regression your only option (in OnlineStats) is StatLearn(p, LogitMarginLoss()).

  • You can call fit! repeatedly on the same OnlineStats object as new batches of data arrive, but OnlineStats is agnostic about how you get your data into Julia. It doesn’t have helpers for streaming a CSV file.

  • If you want to iterate through the rows of a CSV file one by one without loading it into memory, see CSV.File (in recent versions of CSV.jl, CSV.Rows is the iterator meant for low-memory row streaming).
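
Here is a minimal sketch of the last two points put together: stream a CSV row by row and feed each row to a single LinReg object with fit!. Treat it as an illustration under assumptions rather than the recommended recipe. It assumes a recent OnlineStats where LinReg() takes no dimension argument and a single observation is an (x, y) tuple, and it uses CSV.Rows instead of the CSV.File mentioned above, since as far as I know that is the row-streaming iterator in current CSV.jl. The column names x1, x2, y and the synthetic data are made up for the example.

```julia
using CSV, OnlineStats

# Write a small synthetic CSV so the sketch is self-contained.
# The column names x1, x2, y are hypothetical.
path = tempname() * ".csv"
open(path, "w") do io
    println(io, "x1,x2,y")
    for _ in 1:1_000
        x1, x2 = randn(), randn()
        y = 2 * x1 - 3 * x2 + 0.1 * randn()
        println(io, x1, ",", x2, ",", y)
    end
end

o = LinReg()

# CSV.Rows yields one row at a time; without a `types` argument the
# values arrive as strings, so parse them explicitly.
for row in CSV.Rows(path)
    x = [parse(Float64, row.x1), parse(Float64, row.x2)]
    y = parse(Float64, row.y)
    fit!(o, (x, y))   # one observation: (predictor vector, response)
end

β = coef(o)   # should land close to the true coefficients [2, -3]
```

The same loop works if your data arrives in batches from somewhere else: keep one object around and keep calling fit! on it. Note that this sketch fits no intercept; if you need one, append a constant 1.0 to x.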
