Any way to efficiently run Poisson regression thousands of times?

@dlakelan That makes sense, and thanks for the clarification! Unfortunately, I would say I need those 3000 predictions for this particular question.

Your best bet is probably to run it on 40 threads then, maybe split across 5-10 machines (most desktop computers have 4-8 cores these days). Look at Distributed.jl and maybe Transducers.jl: you can get really simple parallelism by generating the 3000 weight samples, distributing them between the, say, 10 machines, and then running a transducer-based or FLoops.jl-based loop within each machine.
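For example, here is a minimal Distributed.jl sketch of that split. The function `fit_poisson` and the sizes are placeholders, standing in for whatever single-fit routine you end up writing:

```julia
using Distributed
addprocs(8)    # local worker processes; use addprocs([("host", n)]) to add remote machines

@everywhere begin
    # stand-in for the per-weight-vector Poisson fit discussed below
    fit_poisson(w) = sum(w)    # replace with the actual Newton-Raphson fit
end

n_obs = 100                                # placeholder number of observations
weights = [rand(n_obs) for _ in 1:3000]    # the 3000 weight samples

# pmap spreads the weight vectors across the workers and collects the results
results = pmap(fit_poisson, weights)
```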

Do not use GLM.jl here… Your best option is to write your own Newton-Raphson (which is what the GLM solvers do) on the log-likelihood of your model.

For a Poisson GLM, this is not that complicated to do: the model basically fits \beta by maximizing the weighted log-likelihood

\ell(\beta) = \sum_{i=1}^n w_i\left(y_i \beta'X_i - e^{\beta'X_i}\right).
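For reference, differentiating this expression once and twice with respect to \beta (writing W = \mathrm{diag}(w) and \mu_i = e^{\beta'X_i}) gives

\nabla\ell(\beta) = X'W(y - \mu), \qquad \nabla^2\ell(\beta) = -X'W\,\mathrm{diag}(\mu)\,X,

so each Newton-Raphson step is \beta \leftarrow \beta - \left[\nabla^2\ell(\beta)\right]^{-1}\nabla\ell(\beta).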

A lot of the pieces in the Newton-Raphson solve do NOT depend on the weights and can be precomputed once and only once :slight_smile:

Compute the first and second derivatives manually and precompute everything that depends only on y and X, so that your functions loss(w), gradient(w) and even hessian(w) are very quick. Basically, each of them should be little more than a dot product between the precomputed pieces and something that depends on \beta. This is easier to write down if you introduce the weighting matrix W = \mathrm{diag}(w) to vectorise the expressions a bit.
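Here is a minimal sketch of such a hand-rolled Newton-Raphson fit, assuming a fixed design matrix X and counts y; the function name, starting value and tolerances are illustrative, not prescriptive:

```julia
using LinearAlgebra

function fit_poisson_nr(X::AbstractMatrix, y::AbstractVector, w::AbstractVector;
                        maxiter::Int = 25, tol::Real = 1e-8)
    β = zeros(size(X, 2))               # starting value; warm starts converge faster
    for _ in 1:maxiter
        μ = exp.(X * β)                 # Poisson means at the current β
        g = X' * (w .* (y .- μ))        # gradient of the weighted log-likelihood
        H = X' * (Diagonal(w .* μ) * X) # minus the Hessian (positive definite for full-rank X)
        δ = H \ g                       # Newton step
        β .+= δ
        maximum(abs, δ) < tol && break  # stop once the step is tiny
    end
    return β
end

# Example usage (Distributions.jl only needed to simulate the counts):
# using Distributions
# n, p = 1_000, 5
# X = hcat(ones(n), randn(n, p - 1))
# y = rand.(Poisson.(exp.(X * (0.3 .* randn(p)))))
# w = rand(n)
# β̂ = fit_poisson_nr(X, y, w)
```

Since X and y never change across the 3000 fits, you can reuse the same buffers throughout, and warm-starting each fit at the unweighted estimate usually cuts the iteration count noticeably.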

You’ll have to do a bit of work, yes, but you’ll get much better runtimes than running the full GLM each time :).

Hi @lrnv , this is very helpful! Thanks for the suggestion!
