@dlakelan That makes sense, and thanks for the clarification! Unfortunately, I would say I need those 3000 predictions for this particular question.
Your best bet is probably to run it on 40 threads then, maybe split across 5-10 machines; most desktop computers have 4-8 cores these days. I'd look at Distributed.jl and maybe Transducers.jl. You might get really simple parallelism by generating the 3000 weight samples once, distributing them between the (say) 10 machines, and then running a Transducers.jl- or FLoops.jl-based loop within each machine, as in the sketch below.
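A minimal sketch of that split with Distributed.jl and `pmap` (the `fit_one` body, worker count, and sizes are placeholders, not your actual per-replicate fit):

```julia
using Distributed
addprocs(4)  # local worker processes; addprocs(["host1", "host2"]) would attach remote machines

# Placeholder for one weighted fit: replace the body with your actual model fit.
@everywhere fit_one(w) = sum(w)

weight_samples = [rand(100) for _ in 1:3000]  # generate the 3000 weight vectors once, up front
results = pmap(fit_one, weight_samples)       # pmap scatters the fits across all workers
```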
Do not use GLM.jl here… Your best option is to write your own Newton-Raphson (this is what GLM solvers do) on the log-likelihood of your model.
For a Poisson GLM, this is not that complicated to do: the model basically fits \beta to maximize the weighted log-likelihood
$$\ell(\beta) = \sum_{i=1}^n w_i\left(y_i \beta'X_i - e^{\beta'X_i}\right).$$
A lot of the pieces in the NR solve do NOT depend on the weights and can be precomputed once and only once.
Compute the first and second derivatives manually and precompute everything that depends only on y and X, so that your functions loss(w), gradient(w), and even hessian(w) are very quick. Basically, each of them should boil down to a dot product between a piece that depends on \beta and a piece precomputed from the data. This will be easier to do if you consider the weighting matrix W = \mathrm{Diagonal}(w) to vectorise the expressions a bit.
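To make that concrete, here is a rough, untested sketch (the function and keyword names are made up, not from GLM.jl). From the log-likelihood above, with \mu = e^{X\beta}, the derivatives are

$$\nabla \ell(\beta) = X'\bigl(w \odot (y - \mu)\bigr), \qquad \nabla^2 \ell(\beta) = -X'\,\mathrm{Diagonal}(w \odot \mu)\,X,$$

so each Newton step just solves one p×p linear system:

```julia
using LinearAlgebra

# Rough sketch of Newton-Raphson for the weighted Poisson log-likelihood above.
# Names (fit_poisson_nr, beta0, tol, maxiter) are illustrative, not a library API.
function fit_poisson_nr(X, y, w; beta0 = zeros(size(X, 2)), tol = 1e-8, maxiter = 25)
    β = copy(beta0)
    for _ in 1:maxiter
        μ = exp.(X * β)                  # Poisson means under the log link
        g = X' * (w .* (y .- μ))         # gradient of ℓ(β)
        H = X' * (Diagonal(w .* μ) * X)  # minus the Hessian, positive definite
        δ = H \ g                        # Newton ascent direction
        β .+= δ
        norm(δ) < tol && break           # step is tiny: converged
    end
    return β
end
```

Since all 3000 weight vectors reweight the same data, warm-starting each fit from the unweighted solution (passing it as `beta0`) should cut the iteration count down a lot.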
You’ll have to do a bit of work, yes, but you’ll get much better runtimes than running the full GLM each time :).
Hi @lrnv , this is very helpful! Thanks for the suggestion!