# Any way to efficiently run Poisson regression thousands of times?

I need to run Poisson regression around 3000 times with the same set of Y and X but different weights. I’m currently using GLM.jl with a for-loop. The whole thing takes about 58 seconds. I was wondering if there are more efficient ways to run this. Thanks!

An MWE:

```julia
using GLM
using Distributions

beta_t = [0.2, 0.1, 0.5, 1.1]
x = rand(100000, 4)
lambdas = exp.(x * beta_t)
y = rand.(Poisson.(lambdas))

w = rand(size(y, 1), 3000)
results = zeros(3000, 4)
for i in axes(w, 2)
    results[i, :] = collect(coef(glm(x, y, Poisson(), wts=w[:, i])))
end
```


Thanks again!

The most obvious thing would be to use threading, i.e. add a Threads.@threads before the for-loop and launch Julia with julia --threads=auto. If you're fine with it, you can also try using Float32 for all the data, or even try putting a @fastmath in front of the for-loop; however, double-check your correctness in that case.

Hi @RomeoV, thanks for your quick reply! I should have mentioned that I tried multi-threading, but the speed gain was not significant enough. Ideally, I’d like it to be under ~2 secs if that’s possible, as this step is used as an input for the later steps.

For me, threading speeds it up about 4x:

```julia
julia> @time Threads.@threads for i in axes(w, 2)
           results[i,:] = collect(coef(glm(x, y, Poisson(), wts=w[:,i])))
       end
 32.137273 seconds (199.79 k allocations: 35.775 GiB, 1.09% gc time, 0.75% compilation time)

julia> @time for i in axes(w, 2)
           results[i,:] = collect(coef(glm(x, y, Poisson(), wts=w[:,i])))
       end
111.505098 seconds (179.93 k allocations: 35.773 GiB, 0.20% gc time)
```


Are you running with julia --threads=auto? You can check with Threads.nthreads() how many threads you have available (I have 8 cores = 16 threads).

Another thing you might try is to use StaticArrays.jl (since you seem to know your sizes), but it depends on whether GLM.jl uses the static sizes properly or not, which I'm not sure about.

Doesn’t save so much time, but having an initial guess helps:

```julia
function testit()
    beta_t = [0.2, 0.1, 0.5, 1.1]
    x = rand(100000, 4)
    lambdas = exp.(x * beta_t)
    y = rand.(Poisson.(lambdas))
    guess = collect(coef(glm(x, y, Poisson())))
    w = rand(size(y, 1), 3000)
    results = zeros(3000, 4)
    for i in axes(w, 2)
        results[i, :] = collect(coef(glm(x, y, Poisson(), wts=w[:, i], start=guess)))
    end
end
```


@xxxxx You do not say which processor you are using. It would be nice to know:

- processor type
- how many cores
- hyperthreading: on or off?
- if Intel, is Turbo Boost enabled?

If you are in a financial institution you should probably have access to a multicore system with a high clock frequency.

I am not a user of GLM.jl but there is a keyword argument start in the glm function. That argument can be used to set the initial values of the regression coefficients. You could try setting them based on the previously-computed coefficients.

```julia
# currently
results[i,:] = collect(coef(glm(x, y, Poisson(), wts=w[:,i])))
# would become (for i >= 2; the first iteration has no previous result to start from)
results[i,:] = collect(coef(glm(x, y, Poisson(), wts=w[:,i],
                              start=results[i - 1, :])))
# or maybe? (needs `using Statistics` for mean; vec turns the 1×4 row mean into a vector)
results[i,:] = collect(coef(glm(x, y, Poisson(), wts=w[:,i],
                              start=vec(mean(results[1:i - 1, :], dims=1)))))
```


I am not sure if these initial coefficients are guaranteed to be closer to the optimum after changing the weights, but I would think it often should be the case. It is worth trying.

Edit: It might also help to add @views in the loop:

```julia
results[i,:] = @views collect(coef(glm(x, y, Poisson(),
                                       wts=w[:,i], start=results[i - 1, :])))
```


Hi @RomeoV , thanks for your help! I have implemented those, and it worked great.

Thanks @Dan , the initial guess also helps!

Thanks @barucden . Recursively using the results from the last step was also helpful!

BTW, the code in the post works nicely and is a good MWE, but the random nature of the coefficients and weights might hide further optimization opportunities.

If the actual problem has structure, if you describe it, maybe good suggestions will come up.

It’s not clear to me what the end goal is. If the problem has already been solved in 58 seconds, what’s left to be done? The reason that I ask is that the effort one would put into speeding this up depends on the actual final goal.

Going down memory lane, I once wrote a paper called “I Ran Four Million Probits Last Night: HPC Clustering with ParallelKnoppix”. Your problem reminded me of it.

Agreed; the best way to make something faster is to make the thing more efficient. Perhaps running 3000 repetitions could be replaced by something else that accomplishes the overall goal better.

Thanks @Dan @mcreel @dlakelan ! The end goal here is to use the estimates from the Poisson regression with different weights to make predictions on the mean arrival rates, i.e., X\hat{\beta}. @mcreel I will also check out your paper, and thanks for the reference!
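As a side note on that end goal: for a Poisson GLM with the canonical log link, the predicted mean arrival rates are exp.(X * β̂) rather than the linear predictor X * β̂ itself. A minimal sketch with toy data, assuming `results` holds one fitted coefficient vector per row as in the MWE (sizes shrunk here for illustration):

```julia
x = rand(100, 4)        # toy design matrix (stands in for the real 100000×4 x)
results = rand(10, 4)   # toy coefficient matrix, one row per weighting

eta = x * results'      # 100×10 matrix of linear predictors, one column per weighting
rates = exp.(eta)       # inverse log link gives the predicted Poisson mean arrival rates
```

Each column of `rates` then holds the predicted mean arrival rates under one set of weights.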

It would be useful to also take this post as an opportunity to link to the related new question: Nesting the GLM.jl in the objective function when formulating the ReverseDiff gradient generates StackOverflowError

That question is a better description of the end goal, as @mcreel suggested. In particular, it also contains a clue to how the weights are generated (a clue only, since readers need to undo the transformations done to protect the ‘proprietary’ application).

The weights in that question are all scalar multiples of each other, which essentially means predict_data in that question is independent of the para[3] scaling factor and only depends on data[:,:w]. It also means inference on para[3] is impossible (it’s statistically unidentifiable).

Hi @Dan , thanks, but these are two unrelated questions. Although both questions concern Poisson regression and GLM.jl, they have different end applications.

In that particular question’s MWE, I understand that para[3] is unidentifiable, but even so, I don’t think that is the cause of the StackOverflowError I encountered. In that question, I was particularly curious whether it would be possible to combine GLM.jl with ReverseDiff.

In this question, the end usage would be what I described, purely as making the predictions based on the estimated coefficients from different weights. Thanks again.

No need to read that paper! It just showed how MPI could be used to speed up Monte Carlo, an idea that was not well-known to economists at the time. Actually, MPI would be one way to speed up your problem, using MPI.jl. The documentation for that would be a better read than my paper. But, perhaps it would be easier to use Julia’s built-in Distributed package.
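A minimal Distributed sketch of the idea, under the same setup as the MWE (sizes and the worker count of 4 are illustrative assumptions; adjust both to your data and machine):

```julia
using Distributed
addprocs(4)                          # spawn 4 worker processes (match your core count)

@everywhere using GLM, Distributions # every worker needs the packages loaded

# data as in the MWE, shrunk for the sketch; globals referenced by the
# closure below are shipped to the workers automatically
beta_t = [0.2, 0.1, 0.5, 1.1]
x = rand(10_000, 4)
y = rand.(Poisson.(exp.(x * beta_t)))
w = rand(10_000, 50)

# each weight column is an independent fit, so pmap can distribute one
# fit per column across the worker processes
fits = pmap(i -> collect(coef(glm(x, y, Poisson(), wts=w[:, i]))), axes(w, 2))
results = permutedims(reduce(hcat, fits))   # 50×4, one coefficient row per weighting
```

Since each fit takes tens of milliseconds, the per-task overhead of pmap should be negligible here; for much smaller per-fit costs, batching several columns per task would be worth considering.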


I guess what’s not clear is why there are 3000 different sets of weights. Obviously, if you only need to do it for one set of weights, it can be done far faster than 58 seconds, and the time grows linearly with the number of weight sets, at least to first approximation. So the 3000 samples must play some role, and the question of interest for speeding this up is what that role is, and whether it can be played by some other methodology. For example, are there 30 or 300 well-chosen weight sets that give effectively the same answer as the full 3000?

So, if you explain the role that resampling with different weights plays, people may be able to point you at algorithms that use far fewer fits to get equally good or even better answers (or possibly there is no such algorithm). Without more knowledge, we can only help with “how to make the software do 3000 of them in less time”.


I will try those as well. Thanks!