Hello,

I’m running the program below on a 32-core/64-thread system with essentially nothing else running on it. If I use more than 16 cores, the execution time of the second run is effectively flat. This is true both when I use a precompiled system image and when I don’t (though somewhat more so with the precompiled system image, for reasons I don’t understand). What am I missing?

```julia
using Distributed
@everywhere using Optim, LinearAlgebra

@everywhere const R = 8000
@everywhere const d = 40

@everywhere function once(x::Int64)
    # A negative x does a single iteration (used as a warm-up/compilation
    # pass); a positive x runs the full R iterations.
    for r = 1:((x < 0) ? 1 : R)
        # Objective, gradient, and Hessian of Ω(θ) = ‖θ‖²/2
        function Ω(θ::Vector{Float64})::Float64
            dot(θ, θ) * 0.5
        end
        function dΩ!(g::Vector{Float64}, θ::Vector{Float64})
            g[:] = θ
        end
        function ddΩ!(H::Matrix{Float64}, θ::Vector{Float64})
            H[:, :] .= 0.0
            for i = 1:d
                H[i, i] = 1.0
            end
        end
        Optim.optimize(Ω, dΩ!, ddΩ!, ones(Float64, d), NewtonTrustRegion())
    end
end

function doit()
    @time pmap(once, -64:-1)   # warm-up: compile `once` on every worker
    @time pmap(once, 1:192)    # timed run
end

doit()
```
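
In case it helps diagnose this, here is what I run before timing to confirm the parallel setup — a small sketch assuming the workers were started with `julia -p N` or `addprocs(N)`. Pinning BLAS to one thread per worker rules out oversubscription between the `pmap` workers and BLAS’s own threading (a common cause of flat scaling):

```julia
using Distributed, LinearAlgebra

# How many worker processes does pmap actually have?
println("workers: ", nworkers())

# BLAS threads in this process; with many workers on one machine,
# setting this to 1 on every worker avoids oversubscription.
println("BLAS threads: ", BLAS.get_num_threads())

@everywhere using LinearAlgebra
@everywhere BLAS.set_num_threads(1)
```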