# Fast fit of generalized pareto distribution

I’ve implemented a (slightly adapted) julia version of the Generalized Pareto estimation method described in this paper: DOI 10.1198/TECH.2009.08017

Am I missing any obvious performance improvements? Is it correct that this is not a good use case for StaticArrays, as the size of the small arrays is not known at compile time?

``````using Statistics

function fit_gpd(x::Vector{T}) where T <: Real
x = sort(x)
x_star = x[floor(Int, length(x) / 4.0 + 0.5)]

n = length(x)
m::Int = 30 + floor(sqrt(n))

θ = [1 / x[n] + (1 - sqrt(m / (j - 0.5))) / (3 * x_star) for j in 1:m]

lx(a, x) = let
a = -a
k = [mean(log1p.(b .* x)) for b in a]
@. log(a / k) - k - 1
end

l_θ = n * lx(θ, x)
w_θ = 1.0 ./ [sum(exp.(l_θ .- l_θ[j])) for j in 1:m]
θ_hat = sum(θ .* w_θ)

ξ = mean(log1p.(-θ_hat .* x))
σ = -ξ ./ θ_hat
ξ = ξ * n / (n + 10.0) + 10.0 * 0.5 / (n + 10.0)

return σ, ξ
end

s = rand(10000)
fit_gpd(s)
``````

2 Likes

Here is a cleaner version IMO, that allocates far less. However, the speed gains are not much. There are 4 main allocations happening in this version:

1. `x = sort(x)`
2. `θ = ...`
3. `l_θ = similar(θ)`
4. `w_θ = similar(l_θ)`

If you pre-allocate those outside the function and pass them to the function, you can make even more gains. You can also profile your function and see where it is spending most of the time and try to improve those areas.

``````using LinearAlgebra, Statistics

function fit_gpd2(x::Vector{T}) where T <: Real
x = sort(x)
x_star = x[round(Int, length(x) / 4)]

n = length(x)
m = 30 + floor(sqrt(n))

θ = Float64[1 / x[n] + (1 - sqrt(m / (j - 0.5))) / (3 * x_star) for j in 1:m]
l_θ = similar(θ)
for i in 1:length(l_θ)
b = θ[i]
k = mean(x -> log1p(-b * x), x)
l_θ[i] = n * (log(-b / k) - k - 1)
end

w_θ = similar(l_θ)
for j in 1:length(w_θ)
w_θ[j] = 1 / sum(x -> exp(x - l_θ[j]), l_θ)
end

θ_hat = dot(θ, w_θ)
ξ = mean(x -> log1p(-θ_hat * x), x)
σ = -ξ / θ_hat
ξ = ξ * n / (n + 10.0) + 10.0 * 0.5 / (n + 10.0)

return σ, ξ
end

using BenchmarkTools

s = rand(100000)
@btime fit_gpd(\$s)
#  343.120 ms (2821 allocations: 266.55 MiB)
#(0.9928484317360206, -0.991738176403473)

@btime fit_gpd2(\$s)
#  324.617 ms (351 allocations: 800.77 KiB)
#(0.9928484317360206, -0.991738176403473)
``````

If you find yourself doing this, you should change it to in-place sort instead:

``````sort!(x)
``````

Not necessarily, you may not want the side effect of mutating the input outside the function.

I figured that maybe he doesn’t want to mutate the arguments. In that case, I would:

``````fit_gpd2(x::AbstractVector{<:Real}}) = fit_gpd2!(copy(x))
function fit_gpd2!(x::AbstractVector{T}) where T <: Real
sort!(x)
x_star = ....
``````

Yeah, ok. But seems worth pointing out, anyway, since he’s asking about optimizations.

In general, I think you are creating a lot of vectors, and I wonder if you can re-write much of it into scalar form. It’s a bit hard to say exactly what to change, but do all of `θ`, `l_θ`, `w_θ`, `θ_hat` have to be vectors? Can’t you turn the whole thing into a scalar loop?

Also, though this is a small optimization, you are better off using integers more, rather than floats everywhere. For example, replace

``````floor(Int, length(x) / 4.0 + 0.5)
``````

with

``````div(length(x) + 2, 4)
``````

and

``````ξ = ξ * n / (n + 10.0) + 10.0 * 0.5 / (n + 10.0)
``````

with

``````ξ = (ξ * n + 5) / (n + 10)
``````

etc.

Pretty much all your literals should be integers instead of floats, it’s (marginally) faster, preserves types better, and looks cleaner.

1 Like

If you really want less allocation a little bit more performance, you should manually “unroll” the broadcast operations inside the loops. This is faster because you know you won’t access memory outside the size of the vectors:

``````function fit_gpd3(x::Vector{T}) where T <: Real
n      = length(x)
x      = sort(x)
x_star = x[div(length(x) + 2, 4)]
m      = 30 + floor(Int, sqrt(n))

θ   = Vector{Float64}(undef, m)
l_θ = Vector{Float64}(undef, m)
w_θ = Vector{Float64}(undef, m)

@inbounds for j in 1:m
θ[j] = 1 / x[n] + (1 - sqrt(m / (j - 0.5))) / (3x_star)
end

@inbounds for i in 1:m
b = θ[i]
k = 0.0

@simd for j = 1:n
k += log1p(-b * x[j])
end

k /= n
l_θ[i] = n * (log(-b / k) - k - 1)
end

@inbounds for j in 1:m
aux = 0.0

@simd for i = 1:m
aux += exp(l_θ[i] - l_θ[j])
end

w_θ[j] = 1 / aux
end

θ_hat = dot(θ, w_θ)

ξ = 0.0

@inbounds @simd for i = 1:n
ξ += log1p(-θ_hat * x[i])
end

ξ /= n
σ  = -ξ / θ_hat
ξ  = (ξ * n + 5 ) / (n + 10)

return σ, ξ
end
``````
``````julia> @btime fit_gpd2(\$s)
305.727 ms (351 allocations: 800.77 KiB)
(0.9913807459446422, -0.9902799795309473)

julia> @btime fit_gpd3(\$s)
296.859 ms (5 allocations: 789.95 KiB)
(0.9913807459446389, -0.9902799795309439)
``````

However, given the poor readability, I am not sure if it’s worth it.

Besides that, this is a naive sum algorithm, and you should have problems in some cases compared to the `mean` and `sum` approach.

Ah! I realize a better method that will not have the sum problem. Look that what you need is two auxiliary vectors, one of size `n` and other of size `m` to store the operations (`log1p` and `exp`) that happen inside the loops. In this case, you can gain performance and reduce the allocations while still using the `sum` and `mean` functions that take care about the sum algorithm:

``````function fit_gpd4(x::Vector{T}) where T <: Real
n      = length(x)
x      = sort(x)
x_star = x[div(length(x) + 2, 4)]
m      = 30 + floor(Int, sqrt(n))

θ    = Vector{Float64}(undef, m)
l_θ  = Vector{Float64}(undef, m)
w_θ  = Vector{Float64}(undef, m)
aux1 = Vector{Float64}(undef, n)
aux2 = Vector{Float64}(undef, m)

@inbounds for j in 1:m
θ[j] = 1 / x[n] + (1 - sqrt(m / (j - 0.5))) / (3x_star)
end

@inbounds for i in 1:m
b = θ[i]
@. aux1 = log1p(-b * x)
k = mean(aux1)
l_θ[i] = n * (log(-b / k) - k - 1)
end

@inbounds for j in 1:m
@. aux2 = exp.(l_θ - l_θ[j])
w_θ[j] = 1 / sum(aux2)
end

θ_hat = dot(θ, w_θ)
@. aux1 = log1p(-θ_hat * x)

ξ = mean(aux1)
σ = -ξ / θ_hat
ξ = (ξ * n + 5 ) / (n + 10)

return σ, ξ
end
``````
``````julia> @btime fit_gpd4(\$s)
296.737 ms (8 allocations: 1.54 MiB)
(0.9913807459446421, -0.9902799795309472)
``````

Thank you all for your replies, this is really useful.
What I take away is:

1. Use anonymous functions for `sum` and `mean` (or use manual loops instead of broadcast for a slight speed increase) instead of array comprehensions.
2. it’s okay to use integer literals (I misunderstood the Performance Tips section there).