Truncation before unique unfortunately induces a discontinuity if some energy levels are close to the truncation boundary. An easy variant is the following (if your energy is one-dimensional and you know the discretization parameter apriori):
function discretize(energy_levels, epsilon)
res = Vector{Vector{eltype(energy_levels)}}()
current = Vector{eltype(energy_levels)}()
bound = energy_levels[1][1]+epsilon
mingap = Inf
for (energy, idx) in energy_levels
if energy <= bound
push!(current, (energy, idx))
else
mingap = min(mingap, energy - current[end][1])
push!(res, current)
current = [(energy, idx)]
bound = energy + epsilon
end
end
push!(res, current)
return mingap, res
end
Then:
julia> N = 10_000;
julia> energy_levels = sort!([(randn() + 5*iseven(i), i) for i=1:N]);
julia> discretize(energy_levels, 1.0)
(4.000029130021192, Array{Tuple{Float64,Int64},1}[[(1.00344e-5, 1649), (0.00115351, 1023), (0.00124319, 4715), (0.001514, 2551), (0.00183936, 499), (0.00186862, 7835), (0.00190698, 4597), (0.00227954, 4959), (0.00234361, 6277), (0.00271333, 4933) … (0.997599, 9123), (0.997636, 8371), (0.997659, 7433), (0.997705, 1337), (0.998313, 769), (0.998855, 7799), (0.99897, 7941), (0.999027, 2609), (0.999211, 5695), (0.999993, 4315)]])
julia> energy_levels = sort!([(randn(), i) for i=1:N]);
julia> discretize(energy_levels, 1.0)
(0.00019930580942706388, Array{Tuple{Float64,Int64},1}[[(-3.65232, 7208), (-3.49434, 2538), (-3.27886, 9182), (-3.24522, 2502), (-3...
You see that the mingap
is large in the first case (because your distribution really is just two values with some small error) and small in the second case (assumptions for discretize
to make sense are violated). Use mingap > 2*epsilon
as a reality-check.
If you don’t know the error scale a priori, then you need to do some kind of hierarchical clustering in order to discover the error scale.
If your numerical errors are on the same scale as the spectral gaps then you are SOL and need better computations. If you have continuous spectrum, then you are SOL, period. If eigenvalues accumulate somewhere (typical situation for compact operators like inverse laplace) and you know the accumulation points (0 for compact operators) then you should do some special-casing of the accumulation points.
edit: bugfix