Contour plot for 2d kernel density

I need to visualize about 10^5 points on a 2D scatterplot, with four categories.

What I am currently doing is

  1. fitting a kernel density using KernelDensity.kde,
  2. making a rectangular grid,
  3. finding the contours with Countour.jl at the highest posterior density thresholds and plotting those.

This works ok-ish, but there are familiar artifacts of the marching squares algorithm in the result, even on a dense grid.

I am wondering if there is an algorithm to find the contour lines directly from the KDE (as it is a smooth function one should be able to trace). If someone has a reference I would consider implementing it in a plot-engine agnostic package.

3 Likes

Have you looked into D3? I would say that D3 has a pretty nice contour plot library and might give you some ideas:

Examples:

  1. Contour chart in d3.js
  2. d3-contour / D3 | Observable

What happens if you directly call eg Plots.jl’s contour function on your 2d kde function?

I am wondering if there is an algorithm to find the contour lines directly from the KDE (as it is a smooth function one should be able to trace).

If I understand correctly you want to avoid the rectangular grid and work directly from the interpolation function? I am still in the process of bringing some insight from Meshing.jl to Contour.jl, but I think we have the API for this.

Here are some example APIs in Meshing:
Direct function uniform sampling: Examples · Meshing

I am also experimenting with a fully adaptive approach just given bounds and tolerances here: Initial pass at adaptive sampling with Marching Cubes and Marching Tetrahedra by sjkelly · Pull Request #67 · JuliaGeometry/Meshing.jl · GitHub

These probably don’t help directly, but if this is the sort of API you are after it should be easy to bring to Contour, as the principles of both packages are very similar.

This works ok-ish, but there are familiar artifacts of the marching squares algorithm in the result, even on a dense grid.

The artifacts from marching squares are difficult to remove. A different option would be to use a Dual Contour approach, for example if you want to capture sharp corners (but this has its own set of ambiguities).

Also 10^5 points is not a lot for a smooth visualization, and I realize Contour hits a performance bottleneck very quickly with the current approach when generating vertex connectivity. I’ve taken a stab at it in this PR, but there are performance/memory tradeoffs. FWIW, we did a few improvements so that 0.5.2 should be quite a bit faster than v0.5.1, but not 100% where we want it. After a few discussions I think we will add a “FastEdge” method that produces edge pairs without connectivity that should provide better performance for visualization.

I think a uniform/adaptive sampling methods coupled with unconnected edge output should yield at least one or two orders of magnitude performance improvement which should give better visualization. I hope this made sense and input would be much appreciated!

2 Likes

Yes, precisely. Thanks for all the links, I will take some time to digest them.

I think it forms a rectangular grid, and the proceeds from there.

I am not sure what you mean, they use marching squares on a rectangular grid.

I just meant that I thought D3 was doing linear interpolation or some other form of averaging or smoothing. The algorithm may be different than the standard marching squares and may show less artifacts.

1 Like

@Tamas_Papp How about integrating the ODE that is “start here and follow the tangent vector to the level set”? Here’s a naive implementation:

using DifferentialEquations
using ForwardDiff
using LinearAlgebra

function tangent(f, u, p, t)
    n = ForwardDiff.gradient(f, u)
    return normalize([n[2], -n[1]])
end

tangent(f) = (u, p, t) -> tangent(f, u, p, t)

u0 = [1.0, 0.0]
tspan = (0.0, 10.0)

f( (x, y) ) = x^2 + y^2   # function to find contours of

prob = ODEProblem(tangent(f), u0, tspan)
sol = solve(prob)

using Plots
plot(sol, vars=(1, 2), ratio=1)

Basically you are inventing some kind of Hamiltonian system whose Hamiltonian is your function f.

EDIT: Added normalization to the tangent vector

3 Likes

I found this reference that apparently discusses this method:

GRANDINE, T. A. 2000. Applications of contouring. SIAM Rev. 42, 2 (Apr.), 297–316.

3 Likes

If that helps, the GMT’s contour module plots contours by linear interpolation of the Delaunay triangulation of the input x,y,z points
https://docs.generic-mapping-tools.org/dev/contour.html

1 Like

Thanks for the reference. It looks like a nice method, essentially parametrizing the contour manifold as a BVP with a function family (specifically, B-splines), then proceeds to solve the system using collocation.

An initial guess is needed (on each contour), but I imagine that these could be generated using a crude grid, triangulations, or similar. Maybe there is some nice property of KDEs that would help (eg boundedness, peaks are inside the convex hull of points, etc).

1 Like