Jitter R function equivalent in Julia?

Oh interesting; that is probably exactly what the R function does. The jittering matching up with a kernel density shape is probably just a byproduct. (Edit: it's not a byproduct. They explicitly distribute the points within the kernel density outline, but they use a quasirandom rather than a pseudorandom distribution to avoid overplotting.)

The R documentation says:
Arranges data points using quasirandom noise (van der Corput sequence)
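
For anyone unfamiliar with the van der Corput sequence: the n-th term is obtained by writing n in some base and mirroring its digits around the radix point. Here is a minimal sketch of that idea. The name `radical_inverse` is my own and is not taken from the R package or any Julia library.

```julia
# Radical inverse of n in the given base: the digits of n are mirrored
# around the radix point, so 6 = 110 in base 2 becomes 0.011 = 0.375.
function radical_inverse(n::Integer, base::Integer=2)
    result = 0.0
    scale = 1.0 / base
    while n > 0
        n, digit = divrem(n, base)  # peel off the least significant digit
        result += digit * scale
        scale /= base
    end
    return result
end

# First seven terms in base 2
vdc = [radical_inverse(n, 2) for n in 1:7]
# → [0.5, 0.25, 0.75, 0.125, 0.625, 0.375, 0.875]
```

Each new term lands in one of the largest remaining gaps of the unit interval, which is why the points spread out evenly instead of clumping the way pseudorandom draws can.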

There are many plotting libraries (Plots, Makie, Gadfly, Gaston, etc) and people may want to use the calculation with any of them.

Here is a version that uses an actual quasirandom distribution (van der Corput sequence) rather than pseudorandom numbers:

using Plots
using StatsPlots  # violin! is defined in StatsPlots, not in Plots itself
using KernelDensity
using StatsBase

vandercorput(num::Integer, base::Integer) = sum(d * Float64(base) ^ -ex for (ex, d) in enumerate(digits(num, base=base)))

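# Example data: three groups of N points; group 1 has a smaller spread
N = 100
y = randn(N * 3)
x = repeat(1:3, inner=N)
y[x .== 1] .= randn(N) .* 0.25

barwidth = 0.75
width = 0.4 * barwidth  # maximum horizontal offset from the group center

ngroups = length(unique(x))

k = Array{UnivariateKDE}(undef, ngroups)
max_dens = zeros(ngroups)
q = zeros(length(x))
dens = zeros(length(y))
for i in 1:ngroups
    k[i] = kde(y[x .== i])
    # density estimate at each point, used to scale the jitter
    dens[x .== i] .= pdf(k[i], y[x .== i])
    max_dens[i] = maximum(dens[x .== i])

    # assign van der Corput values in rank order of y (assumes N points per group)
    q[x .== i] .= vandercorput.(1:N, 2)[competerank(y[x .== i])]
end

# map q from [0, 1) to [-1, 1) and scale by the relative density
x_jitter = x .+ width ./ max_dens[x] .* (q .- 0.5) .* 2 .* dens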

# jittered x
scatter(x_jitter, y)

# jittering lines up with a violin plot
violin!(x,y, alpha=0.5)

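
The same calculation can be wrapped in a function that only returns the jittered coordinates, so it does not depend on any plotting package. This is a rough sketch rather than a port of the R implementation: the name `quasirandom_jitter` and the `width` keyword are my own choices, I use `ordinalrank` instead of `competerank` so tied values still get distinct offsets, and it assumes every group has enough distinct points for `kde` to work.

```julia
using KernelDensity   # kde, pdf
using StatsBase       # ordinalrank

# n-th van der Corput number in the given base
vandercorput(n::Integer, base::Integer=2) =
    sum(d * float(base)^-k for (k, d) in enumerate(digits(n; base=base)))

# Return x positions jittered inside each group's kernel density outline.
# `width` is the largest allowed horizontal offset from the group position.
function quasirandom_jitter(x::AbstractVector{<:Real}, y::AbstractVector{<:Real};
                            width::Real=0.3)
    length(x) == length(y) || throw(ArgumentError("x and y must have the same length"))
    offsets = zeros(Float64, length(y))
    for g in unique(x)
        idx = findall(==(g), x)
        yg = y[idx]
        dens = pdf(kde(yg), yg)                       # density at each point
        q = vandercorput.(1:length(idx))[ordinalrank(yg)]
        # map q from [0, 1) to [-1, 1), then shrink by the relative density
        offsets[idx] .= width .* (2 .* q .- 1) .* dens ./ maximum(dens)
    end
    return x .+ offsets
end

# Usage with made-up data: three groups of 50 points
x = repeat(1:3, inner=50)
y = randn(150)
xj = quasirandom_jitter(x, y; width=0.3)
```

The result `xj` can then be passed to `scatter` in Plots, Makie, or whatever else you use.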

It seems like StatsFuns.jl would be a good home, at least by name, but most of what it does is provide special functions for evaluating distributions. Is there something better, like a “StatsUtilities.jl”?

Makes some sense, but it seems weird to separate out such a basic feature. It feels like making a new package for histograms because someone might want to use a histogram with lots of different packages.

But that’s true, histograms are in StatsBase and that functionality is used in plotting packages (at least Makie uses it from StatsBase, I’m not so familiar with other plotting packages).
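
For reference, this is roughly what the plotting-independent histogram calculation looks like with StatsBase alone. The data and bin edges are made up for illustration.

```julia
using StatsBase

data = [0.1, 0.2, 0.6, 1.4]
edges = 0:0.5:1.5            # bins [0, 0.5), [0.5, 1.0), [1.0, 1.5)

h = fit(Histogram, data, edges)
h.weights                    # → [2, 1, 1]
```

A jitter function could sit next to this in the same way: it computes numbers and leaves the drawing to the plotting package.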

Huh. Guess StatsBase would work, then.