Jitter R function equivalent in Julia?

Hi, I wanted to know if there’s an equivalent to R’s jitter() function, which adds noise to a value or a vector of values.

Thank you.

IIUC, the noise is just flat (uniform)? You can do something like this:

julia> function jitter!(a::Array, factor=1.0)
           @assert eltype(a) <: AbstractFloat
           a .+= rand(size(a)...) .* factor
       end
jitter! (generic function with 2 methods)

julia> a = [1,2,3.0]
3-element Array{Float64,1}:
 1.0
 2.0
 3.0

julia> jitter!(a)
3-element Array{Float64,1}:
 1.1695643269414993
 2.8603620667109464
 3.398111259109001

Thanks, I also tried to replicate the function using R’s documentation, as follows:

using Distributions  # provides Uniform

function jitter(x)
    # range of the non-missing values
    z = findmax(collect(skipmissing(x)))[1] - findmin(collect(skipmissing(x)))[1]
    a = z/50
    if a == 0
        x = x .+ rand(length(x))
        return x
    else
        x = x .+ rand(Uniform(-a, a), length(x))
        return x
    end
end

It is working fine.

By the way, your version gives me the following error:

AssertionError: eltype(a) <: AbstractFloat

when I try to pass a “Vector{Union{Missing, Int64}}” object. Cleaning the missing values with collect(skipmissing()), which yields a “Vector{Int64}”, gives me the same error.

That is because my function tries to operate in place, and an integer vector obviously cannot be jittered (unless you want to jitter in integer intervals).
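
To make the failure concrete, here is a small sketch (the variable names are made up for illustration). Adding a fractional value in place to an integer vector throws, while converting to floating point first works:

```julia
v = [1, 2, 3]                      # Vector{Int64}

# An in-place broadcast has to store Float64 results in an Int64 array,
# which fails as soon as a result is not a whole number.
err = try
    v .+= 0.5
    nothing
catch e
    e
end
println(typeof(err))               # InexactError

# Dropping the missing values and converting to floating point gives an
# array that can hold the noise.
x = Union{Missing, Int}[1, missing, 3]
xf = float.(collect(skipmissing(x)))   # Vector{Float64}
xf .+= rand.()
println(xf)
```

Note that the converted copy is what gets jittered, so the original integer vector stays unchanged.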

I realize R’s jitter does something more complex (which your function implements), so it’s good you got it working.

For your function, I’d suggest using extrema(skipmissing(x)), and replacing length(x) with size(x)... so that it also works on higher-dimensional arrays.
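
A rough sketch of what those two suggestions could look like together. The name jitter_nd is hypothetical, and to keep the example free of package dependencies it rescales rand() to [-a, a) instead of calling Uniform(-a, a) from Distributions:

```julia
# Hypothetical name; a sketch of the suggestions above, not R's exact algorithm.
function jitter_nd(x::AbstractArray)
    lo, hi = extrema(skipmissing(x))   # one pass, ignores missing values
    a = (hi - lo) / 50
    # rand(size(x)...) has the same shape as x, so matrices work too.
    u = rand(size(x)...)
    # When every value is identical (a == 0), fall back to noise in [0, 1),
    # as in the function above; otherwise rescale the noise to [-a, a).
    return a == 0 ? x .+ u : x .+ (2 .* u .- 1) .* a
end

m = [1 2 3; 4 5 missing]
jm = jitter_nd(m)
println(size(jm))      # (2, 3)
```

Missing entries stay missing in the result, since adding a number to missing gives missing.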

You can actually go one step further and just write x = x .+ rand.() or x = x .+ rand.(Uniform(-a,a)), which will broadcast the rand call to the right size automatically.
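
A quick way to see the difference the dot makes (a minimal sketch with made-up variable names):

```julia
x = zeros(5)

same = x .+ rand()     # rand() runs once; every element gets the same offset
each = x .+ rand.()    # the dotted call is fused, so rand() runs per element

println(same)
println(each)
```

The first line prints five identical numbers, while the second prints five independent draws.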

Thanks, I implemented your recommendations.

Now I want to know how this is implemented, and how to customize the @. fusion behavior for our own functions.

The really cool thing about how broadcasting works in Julia is that this isn’t specially implemented by rand, and works for any scalar function.
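
For instance, a plain scalar function that you write yourself picks up the dot syntax without any extra work. The helper name noisy below is hypothetical:

```julia
# Hypothetical scalar helper: uniform noise in [-a, a) added to one number.
noisy(v::Real, a::Real) = v + a * (2 * rand() - 1)

x = [1.0, 2.0, 3.0]

y1 = noisy.(x, 0.1)          # explicit dot call
y2 = @. noisy(x, 0.1) + 10   # @. dots every call and operator in the expression
```

Nothing needs to be customized for functions. As far as I know, the customization hooks (such as Base.broadcastable and broadcast styles, described in the Interfaces chapter of the manual) apply to how your own container types take part in broadcasting.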

FWIW, this is what I used and I like it because I don’t have to create a “jittered” vector to plot. Assuming I have two vectors I want to plot, x and y:

jitter(n::Real, factor=0.1) = n + (.5 - rand()) * factor

plot(x, jitter.(y))

function jitterR(x)
    z =abs(-(extrema(skipmissing(x))...))
    a = z/50
    if a == 0
        x = x .+ rand(size(x,1))
        return x
    else
        x = x .+ rand(Uniform(-a, a),size(x,1))
        return x
    end
end

@. jitterR(a)

I would suggest using rand(typeof(n)) to avoid having numbers converted to Float64, if you are originally using a different number type.
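
A sketch of that suggestion, restricted to floating-point inputs on the assumption that this is the intended use (for an integer type, rand(typeof(n)) would draw from the whole integer range rather than from [0, 1)):

```julia
# Sketch: keep the element type of the input (function name as in the post above).
jitter(n::T, factor=0.1) where {T<:AbstractFloat} =
    n + (T(0.5) - rand(T)) * T(factor)

println(typeof(jitter(1.0f0)))   # Float32
println(typeof(jitter(1.0)))     # Float64
```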

Now you are broadcasting both inside and outside the function, which isn’t very useful.
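
To illustrate why the outer broadcast is a problem here: with @. the function is called once per element, so the range it computes is always zero. A minimal sketch, where spread is a hypothetical stand-in for the range computation inside jitterR:

```julia
# Stand-in for the range computation inside jitterR.
spread(x) = maximum(x) - minimum(x)

a = [1.0, 2.0, 3.0]

println(spread(a))    # 2.0: the function sees the whole vector
println(spread.(a))   # [0.0, 0.0, 0.0]: each call sees a single number
```

So a function that needs the whole vector should be called as jitterR(a), with the dots kept inside.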

It would be nice to have a Julia version of quasirandom jitter (jittering plots so that the points correspond with a density estimate).

For your first example, you can simplify it even further and write x .+= rand.(), or also @. x += rand().
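
A small check that both spellings update the existing array rather than creating a new one (variable names are only for illustration, and x must already hold floats):

```julia
x = [1.0, 2.0, 3.0]
y = x                 # second name for the same array

x .+= rand.()         # updates the existing array in place
@. x += rand()        # same operation written with @.

println(y === x)      # true: x still refers to the original array
```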

Is there a package that implements this as well as lots of other useful stats functions? For example StatsFuns.jl doesn’t do it. But maybe it should? Or is there another package that does it?

Can you explain how that works? I don’t understand it from the docs you linked.

Here is an MWE:

using Plots
using StatsPlots
using KernelDensity

y = randn(100*3)
x = repeat(1:3, inner=[100])
y[x .== 1] .= randn(100) .* 0.25

barwidth = 0.8
width = 0.5 * barwidth

ngroups = length(unique(x))

k = Array{UnivariateKDE}(undef, ngroups)
for i in 1:ngroups
    k[i] = kde(y[x .== i])
end
max_dens = map(x -> maximum(x.density), k)

x_jitter = x .+ width ./ max_dens[x] .* rand(length(x)) .* pdf.(k[x], y) .* rand([-1,1], length(x))

# jittered x
scatter(x_jitter, y)

# jittering lines up with a violin plot
violin!(x, y, alpha=0.1)

[Screenshot: the jittered scatter plot overlaid on the violin plot]
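
As a rough explanation of the idea in the MWE: each point's horizontal offset is limited by the estimated density at its y value, so dense regions spread out wide and sparse regions stay narrow. Below is a dependency-free sketch for a single group. The hand-written Gaussian kernel estimate, the rule-of-thumb bandwidth and all names are assumptions for illustration, not what KernelDensity.jl does internally:

```julia
using Statistics

# Hypothetical helper: a plain Gaussian kernel density estimate at point t
# with bandwidth h.
gauss_kde(t, ys, h) = sum(exp(-0.5 * ((t - yi) / h)^2) for yi in ys) /
                      (length(ys) * h * sqrt(2pi))

ys = randn(200)                             # one group of values
h = 1.06 * std(ys) * length(ys)^(-1/5)      # a common rule-of-thumb bandwidth
dens = [gauss_kde(t, ys, h) for t in ys]    # density at each point

halfwidth = 0.4
# Scale each point's horizontal range by its relative density, then pick a
# random distance inside that range and a random side.
offsets = halfwidth .* (dens ./ maximum(dens)) .* rand(length(ys)) .*
          rand([-1, 1], length(ys))
xs = 1 .+ offsets                           # jittered x positions for group 1
```

Plotting xs against ys should give a point cloud shaped roughly like a violin plot of ys.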

Thanks for the example. I agree that this is neat, and a mini-package implementing the calculation would be nice. Preferably in a way that does not depend on plotting libraries, so the result could be used widely.

Why should something like this go in a separate package? This seems like something that should go into StatsPlots maybe?

Oh, that’s interesting – when I saw “Quasirandom,” I thought this would be about introducing quasi-random noise, i.e. noise that’s generated using a low-discrepancy sequence such as this. (Golden sequences should be optimal in low dimensions.) I actually would like to see that – quasirandom noise should reduce overplotting more than random scatter, which will sometimes overlay points by chance (like in the plots you made).
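
A minimal sketch of that kind of noise, assuming a one-dimensional additive recurrence based on the golden ratio. The name golden_offsets is hypothetical and not taken from any package:

```julia
# Sketch of low-discrepancy ("golden ratio") offsets in [-a, a).
# golden_offsets is a hypothetical name, not a function from any package.
function golden_offsets(n::Integer, a::Real)
    g = (sqrt(5) - 1) / 2              # fractional part of the golden ratio
    u = mod.((1:n) .* g, 1)            # additive recurrence, values in [0, 1)
    return a .* (2 .* u .- 1)
end

y = randn(50)
x = ones(50) .+ golden_offsets(50, 0.3)   # deterministic, spread-out jitter
```

The offsets are deterministic and do not look at the y values, so how much this helps with overplotting in practice would still need to be checked on real data.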
