Hi, I wanted to know if there’s an equivalent function to R’s jitter() that adds noise to a value or vector of values.
Thank you.
IIUC the noise is just flat? You can do something like this:
julia> function jitter!(a::Array, factor=1.0)
           @assert eltype(a) <: AbstractFloat
           a .+= rand(size(a)...) .* factor
       end
jitter! (generic function with 2 methods)
julia> a = [1,2,3.0]
3-element Array{Float64,1}:
1.0
2.0
3.0
julia> jitter!(a)
3-element Array{Float64,1}:
1.1695643269414993
2.8603620667109464
3.398111259109001
Thanks, I also tried to replicate the function using R's documentation, as follows:
using Distributions  # provides Uniform
function jitter(x)
    z = findmax(collect(skipmissing(x)))[1] - findmin(collect(skipmissing(x)))[1]
    a = z/50
    if a == 0
        x = x .+ rand(length(x))
        return x
    else
        x = x .+ rand(Uniform(-a, a), length(x))
        return x
    end
end
It is working fine.
By the way, your version gives me the following error
**AssertionError: eltype(a) <: AbstractFloat**
when I try to pass a “Vector{Union{Missing, Int64}}” object. Cleaning the missing values with collect(skipmissing()), which gives a “Vector{Int64}”, produces the same error.
That is because my function tries to do an in-place operation, and an Integer Vector obviously cannot be jittered in place (unless you want to jitter in integer intervals).
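A minimal sketch of one way around the assertion, assuming a jittered floating-point copy is acceptable instead of an in-place update. The name jitter_copy is hypothetical and not from any package; the noise is the same flat [0, factor) noise as in jitter! above.

```julia
# Hypothetical helper: returns a jittered Float64 copy, so integer input
# and missing values are both accepted.
function jitter_copy(x, factor=1.0)
    vals = float.(collect(skipmissing(x)))   # drop missings, promote Int -> Float64
    return vals .+ rand(length(vals)) .* factor
end

v = Union{Missing, Int64}[1, missing, 3]
j = jitter_copy(v)   # 2-element Vector{Float64}
```

Note that the missing entries are dropped here, so the result can be shorter than the input.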
I realize R’s jitter does something more complex (which your function implements), so it’s good you got it working.
For your function, I’d suggest making use of extrema(skipmissing(x)), and replacing length with size(x)... so it also works on higher-dimensional arrays.
You can actually go one step better and just do x = x .+ rand.() or x = x .+ rand.(Uniform(-a,a)), which will broadcast the rand call to the right length automatically.
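To make the difference concrete, here is a small sketch using only Base. The function name jitter_bc is hypothetical; it keeps the same two cases as the function posted earlier but swaps in extrema and a dotted rand call, and rescales rand() by hand instead of using Uniform so that no extra package is needed.

```julia
x = zeros(5)
same  = x .+ rand()    # rand() is evaluated once; every element gets the same offset
fresh = x .+ rand.()   # the dotted call is fused into the broadcast; rand() runs per element

# Hypothetical rewrite of the earlier function with these suggestions applied
function jitter_bc(x)
    lo, hi = extrema(skipmissing(x))   # min and max in one pass, ignoring missing
    a = (hi - lo) / 50
    # [0, 1) noise when all values are equal, otherwise noise rescaled to [-a, a)
    return a == 0 ? x .+ rand.() : x .+ (2 .* rand.() .- 1) .* a
end

m = [1.0 2.0; 3.0 4.0]
jm = jitter_bc(m)      # works on a matrix without passing any size
```

Because the result takes its shape from x, neither length(x) nor size(x)... appears anywhere.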
Thanks, I implemented your recommendations.
Now I want to know how this is implemented and how to customize the @. fusion behavior for our own functions.
The really cool thing about how broadcasting works in Julia is that this isn’t specially implemented by rand, and works for any scalar function.
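As an illustration, a scalar function written with no knowledge of arrays picks up the same behavior. The function shift below is a made-up example, not part of any library.

```julia
# Hypothetical scalar function: moves v by a uniform offset in [-width/2, width/2)
shift(v, width) = v + width * (rand() - 0.5)

xs = [1.0, 2.0, 3.0]
ys = shift.(xs, 0.1)              # dot call: shift is applied element by element
zs = @. 2 * shift(xs, 0.1) + 1    # @. dots every call and operator, fused into one loop
```

No array method of shift is ever defined; the dot syntax does all the work.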
FWIW, this is what I used and I like it because I don’t have to create a “jittered” vector to plot. Assuming I have two vectors I want to plot, x and y:
jitter(n::Real, factor=0.1) = n + (.5 - rand()) * factor
plot(x, jitter.(y))
function jitterR(x)
    z = abs(-(extrema(skipmissing(x))...))
    a = z/50
    if a == 0
        x = x .+ rand(size(x,1))
        return x
    else
        x = x .+ rand(Uniform(-a, a), size(x,1))
        return x
    end
end
@. jitterR(a)
I would suggest using rand(typeof(n)) to avoid having numbers converted to Float64, if you are originally using a different number type.
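A sketch of what that could look like, assuming the input is a floating-point number. The name jitter_typed is hypothetical. The restriction to AbstractFloat is deliberate: for integer types, rand(Int) draws from the whole integer range and not from [0, 1), so it would not be a drop-in replacement there.

```julia
# Variant of the one-liner above for floating-point input: rand(T) draws
# from [0, 1) in the same type as n, so a Float32 stays a Float32.
jitter_typed(n::T, factor=0.1) where {T<:AbstractFloat} =
    n + (T(0.5) - rand(T)) * T(factor)

a32 = jitter_typed(1.0f0)   # Float32 in, Float32 out
a64 = jitter_typed(1.0)     # Float64 in, Float64 out
```

Converting factor with T(factor) matters too, since multiplying a Float32 by the Float64 default would promote the result back to Float64.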
Now you are broadcasting both inside and outside the function, which isn’t very useful.
It would be nice to have a Julia version of quasirandom jitter (jitter plots so they correspond with a density estimate).
For your first example, you can simplify it even further and write x .+= rand.(), or also @. x += rand().
Is there a package that implements this as well as lots of other useful stats functions? For example StatsFuns.jl doesn’t do it. But maybe it should? Or is there another package that does it?
Can you explain how that works? I don’t understand it from the docs you linked.
Here is an MWE:
using Plots
using StatsPlots
using KernelDensity
y = randn(100 * 3)
x = repeat(1:3, inner=[100])
y[x .== 1] .= randn(100) .* 0.25   # make group 1 narrower than the others
barwidth = 0.8
width = 0.5 * barwidth
ngroups = length(unique(x))
# one kernel density estimate per group
k = Array{UnivariateKDE}(undef, ngroups)
for i in 1:ngroups
    k[i] = kde(y[x .== i])
end
max_dens = map(x -> maximum(x.density), k)
# offset scaled by the local density, with a random magnitude and a random sign
x_jitter = x .+ width ./ max_dens[x] .* rand(length(x)) .* pdf.(k[x], y) .* rand([-1, 1], length(x))
# jittered x
scatter(x_jitter, y)
# jittering lines up with a violin plot
violin!(x,y, alpha=0.1)
Thanks for the example. I agree that this is neat, and a mini-package implementing the calculation would be nice. Preferably in a way that does not depend on plotting libraries, so the result could be used widely.
Why should something like this go in a separate package? This seems like something that should go into StatsPlots maybe?
Oh, that’s interesting – when I saw “Quasirandom,” I thought this would be about introducing quasi-random noise, i.e. noise that’s generated using a low-discrepancy sequence such as this. (Golden sequences should be optimal in low dimensions.) I actually would like to see that – quasirandom noise should reduce overplotting more than random scatter, which will sometimes overlay points by chance (like in the plots you made).
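For what it's worth, a rough sketch of that idea in one dimension, assuming the golden-ratio additive recurrence (the fractional parts of k/φ) as the low-discrepancy sequence. The name quasi_jitter and the width default are hypothetical, and this is not an existing package function.

```julia
# Sketch only: deterministic, evenly spread offsets instead of random ones.
function quasi_jitter(x, width=0.4)
    φ = Base.MathConstants.golden
    n = length(x)
    u = mod.((1:n) ./ φ, 1)            # fractional parts of k/φ, all in [0, 1)
    return x .+ (u .- 0.5) .* width    # centre the offsets on x; total spread is width
end

xq = quasi_jitter(fill(1.0, 10))       # ten identical values get ten distinct positions
```

For grouped data like the MWE above, one would probably want to run a separate sequence within each group; this sketch does not attempt that, nor the density scaling.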