I want to create a series of random numbers from a custom distribution. Suppose I have a data set named iris:
julia> using RDatasets
julia> iris = dataset("datasets", "iris");
Now I want to generate 100 random numbers sampled from the distribution of iris.SepalLength with a presumption that no one knows about the distribution of this vector. So in this case I should literally create a new distribution first; For this, there is an option by using Distributions.jlhere. But I couldn’t follow the steps correctly. The doc says:
To implement a univariate sampler, one can define a subtype (say Spl) of Sampleable{Univariate,S} (where S can be Discrete or Continuous), and provide a rand method, as
function rand(rng::AbstractRNG, s::Spl)
# ... generate a single sample from s
end
But I don’t know how to provide the s. I tried:
julia> using Distributions
julia> mysamp = Sampleable(iris.SepalLength)
ERROR: MethodError: no method matching Sampleable(::Vector{Float64})
Stacktrace:
[1] top-level scope
@ REPL[20]:1
Any help would be appreciated. I want to create a vector of random samples from a custom distribution. The custom distribution should be created from specific data.
Finally you can sample from the fitted distribution:
using Random, Distributions
julia> r = randn(10)./5 .+ 2; # starting data
julia> d = fit(Normal, r)
Normal{Float64}(μ=2.091694546378728, σ=0.09721707142006106)
julia> rand(d, 5)
5-element Vector{Float64}:
2.077566588816171
1.9793350650939816
2.319975404778217
2.1013889428372607
1.8952925440083732
If you have no idea which distribution to start with, then I guess you should try various, but the Normal distribution should be a good starting point for your case.
Sampleable is an abstract type. You should make you own realization.
Try:
using Distributions, Random
v = [1,2,5,4,3,6,7,5,4,4,2,3,4,5]
struct Spl <: Sampleable{Univariate , Discrete }
vec
end
samp = Spl(v)
function Base.rand(spl::Spl)
v[rand(1:length(v))]
end
rand(samp)
Sorry but this picks some values from the v at last. But I didn’t mean it. I want to generate random numbers that follow the v’s distribution!
As I said:
I would avoid downloading mega bytes of packages for simple tasks. Use standard library functions as much as possible.
For example:
function rand_empirical(x, m)
n = length(x)
# kde bandwidth h ≈ 1.06*σ*n^(-1/5)
h = 1.06*sqrt(n*sum(x .* x) - sum(x)^2)/n/n^(0.2)
# generate from categorical and add variates from kernel density
# to get variates from empirical distribution, for example,
# for gaussian kernel with width h
r1 = x[rand(1:n, m)]
r2 = h*randn(m)
r = r1 .+ r2
return r
end
Now I want to generate 100 random numbers sampled from the distribution of iris.SepalLength with a presumption that no one knows about the distribution of this vector.
If no one knows about the distribution of this vector - better way to bootstrap 100 samples from this vector. Yes, v - not a distribution, but if you take random element from you will follow this unknown distribution.
Than you can make some assumption, for example - distribution is continuous, find appropriate pdf/cdf functions. Or you can thought that this distribution is discrete, and fit Categorical distribution.
you can sample from the empirical distribution of the vector; this is achieved by picking at random some elements of the vector, with replacement
you can fit a continuous distribution to the vector; for example a parametric distribution such as the normal distribution, and then you sample from this distribution, or a non-parametric distribution, but I don’t know whether there exists a Julia package to do that