What is the difference between `rand(array, n)` and `StatsBase.sample(array, n)`? `rand` is from …, but where is `sample` from?
I don’t know how consistent people are about this, but as I think of it, `rand` is usually lower-level, while `sample` is often built in terms of (potentially many) calls to …
`sample` is also often used for MCMC.
@Tamas_Papp I’ve heard you make similar arguments, anything to add?
If you use `@edit` to inspect the code of calls to both functions, you will probably reach the same understanding as @cscherrer. It seems like `sample` calls `sample!`, which may in turn call one of many different implementations. These range from `direct_sample` (which then calls `rand`) to things like `seqsample_c!` (more complex ways of sampling, all probably built over …).
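Here is a minimal sketch of two ways a `sample`-style function can be built from repeated `rand` calls, to make that layering concrete. It uses only the `Random` standard library. The helpers `my_direct_sample` and `my_selection_sample` are hypothetical names for illustration. They are not StatsBase's `direct_sample` or `seqsample_c!`. The without-replacement helper uses Knuth's selection sampling (Algorithm S), which may well differ from what StatsBase actually does internally.

```julia
using Random

# Hypothetical helper: n draws *with* replacement, one `rand` call per draw.
my_direct_sample(rng::AbstractRNG, a::AbstractVector, n::Integer) =
    [rand(rng, a) for _ in 1:n]

# Hypothetical helper: k draws *without* replacement using Knuth's selection
# sampling (Algorithm S). It scans `a` once, makes one `rand` call per element
# examined, and keeps the chosen elements in their original order.
function my_selection_sample(rng::AbstractRNG, a::AbstractVector, k::Integer)
    N = length(a)
    0 <= k <= N || throw(ArgumentError("need 0 <= k <= length(a)"))
    out = similar(a, k)
    m = 0                                # number of elements chosen so far
    for (t, x) in enumerate(a)
        m == k && break
        remaining = N - t + 1            # elements not yet examined, including x
        # choose x with probability (k - m) / remaining
        if rand(rng) * remaining < k - m
            m += 1
            out[m] = x
        end
    end
    return out
end

rng = MersenneTwister(7)
pop = collect(1:10)
with_rep = my_direct_sample(rng, pop, 5)
without_rep = my_selection_sample(rng, pop, 4)
```

When only `remaining == k - m` elements are left, every one of them is selected. This guarantees exactly `k` picks.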
`StatsBase.sample` is for picking random items from a collection, potentially with weights and with or without replacement. I don’t think it was intended to be a generic function; it is a utility for a special case.
`Random.rand` is a generic IID sampler.
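The sketch below gives a rough illustration of that split using only the `Random` standard library. The seed and example vector are arbitrary. `rand(rng, a, n)` makes `n` independent uniform draws from the elements of `a`, so repeats are allowed. Drawing without replacement is a different operation, shown here via `randperm`. The comment mentions the StatsBase call for the same case, but this block deliberately does not load StatsBase.

```julia
using Random

rng = MersenneTwister(42)            # seeded so the draws are reproducible
a = [:red, :green, :blue, :yellow]

# rand(rng, a, n): n IID draws, each uniform over the elements of `a`,
# i.e. sampling with replacement, so repeats are allowed.
iid = rand(rng, a, 10)

# Without replacement, each element can appear at most once. Taking a prefix
# of a random permutation is one stdlib way to do that; StatsBase's
# sample(rng, a, 3; replace=false) covers the same urn-style case.
no_repeats = a[randperm(rng, length(a))[1:3]]

# rand is generic over what it draws from: collections, ranges, types, ...
die_rolls = rand(rng, 1:6, 5)
u = rand(rng)                        # a single Float64 in [0, 1)
```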
@jzr, all of this is documented; which part of the documentation did you find unclear?
> I don’t think it was intended to be a generic function, it is a utility for a special case.
My confusion arose because `sample` is indeed used as a generic function by Turing.jl and other packages. If I understand correctly, there are multiple interpretations of the distinction between these functions (namely, “`rand` is generic IID; `sample` isn’t generic” and “`rand` is simple low-level drawing; `sample` is higher-level drawing that uses `rand`”).
Would it have been appropriate for the MCMC packages to use `rand` for their user-facing sampling interface (instead of …)?
Personally, I think that using `StatsBase.sample` for MCMC is bad style (“punning”).
I fully agree with this view. The function `sample` refers to an operation performed on a dataset (or population) and can take weights and replacement options. The function `rand` is for random variables with a given distribution (e.g. …).
Soss currently uses a different function name for each algorithm, but I like the idea of having that abstracted away a bit, so making the algorithm an argument is appealing to me.
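Here is a minimal sketch of what “making the algorithm an argument” can look like with Julia's multiple dispatch. There is one user-facing function name, and the algorithm is selected by the type of an ordinary argument. The toy target (an Exponential(1) distribution), the function `draw_exponential`, and the types `ExpAlgorithm`, `InverseTransform`, and `StdlibExp` are all hypothetical. They are not Soss's or Turing's actual interface.

```julia
using Random

# Hypothetical algorithm types: the algorithm is an ordinary argument, and
# multiple dispatch selects the implementation behind one function name.
abstract type ExpAlgorithm end
struct InverseTransform <: ExpAlgorithm end
struct StdlibExp <: ExpAlgorithm end

# Inverse-transform sampling: if U ~ Uniform on (0, 1], then -log(U) ~ Exponential(1).
# `1 - rand(rng)` lies in (0, 1], so the log is always finite.
draw_exponential(rng::AbstractRNG, ::InverseTransform, n::Integer) =
    [-log(1 - rand(rng)) for _ in 1:n]

# Delegate to the Random stdlib's exponential generator.
draw_exponential(rng::AbstractRNG, ::StdlibExp, n::Integer) = randexp(rng, n)

rng = MersenneTwister(11)
x1 = draw_exponential(rng, InverseTransform(), 1_000)
x2 = draw_exponential(rng, StdlibExp(), 1_000)
```

Adding a new algorithm then means adding a new type and one method, rather than a new function name.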
I think that `Turing.sample` is OK. It just should not coincide with `StatsBase.sample`, which is a rather pointless pun.
@cscherrer I think I would end up choosing `rand` for this posterior distribution, so that it is clear that it is not a sample from a population à la `StatsBase.sample`. I am trying to follow this pattern in my packages whenever I can, so that users know when they can pass weights, for example.
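Weights make sense for drawing from a finite population, but not for a generic IID sampler. Here is a minimal sketch of what a weighted draw involves: an inverse-CDF lookup on cumulative weights, using only `Random`. The helper `weighted_draw` is hypothetical. In practice one would call StatsBase's `sample(a, Weights(w), n)`, whose internal algorithm may differ from this one.

```julia
using Random

# Hypothetical helper: n draws with replacement from `a`, where element i is
# chosen with probability w[i] / sum(w). Each draw is one `rand` call plus a
# binary search over the cumulative weights (inverse-CDF sampling).
function weighted_draw(rng::AbstractRNG, a::AbstractVector, w::AbstractVector{<:Real}, n::Integer)
    length(a) == length(w) || throw(ArgumentError("a and w must have the same length"))
    isempty(w) && throw(ArgumentError("need at least one element"))
    all(>=(0), w) || throw(ArgumentError("weights must be non-negative"))
    c = cumsum(w)                      # nondecreasing cumulative weights
    total = c[end]
    total > 0 || throw(ArgumentError("at least one weight must be positive"))
    # For x in [0, total), searchsortedlast(c, x) + 1 is the first index with
    # c[i] > x, so elements with zero weight are never selected.
    return [a[searchsortedlast(c, rand(rng) * total) + 1] for _ in 1:n]
end

rng = MersenneTwister(3)
picks = weighted_draw(rng, [:red, :green, :blue], [0.2, 0.5, 0.3], 10)
```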
And I would maybe describe this functionality as:
A function that draws from a posterior distribution given a model, observed data and algorithm…
But that is all a matter of taste, I guess, as we all understand the meaning of these concepts. It would be nice to have consistent terminology across multiple ecosystems, though.
I’m pretty sure Turing overrides and re-exports …
But these samples are not usually IID, which breaks a common assumption of `rand`. To me this is a more critical distinction than requiring that `sample` only be used for drawing a sample from a population, a requirement that isn’t at all clear to me. Are you requiring here that a “population” be finite and unweighted?
If someone uses `rand` thinking the results will be IID, violating that could cause big problems. As for `sample`, I really don’t see such a risk in extending it.
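Here is a minimal random-walk Metropolis sketch, using only `Random`, that shows why MCMC output breaks the IID assumption. Successive states of the chain are strongly correlated, whereas `randn` draws are not. The target (a standard normal), the step size, the chain length, and the helper names `metropolis` and `lag1_autocor` are illustrative assumptions. None of them come from Turing or Soss.

```julia
using Random

# Hypothetical helper: random-walk Metropolis for a 1-D target, given its
# log-density up to an additive constant. Returns the whole chain.
function metropolis(rng::AbstractRNG, logdensity, x0::Real, n::Integer; step::Real = 0.5)
    xs = Vector{Float64}(undef, n)
    x, lp = float(x0), logdensity(x0)
    for i in 1:n
        y = x + step * randn(rng)          # propose a point near the current one
        lq = logdensity(y)
        if log(rand(rng)) < lq - lp        # accept with probability min(1, p(y)/p(x))
            x, lp = y, lq
        end
        xs[i] = x                          # on rejection the chain repeats x
    end
    return xs
end

# Sample lag-1 autocorrelation: near 0 for IID draws, positive for a sticky chain.
function lag1_autocor(xs::AbstractVector{<:Real})
    d = xs .- sum(xs) / length(xs)
    return sum(d[1:end-1] .* d[2:end]) / sum(abs2, d)
end

rng = MersenneTwister(1)
logp(x) = -x^2 / 2                         # standard normal, up to a constant
chain = metropolis(rng, logp, 0.0, 10_000)
iid = randn(rng, 10_000)

lag1_autocor(chain)   # expected to be well above 0 with this small step size
lag1_autocor(iid)     # expected to be close to 0
```

Both vectors target the same distribution, but only `iid` satisfies the independence that callers of `rand` typically assume.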
Oh, I had IID samples in mind when you wrote:
I am assuming they are not IID because of an algorithmic (MCMC) detail?
Yes, and I think that is what most people have in mind when they hear the term “population” in statistics, but I may be wrong.
I think the IID property is relevant, but it isn’t the critical aspect that differentiates between …
Yes, that’s right. Sometimes we can get IID samples, but that’s not usually the case.
Maybe it depends on the branch of statistics. The expression “sample from the posterior” comes up a lot in Bayesian stats, and we usually don’t assume the result will be IID.
`StatsBase.sample` is a utility function for a very specific stats/probability exercise, i.e. pulling objects from a bag/urn with or without replacement. This meaning predates the Bayesian/MCMC usage by centuries and is really not the same thing.