# How to use `ordered` keyword equals to `true` with `StatsBase.sample()` function when sampling 2 or more dimensional array from a population?

I was using `sample()` function from `StatsBase` module for sampling with replacement from a population `a` and sampling probabilities weight vector `wv` as below:

``````a = [31, 12, 29, 9, 15]
wv = weights([0.1, 0.4, 0.2, 0.1, 0.2])
sample(a, wv, (3, 4), ordered=false)
``````

It provides a random matrix that looks like:

``````3×4 Array{Int64,2}:
31  12  29  29
15   9   9  31
31   9  15  15
``````

But when I write the code as:

`sample(a, wv, (3, 4), ordered=true)`

instead of providing any ordered (by row or column) random matrix, it throws the error:

`ERROR: UndefKeywordError: keyword argument dims not assigned`

How to use the keyword argument `ordered=true` here for sampling a 2 or more dimensional array from a population array?

1 Like

That error message could definitely be made better, but I think the reason why your example doesn’t work is because there isn’t a clear way to order the values of a matrix. For example, if your sampled values are `(1, 2, 3, 4)`, is the correct order for a 2x2 matrix this,

``````1 2
3 4
``````

or this,

``````1 3
2 4
``````

or even this,

``````4 3
2 1
``````

What you could do is sample a vector and then reshape it into a matrix:

``````a = [31, 12, 29, 9, 15]
wv = weights([0.1, 0.4, 0.2, 0.1, 0.2])
x = sample(a, wv, 12, ordered=true)
``````
``````julia> reshape(x, 3, 4)
3×4 Array{Int64,2}:
12  12  15  29
12  15  15  29
12  15  15  29
``````
2 Likes

I opened a Github issue to improve the error message:

2 Likes

Thanks for your so quick reply and sharing your thoughts.

Similar thing I also tried. This is also not a very straightforward way. If we write:

``````sort(sample(a, wv, (3, 4)), dims=2)   # For ordering column-wise.
3×4 Array{Int64,2}:
12  15  15  29
9   9  12  31
12  12  15  15

sort(sample(a, wv, (3, 4)), dims=1)   # For ordering row-wise.
3×4 Array{Int64,2}:
12   9  12  12
12  15  12  15
29  29  29  31
``````

Obviously, in these ways we are not getting the desired output directly from the `sample()` function.

A basic issue here is that there is a bug in `sample(a, ordered=true, replace=true)`: incorrect ordering for `sample(replace=true, ordered=true)` · Issue #675 · JuliaStats/StatsBase.jl · GitHub

You’ll see that if you sample a 1d output it gives a non-sequential order (i.e. not matching the ordering in `a`):

``````julia> sample(a, wv, 3, ordered=true)
3-element Array{Int64,1}:
9
12
29
``````

The natural way in Julia would be iteration order, i.e. column-major order.

Except for the abovementioned bug you could do this via `reshape`:

``````julia> reshape(sample(a, wv, 12, ordered=true), 3,4)
3×4 Array{Int64,2}:
9  12  15  15
12  12  15  15
12  12  15  29
``````

A workaround to get a correct ordering would be to sample indices:

``````julia> reshape(a[sample(1:length(a), wv, 12, ordered=true)], 3,4)
3×4 Array{Int64,2}:
31  12  29  15
31  29   9  15
12  29  15  15
``````
2 Likes

(This is not the same thing as ordered/sequential sampling. Which is fine, if that’s what you want.)

Ah, I see what you mean. I just assumed that the `ordered` keyword argument meant that the output samples should be sorted (e.g., lexicographically). However, from a quick google search I can’t find a reference for the type of sorting you refer to, where the output order respects the order in the original vector. The closest I’ve found is this:

which describes the case where you sample and then sort the samples (for the naive algorithm).

For example, Vitter (1984) defines “sequential” sampling this way, and the StatsBase docs define “ordered” as equivalent to “sequential”.

Moreover, `StatsBase.sample` uses this definition for the `replace=false` case, so it makes no sense to use a completely different definition of “ordered” for sampling with replacement.

Finally, sequential sampling as Vitter defined it is strictly more general. If you want sorted results from `sample(a, x, ordered=true)`, you can simply `sort!(a)` first in any ordering you want. (And the definition you are using fails completely if the elements of `a` do not have an `isless` ordering.)

4 Likes

Yes, that’s true. What I tried, is not the same thing as we see in sequential sampling.