Sampling groups and individuals within each group

The fourth column, a[:,4], holds the group label.

a= [1 2 3 1;5 6 7 1;1 2 3 1;1 2 4 1;1 2 3 2;1 2 3 2;1 2 4 2;1 2 4 2;1 3 4 3;1 3 4 3;1 3 4 3;1 2 3 3 ]
12×4 Matrix{Int64}:
 1  2  3  1
 5  6  7  1
 1  2  3  1
 1  2  4  1
 1  2  3  2
 1  2  3  2
 1  2  4  2
 1  2  4  2
 1  3  4  3
 1  3  4  3
 1  3  4  3
 1  2  3  3

I have grouped data so far.

I want to randomly select two of the groups, and then randomly select two individuals from each selected group. Can anyone help, please?

using DataFrames

a_dat = DataFrame(a, :auto)
gb = groupby(a_dat, :x4)
GroupedDataFrame with 3 groups based on key: x4
First Group (4 rows): x4 = 1
Row	x1	x2	x3	x4
	Int64	Int64	Int64	Int64
1	1	2	3	1
2	5	6	7	1
3	1	2	3	1
4	1	2	4	1
⋮
Last Group (4 rows): x4 = 3
Row	x1	x2	x3	x4
	Int64	Int64	Int64	Int64
1	1	3	4	3
2	1	3	4	3
3	1	3	4	3
4	1	2	3	3

A simple array-based solution:

using StatsBase
using SplitApplyCombine
using DataPipes

@p begin
	splitdimsview(a, 1)
	group(_[4])
	collect()
	sample(↑, 2; replace=false)  # choose two groups
	map(sample(_, 2; replace=false))  # choose two members from each
end

This returns two groups, each with two members:

2-element Vector{...}:
[[1, 2, 3, 1], [5, 6, 7, 1]]
[[1, 2, 3, 2], [1, 2, 3, 2]]
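
For anyone who prefers to avoid extra packages, the same two-stage idea can be sketched with only the `Random` standard library. This is a minimal sketch rather than the approach used above: `pick` and `sample_groups` are hypothetical helper names, and shuffling then truncating stands in for `sample(...; replace=false)`.

```julia
using Random

# Same 12×4 matrix as in the question; column 4 is the group label.
a = [1 2 3 1; 5 6 7 1; 1 2 3 1; 1 2 4 1;
     1 2 3 2; 1 2 3 2; 1 2 4 2; 1 2 4 2;
     1 3 4 3; 1 3 4 3; 1 3 4 3; 1 2 3 3]

# Hypothetical helper: pick k distinct elements of xs by shuffling and truncating.
pick(rng, xs, k) = shuffle(rng, collect(xs))[1:k]

function sample_groups(rng, a; ngroups=2, nmembers=2, groupcol=4)
    labels = unique(a[:, groupcol])
    chosen = pick(rng, labels, ngroups)               # stage 1: choose groups
    rows = Int[]
    for g in chosen
        members = findall(==(g), a[:, groupcol])      # row indices belonging to group g
        append!(rows, pick(rng, members, nmembers))   # stage 2: choose members
    end
    return a[rows, :]                                 # one row per sampled individual
end

result = sample_groups(MersenneTwister(1), a)
```

Passing the random number generator explicitly is a design choice that makes a run repeatable; any other `AbstractRNG` could be used instead of `MersenneTwister(1)`.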

Hi, out of the three groups (1, 2, 3), I want two randomly selected groups. Within each of these two groups, I want two randomly selected elements out of the three elements. The final output should look similar to this:

1 3 1
5 7 1
1 3 2
1 2 2

Here is one way that outputs the result as a DataFrame, inspired by this post.

(continuation of your code:)

using StatsBase

ix2 = sample(1:length(gb), 2, replace=false)
db = vcat([gbi[sample(axes(gbi, 1), 2; replace=false), 1:3] for gbi in gb[ix2]]...)

 Row │ x1     x2     x3    
     │ Int64  Int64  Int64 
─────┼─────────────────────
   1 │     1      2      4
   2 │     5      6      7
   3 │     1      3      4
   4 │     1      2      3
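
One thing to watch with sampling without replacement is that every chosen group needs at least two rows, and there need to be at least two such groups. Below is a minimal base-Julia sketch of a guard for that, assuming the matrix `a` from the question. The names `group_rows`, `eligible`, `ngroups` and `nmembers` are illustrative only and are not part of DataFrames or StatsBase.

```julia
using Random

a = [1 2 3 1; 5 6 7 1; 1 2 3 1; 1 2 4 1;
     1 2 3 2; 1 2 3 2; 1 2 4 2; 1 2 4 2;
     1 3 4 3; 1 3 4 3; 1 3 4 3; 1 2 3 3]

# Hypothetical helper: map each group label to the row indices that carry it.
function group_rows(labels)
    groups = Dict{eltype(labels), Vector{Int}}()
    for (i, g) in pairs(labels)
        push!(get!(groups, g, Int[]), i)
    end
    return groups
end

ngroups, nmembers = 2, 2
groups = group_rows(a[:, 4])

# Keep only groups large enough to give up `nmembers` distinct rows.
eligible = sort([g for (g, idx) in groups if length(idx) >= nmembers])
length(eligible) >= ngroups || error("not enough groups with $nmembers or more rows")

rng = MersenneTwister(42)                      # fixed seed, for repeatable draws
chosen = shuffle(rng, eligible)[1:ngroups]     # stage 1: pick the groups
rows = reduce(vcat, [shuffle(rng, groups[g])[1:nmembers] for g in chosen])  # stage 2
db = a[rows, 1:3]                              # keep only the three value columns
```

With the data in the question all three groups pass the check, so the guard only matters for data where some group is very small.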

All resulting members in a single matrix:

@p begin
	splitdims(a, 1)
	group(_[4])
	collect()
	sample(↑, 2; replace=false)
	map(sample(_, 2; replace=false))
	mapmany(_, __[1:3])
	combinedims()
	permutedims()
end

However, note that it's often easier to work with data when each "individual" (e.g., a group member in your case) is a separate object, for example a vector of vectors rather than a matrix.
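
To illustrate the difference, here is a small base-Julia sketch of moving between the two layouts. It uses a shortened 3×4 example in the same format as the question; the variable names are only for illustration.

```julia
# Small example in the same layout: three values, then the group label.
a = [1 2 3 1; 5 6 7 1; 1 2 4 2]

# Matrix -> vector of vectors: one Vector per individual (row).
individuals = [collect(r) for r in eachrow(a)]

# Individuals can now be filtered or sampled directly, e.g. keep group 1.
group1 = filter(v -> v[4] == 1, individuals)

# Vector of vectors -> matrix: stack the individuals back as rows.
back = permutedims(reduce(hcat, individuals))
```

Here `reduce(hcat, ...)` places each individual in a column, so `permutedims` is needed to get one individual per row again.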