Hi,
I am working on a combinatorial problem called Block Design:
Block design - Wikipedia ( specifically Pairwise balanced uniform designs (2-designs or BIBDs) )
I try to group N=25 samples (e.g. integers 1:25) into “blocks” (aka pools) of 5 samples each, where each pair of samples only occurs in one block together. The blocks should be chosen so that 5 of these blocks form a partition of the set of all 25 samples. A partition of the set of all samples into 5 blocks as described above call a “layer”, I try to find as many unique layers as possible. (in the above example there are 6 layers)
I implemented a backtracking algorithm to find solutions to the problem, which currently finds fesible solutions for N=9, N=16,& N=25, but takes prohibitively long for N=36.
Timing for N=25:
@time BlockDesign.run()
1.606897 seconds (2.06 M allocations: 307.621 MiB, 0.59% gc time)
I think the backtracking algorithm itself can be improved, but looking at the large number of allocations and the Profiler output pointing to the feasible(PD, M, M_) function, this could be address first.
(Is there a way to attach the complete profiler output as .txt?):
Function call highlighted by profiler:
fs = myfind(feasible(PD, M, M_))
In short: how can we make the function feasible(PD, M, M_) faster, either with the current implementation using the Int64 matrices M & M_ or in another way.
The functions + helper functions needed to run above function call:
struct PoolDesign{T<:Integer}
N::T
m::T
q::T
lmax::T
end
PD = PoolDesign(
25, # number samples
5, # samples per pool
5, # pools per layer
6, # layers in pooling design
)
function myfind(c)
a = similar(c, Int)
count = 1
for i in eachindex(c)
a[count] = i
count += (c[i] != zero(eltype(c)))
end
return resize!(a, count-1)
end
function pool_idx(M, PD)
for (i, row) in enumerate(eachrow(M))
if sum(row) < PD.m
return i
end
end
size(M,1)
end
layer_idx(M, PD) = ((pool_idx(M, PD) - 1)÷PD.q) + 1
layer(PD, l) = @view M[ ((l-1)*PD.q+1):(l * PD.q), :]
current_layer(M, PD) = layer(PD, layer_idx(M, PD))
function feasible(PD, M, M_)
feas = collect(1:PD.N)
cl = current_layer(M, PD)
# if current layer is empty, all samples are feasible
if !iszero(cl)
for i in 1:PD.N
for q in 1:PD.q
if cl[q, i] == 1
feas[i] = 0
end
end
end
# if current pool is empty,
# all remaining unplaced samples in layer are feasible
if !iszero(current_pool(M, PD))
for k in myfind(current_pool(M, PD))
for l in 1:PD.N
feas[l] = feas[l] * M_[k, l]
end
feas[k] = 0
end
end
end
return feas
end
dims = (PD.lmax * PD.m, PD.N)
M = zeros(Int64, dims)
M_ = ones(Int64, PD.N, PD.N)
fs = myfind(feasible(PD, M, M_))
The matrix M stores the Block design, assigning the samples (cols) to the different blocks (rows), where M(j,i) = 1 means that sample i is block j.
The matrix M_ is symmetric and stores the information about which pairs of samples already occured together in one block.
Ex. solution for N=9:
1 1 1 0 0 0 0 0 0 → (1,2,3) |
0 0 0 1 1 1 0 0 0 → (4,5,6) | Layer 1
0 0 0 0 0 0 1 1 1 → (7,8,9) |
1 0 0 1 0 0 1 0 0 → (1,4,7) ]
0 1 0 0 1 0 0 1 0 → (2,5,8) ] Layer 2
0 0 1 0 0 1 0 0 1 → (3,6,9) ]
1 0 0 0 1 0 0 0 1 → (1,5,9) }
0 1 0 0 0 1 1 0 0 → (2,6,7) } Layer 3
0 0 1 1 0 0 0 1 0 → (3,4,8) }
1 0 0 0 0 1 0 1 0 → (1,6,8) ||
0 1 0 1 0 0 0 0 1 → (2,4,9) || Layer 4
0 0 1 0 1 0 1 0 0 → (3,5,7) ||
If this post is in the wrong category, please feel free to move it appropriately.
Thanks
David