 # Function speed up, reduce allocations

Hi,

I am working on a combinatorial problem called Block Design:

Block design - Wikipedia ( specifically Pairwise balanced uniform designs (2-designs or BIBDs) )

I try to group N=25 samples (e.g. integers 1:25) into “blocks” (aka pools) of 5 samples each, where each pair of samples only occurs in one block together. The blocks should be chosen so that 5 of these blocks form a partition of the set of all 25 samples. A partition of the set of all samples into 5 blocks as described above call a “layer”, I try to find as many unique layers as possible. (in the above example there are 6 layers)

I implemented a backtracking algorithm to find solutions to the problem, which currently finds fesible solutions for N=9, N=16,& N=25, but takes prohibitively long for N=36.

Timing for N=25:

@time BlockDesign.run()
1.606897 seconds (2.06 M allocations: 307.621 MiB, 0.59% gc time)

I think the backtracking algorithm itself can be improved, but looking at the large number of allocations and the Profiler output pointing to the feasible(PD, M, M_) function, this could be address first.
(Is there a way to attach the complete profiler output as .txt?):

Function call highlighted by profiler:
` fs = myfind(feasible(PD, M, M_))`

In short: how can we make the function feasible(PD, M, M_) faster, either with the current implementation using the Int64 matrices M & M_ or in another way.

The functions + helper functions needed to run above function call:

``````struct PoolDesign{T<:Integer}
N::T
m::T
q::T
lmax::T
end

PD = PoolDesign(
25, # number samples
5, # samples per pool
5, # pools per layer
6, # layers in pooling design
)

function myfind(c)
a = similar(c, Int)
count = 1
for i in eachindex(c)
a[count] = i
count += (c[i] != zero(eltype(c)))
end
return resize!(a, count-1)
end

function pool_idx(M, PD)
for (i, row) in enumerate(eachrow(M))
if sum(row) < PD.m
return i
end
end
size(M,1)
end

layer_idx(M, PD) = ((pool_idx(M, PD) - 1)÷PD.q) + 1

layer(PD, l) =  @view M[ ((l-1)*PD.q+1):(l * PD.q), :]

current_layer(M, PD) = layer(PD, layer_idx(M, PD))

function feasible(PD, M, M_)
feas = collect(1:PD.N)
cl = current_layer(M, PD)
# if current layer is empty, all samples are feasible
if !iszero(cl)
for i in 1:PD.N
for q in 1:PD.q
if cl[q, i] == 1
feas[i] = 0
end
end
end
# if current pool is empty,
# all remaining unplaced samples in layer are feasible
if !iszero(current_pool(M, PD))
for k in myfind(current_pool(M, PD))
for l in 1:PD.N
feas[l] = feas[l] * M_[k, l]
end
feas[k] = 0
end
end
end
return feas
end

dims = (PD.lmax * PD.m, PD.N)
M = zeros(Int64, dims)
M_ = ones(Int64, PD.N, PD.N)

fs = myfind(feasible(PD, M, M_))
``````

The matrix M stores the Block design, assigning the samples (cols) to the different blocks (rows), where M(j,i) = 1 means that sample i is block j.
The matrix M_ is symmetric and stores the information about which pairs of samples already occured together in one block.

Ex. solution for N=9:

1 1 1 0 0 0 0 0 0 → (1,2,3) |
0 0 0 1 1 1 0 0 0 → (4,5,6) | Layer 1
0 0 0 0 0 0 1 1 1 → (7,8,9) |
1 0 0 1 0 0 1 0 0 → (1,4,7) ]
0 1 0 0 1 0 0 1 0 → (2,5,8) ] Layer 2
0 0 1 0 0 1 0 0 1 → (3,6,9) ]
1 0 0 0 1 0 0 0 1 → (1,5,9) }
0 1 0 0 0 1 1 0 0 → (2,6,7) } Layer 3
0 0 1 1 0 0 0 1 0 → (3,4,8) }
1 0 0 0 0 1 0 1 0 → (1,6,8) ||
0 1 0 1 0 0 0 0 1 → (2,4,9) || Layer 4
0 0 1 0 1 0 1 0 0 → (3,5,7) ||

If this post is in the wrong category, please feel free to move it appropriately.

Thanks
David

It would simplify things considerably if you could post the MWE in one markup block. Then I wouldn’t need to manually compose it.

2 Likes

my guess just looking at the code is that you should probably expand this `myfind` function into this loop, to get rid of the array that is created by `myfind`, and just iterate over the elements that are necessary directly. That will make the code a little less modular, but at first sight it seems to be the cause of the allocations you are seeing.

That is, do something like:

``````for val in current_pool(M, PD)
if val != zero(...)
# ... do what you need to do
end
end

``````

Thanks for your answer. I also use the `myfind()` function in other places of the code to get the non-zero elements from the passed arrays.
When ran with `myfind()` I get:

``````@time myfind(feasible(PD, M, M_))
0.000012 seconds (7 allocations: 768 bytes)
``````

Compared to:

``````@time feasible(PD, M, M_)
0.000010 seconds (6 allocations: 480 bytes)
``````

There are already allocations occuring in `feasible(PD, M, M_)`, where I tried to set the elements zero, and filter the zero elements out later in `myfind()`.

`myfind` is called from within `feasible`, isn’t it?

(by the way, you can write that myfind with `findall(!isequal(0),x)`, as far as I could understand it, but that won’t be any better)

Yes it is, sorry for the confusion.

I got now:

``````function feasible(PD, M, M_)
feas = collect(1:PD.N)

cl = current_layer(M, PD)
# if current layer is empty, all samples are feasible
if !iszero(cl)

for i in 1:PD.N
for q in 1:PD.q
if cl[q, i] == 1
feas[i] = 0
end
end

end

# if current pool is empty,
# all remaining unplaced samples in layer are feasible
if !iszero(current_pool(M, PD))

for k in current_pool(M, PD)
if k != zero(k)
for l in 1:PD.N
feas[l] = feas[l] * M_[k, l]
end
feas[k] = 0
end
end
end
end
return feas
end

@time feasible(PD, M, M_)
0.000011 seconds (6 allocations: 480 bytes)

``````

I noticed now that `M` is not an input of `layer`. Probably it should be.

4 Likes

Good eye! This did the trick…
It is down to

``````@time feasible(PD, M, M_)
0.000004 seconds (1 allocation: 160 bytes)
``````

Is the last allocation due to

`````` feas = collect(1:PD.N)
``````

?

That certainly is one allocation.

1 Like

Since `feasible(PD, M, M_)` is called frequently, I will make a preallocated array `feas` that will get passed as a parameter and modified in the function. Thanks for your help!

2 Likes

Hi, if you want to construct large balanced incomplete block designs (BIBDs), it might be good to have look at this work (Memetic collaborative approaches for finding balanced incomplete block designs - ScienceDirect), where they proposed several algorithms to construct BIBDs. In your example it seems that you want to construct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

1 Like

Or, preferably, `findall(!iszero, x)`

3 Likes

Thanks for suggesting this paper, it is very relevant!

you want toconstruct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

Yes, exactly.