Function speed up, reduce allocations

Hi,

I am working on a combinatorial problem called Block Design:

Block design - Wikipedia ( specifically Pairwise balanced uniform designs (2-designs or BIBDs) )

I try to group N=25 samples (e.g. integers 1:25) into “blocks” (aka pools) of 5 samples each, where each pair of samples only occurs in one block together. The blocks should be chosen so that 5 of these blocks form a partition of the set of all 25 samples. A partition of the set of all samples into 5 blocks as described above call a “layer”, I try to find as many unique layers as possible. (in the above example there are 6 layers)

I implemented a backtracking algorithm to find solutions to the problem, which currently finds fesible solutions for N=9, N=16,& N=25, but takes prohibitively long for N=36.

Timing for N=25:

@time BlockDesign.run()
1.606897 seconds (2.06 M allocations: 307.621 MiB, 0.59% gc time)

I think the backtracking algorithm itself can be improved, but looking at the large number of allocations and the Profiler output pointing to the feasible(PD, M, M_) function, this could be address first.
(Is there a way to attach the complete profiler output as .txt?):

Function call highlighted by profiler:
fs = myfind(feasible(PD, M, M_))

In short: how can we make the function feasible(PD, M, M_) faster, either with the current implementation using the Int64 matrices M & M_ or in another way.

The functions + helper functions needed to run above function call:

struct PoolDesign{T<:Integer}
    N::T
    m::T
    q::T
    lmax::T
end

PD = PoolDesign(
    25, # number samples
    5, # samples per pool
    5, # pools per layer
    6, # layers in pooling design
)

function myfind(c)
    a = similar(c, Int)
    count = 1
    for i in eachindex(c)
        a[count] = i
        count += (c[i] != zero(eltype(c)))
    end
    return resize!(a, count-1)
end

 function pool_idx(M, PD)
    for (i, row) in enumerate(eachrow(M))
        if sum(row) < PD.m
            return i
        end
    end
    size(M,1)
end

layer_idx(M, PD) = ((pool_idx(M, PD) - 1)÷PD.q) + 1

layer(PD, l) =  @view M[ ((l-1)*PD.q+1):(l * PD.q), :]

current_layer(M, PD) = layer(PD, layer_idx(M, PD))

function feasible(PD, M, M_)
    feas = collect(1:PD.N)
    cl = current_layer(M, PD)
    # if current layer is empty, all samples are feasible
    if !iszero(cl) 
        for i in 1:PD.N
            for q in 1:PD.q
                if cl[q, i] == 1
                    feas[i] = 0
                end
            end
        end
        # if current pool is empty,
        # all remaining unplaced samples in layer are feasible
        if !iszero(current_pool(M, PD))
            for k in myfind(current_pool(M, PD))
                for l in 1:PD.N
                    feas[l] = feas[l] * M_[k, l]
                end
                feas[k] = 0
            end
        end
    end
    return feas
end

dims = (PD.lmax * PD.m, PD.N)
M = zeros(Int64, dims)
M_ = ones(Int64, PD.N, PD.N)

fs = myfind(feasible(PD, M, M_))

The matrix M stores the Block design, assigning the samples (cols) to the different blocks (rows), where M(j,i) = 1 means that sample i is block j.
The matrix M_ is symmetric and stores the information about which pairs of samples already occured together in one block.

Ex. solution for N=9:

1 1 1 0 0 0 0 0 0 → (1,2,3) |
0 0 0 1 1 1 0 0 0 → (4,5,6) | Layer 1
0 0 0 0 0 0 1 1 1 → (7,8,9) |
1 0 0 1 0 0 1 0 0 → (1,4,7) ]
0 1 0 0 1 0 0 1 0 → (2,5,8) ] Layer 2
0 0 1 0 0 1 0 0 1 → (3,6,9) ]
1 0 0 0 1 0 0 0 1 → (1,5,9) }
0 1 0 0 0 1 1 0 0 → (2,6,7) } Layer 3
0 0 1 1 0 0 0 1 0 → (3,4,8) }
1 0 0 0 0 1 0 1 0 → (1,6,8) ||
0 1 0 1 0 0 0 0 1 → (2,4,9) || Layer 4
0 0 1 0 1 0 1 0 0 → (3,5,7) ||

If this post is in the wrong category, please feel free to move it appropriately.

Thanks
David

It would simplify things considerably if you could post the MWE in one markup block. Then I wouldn’t need to manually compose it.

2 Likes

my guess just looking at the code is that you should probably expand this myfind function into this loop, to get rid of the array that is created by myfind, and just iterate over the elements that are necessary directly. That will make the code a little less modular, but at first sight it seems to be the cause of the allocations you are seeing.

That is, do something like:

for val in current_pool(M, PD)
    if val != zero(...)
        # ... do what you need to do
    end
end

Thanks for your answer. I also use the myfind() function in other places of the code to get the non-zero elements from the passed arrays.
When ran with myfind() I get:

@time myfind(feasible(PD, M, M_))
  0.000012 seconds (7 allocations: 768 bytes)

Compared to:

@time feasible(PD, M, M_)
  0.000010 seconds (6 allocations: 480 bytes)

There are already allocations occuring in feasible(PD, M, M_), where I tried to set the elements zero, and filter the zero elements out later in myfind().

myfind is called from within feasible, isn’t it?

(by the way, you can write that myfind with findall(!isequal(0),x), as far as I could understand it, but that won’t be any better)

Yes it is, sorry for the confusion.

I got now:

function feasible(PD, M, M_)
    feas = collect(1:PD.N)

    cl = current_layer(M, PD)
    # if current layer is empty, all samples are feasible
    if !iszero(cl)
                
        for i in 1:PD.N
            for q in 1:PD.q
                if cl[q, i] == 1
                    feas[i] = 0
                end
            end
            
        end

        # if current pool is empty,
        # all remaining unplaced samples in layer are feasible
        if !iszero(current_pool(M, PD))

            for k in current_pool(M, PD)
                if k != zero(k)
                    for l in 1:PD.N
                        feas[l] = feas[l] * M_[k, l]
                    end
                    feas[k] = 0
                end
            end
        end
    end
    return feas
end

@time feasible(PD, M, M_)
0.000011 seconds (6 allocations: 480 bytes)

I noticed now that M is not an input of layer. Probably it should be.

4 Likes

Good eye! This did the trick…
It is down to

@time feasible(PD, M, M_)
  0.000004 seconds (1 allocation: 160 bytes)

Is the last allocation due to

 feas = collect(1:PD.N)

?

That certainly is one allocation.

1 Like

Since feasible(PD, M, M_) is called frequently, I will make a preallocated array feas that will get passed as a parameter and modified in the function. Thanks for your help!

2 Likes

Hi, if you want to construct large balanced incomplete block designs (BIBDs), it might be good to have look at this work (Memetic collaborative approaches for finding balanced incomplete block designs - ScienceDirect), where they proposed several algorithms to construct BIBDs. In your example it seems that you want to construct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

1 Like

Or, preferably, findall(!iszero, x)

3 Likes

Thanks for suggesting this paper, it is very relevant!

you want toconstruct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

Yes, exactly.