Function speed up, reduce allocations

david.brust · November 17, 2021, 10:51am

Hi,

I am working on a combinatorial problem called Block Design:

Block design - Wikipedia ( specifically Pairwise balanced uniform designs (2-designs or BIBDs) )

I try to group N=25 samples (e.g. integers 1:25) into “blocks” (aka pools) of 5 samples each, where each pair of samples only occurs in one block together. The blocks should be chosen so that 5 of these blocks form a partition of the set of all 25 samples. A partition of the set of all samples into 5 blocks as described above call a “layer”, I try to find as many unique layers as possible. (in the above example there are 6 layers)

I implemented a backtracking algorithm to find solutions to the problem, which currently finds fesible solutions for N=9, N=16,& N=25, but takes prohibitively long for N=36.

Timing for N=25:

@time BlockDesign.run()
1.606897 seconds (2.06 M allocations: 307.621 MiB, 0.59% gc time)

I think the backtracking algorithm itself can be improved, but looking at the large number of allocations and the Profiler output pointing to the feasible(PD, M, M_) function, this could be address first.
(Is there a way to attach the complete profiler output as .txt?):

Function call highlighted by profiler:
fs = myfind(feasible(PD, M, M_))

In short: how can we make the function feasible(PD, M, M_) faster, either with the current implementation using the Int64 matrices M & M_ or in another way.

The functions + helper functions needed to run above function call:

struct PoolDesign{T<:Integer}
    N::T
    m::T
    q::T
    lmax::T
end

PD = PoolDesign(
    25, # number samples
    5, # samples per pool
    5, # pools per layer
    6, # layers in pooling design
)

function myfind(c)
    a = similar(c, Int)
    count = 1
    for i in eachindex(c)
        a[count] = i
        count += (c[i] != zero(eltype(c)))
    end
    return resize!(a, count-1)
end

 function pool_idx(M, PD)
    for (i, row) in enumerate(eachrow(M))
        if sum(row) < PD.m
            return i
        end
    end
    size(M,1)
end

layer_idx(M, PD) = ((pool_idx(M, PD) - 1)÷PD.q) + 1

layer(PD, l) =  @view M[ ((l-1)*PD.q+1):(l * PD.q), :]

current_layer(M, PD) = layer(PD, layer_idx(M, PD))

function feasible(PD, M, M_)
    feas = collect(1:PD.N)
    cl = current_layer(M, PD)
    # if current layer is empty, all samples are feasible
    if !iszero(cl) 
        for i in 1:PD.N
            for q in 1:PD.q
                if cl[q, i] == 1
                    feas[i] = 0
                end
            end
        end
        # if current pool is empty,
        # all remaining unplaced samples in layer are feasible
        if !iszero(current_pool(M, PD))
            for k in myfind(current_pool(M, PD))
                for l in 1:PD.N
                    feas[l] = feas[l] * M_[k, l]
                end
                feas[k] = 0
            end
        end
    end
    return feas
end

dims = (PD.lmax * PD.m, PD.N)
M = zeros(Int64, dims)
M_ = ones(Int64, PD.N, PD.N)

fs = myfind(feasible(PD, M, M_))

The matrix M stores the Block design, assigning the samples (cols) to the different blocks (rows), where M(j,i) = 1 means that sample i is block j.
The matrix M_ is symmetric and stores the information about which pairs of samples already occured together in one block.

Ex. solution for N=9:

1 1 1 0 0 0 0 0 0 → (1,2,3) |
0 0 0 1 1 1 0 0 0 → (4,5,6) | Layer 1
0 0 0 0 0 0 1 1 1 → (7,8,9) |
1 0 0 1 0 0 1 0 0 → (1,4,7) ]
0 1 0 0 1 0 0 1 0 → (2,5,8) ] Layer 2
0 0 1 0 0 1 0 0 1 → (3,6,9) ]
1 0 0 0 1 0 0 0 1 → (1,5,9) }
0 1 0 0 0 1 1 0 0 → (2,6,7) } Layer 3
0 0 1 1 0 0 0 1 0 → (3,4,8) }
1 0 0 0 0 1 0 1 0 → (1,6,8) ||
0 1 0 1 0 0 0 0 1 → (2,4,9) || Layer 4
0 0 1 0 1 0 1 0 0 → (3,5,7) ||

If this post is in the wrong category, please feel free to move it appropriately.

Thanks
David

goerch · November 17, 2021, 11:46am

It would simplify things considerably if you could post the MWE in one markup block. Then I wouldn’t need to manually compose it.

lmiq · November 17, 2021, 11:52am

my guess just looking at the code is that you should probably expand this myfind function into this loop, to get rid of the array that is created by myfind, and just iterate over the elements that are necessary directly. That will make the code a little less modular, but at first sight it seems to be the cause of the allocations you are seeing.

That is, do something like:

for val in current_pool(M, PD)
    if val != zero(...)
        # ... do what you need to do
    end
end

david.brust · November 17, 2021, 12:17pm

Thanks for your answer. I also use the myfind() function in other places of the code to get the non-zero elements from the passed arrays.
When ran with myfind() I get:

@time myfind(feasible(PD, M, M_))
  0.000012 seconds (7 allocations: 768 bytes)

Compared to:

@time feasible(PD, M, M_)
  0.000010 seconds (6 allocations: 480 bytes)

There are already allocations occuring in feasible(PD, M, M_), where I tried to set the elements zero, and filter the zero elements out later in myfind().

lmiq · November 17, 2021, 12:19pm

myfind is called from within feasible, isn’t it?

(by the way, you can write that myfind with findall(!isequal(0),x), as far as I could understand it, but that won’t be any better)

david.brust · November 17, 2021, 12:30pm

Yes it is, sorry for the confusion.

I got now:

function feasible(PD, M, M_)
    feas = collect(1:PD.N)

    cl = current_layer(M, PD)
    # if current layer is empty, all samples are feasible
    if !iszero(cl)
                
        for i in 1:PD.N
            for q in 1:PD.q
                if cl[q, i] == 1
                    feas[i] = 0
                end
            end
            
        end

        # if current pool is empty,
        # all remaining unplaced samples in layer are feasible
        if !iszero(current_pool(M, PD))

            for k in current_pool(M, PD)
                if k != zero(k)
                    for l in 1:PD.N
                        feas[l] = feas[l] * M_[k, l]
                    end
                    feas[k] = 0
                end
            end
        end
    end
    return feas
end

@time feasible(PD, M, M_)
0.000011 seconds (6 allocations: 480 bytes)

lmiq · November 17, 2021, 12:33pm

I noticed now that M is not an input of layer. Probably it should be.

david.brust · November 17, 2021, 12:43pm

Good eye! This did the trick…
It is down to

@time feasible(PD, M, M_)
  0.000004 seconds (1 allocation: 160 bytes)

Is the last allocation due to

 feas = collect(1:PD.N)

?

lmiq · November 17, 2021, 12:43pm

That certainly is one allocation.

david.brust · November 17, 2021, 12:52pm

Since feasible(PD, M, M_) is called frequently, I will make a preallocated array feas that will get passed as a parameter and modified in the function. Thanks for your help!

nhavt · November 17, 2021, 2:47pm

Hi, if you want to construct large balanced incomplete block designs (BIBDs), it might be good to have look at this work (Memetic collaborative approaches for finding balanced incomplete block designs - ScienceDirect), where they proposed several algorithms to construct BIBDs. In your example it seems that you want to construct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

DNF · November 17, 2021, 4:03pm

Or, preferably, findall(!iszero, x)

david.brust · November 17, 2021, 8:27pm

Thanks for suggesting this paper, it is very relevant!

you want toconstruct a resolvable balanced incomplete block design with v = 25, k =5, b = 5, r = 6?

Yes, exactly.

Topic		Replies	Views
Combinatorics.permutations creating a lot of allocations Performance question	6	150	July 14, 2025
Help to reduce number of allocations General Usage	23	5491	January 6, 2019
Terribly unoptimized combination of multiset_permutations() and Iterators.product() Performance question	9	242	July 29, 2025
For loop performance vs "functional" performance Performance advent-of-code	17	520	January 4, 2025
Performance Tips for Combinatorial Problem Performance	16	1130	January 10, 2020

Function speed up, reduce allocations

Related topics