# Is there a simple way to parallelize or distribute a function call that updates a local variable within a loop?

Hi!

I am working on implementing the hill-climbing algorithm to solve the one-dimensional bin packing problem, but I have hit a bottleneck. The function, which I called `main` for lack of a better name, repeats the same set of actions until the number of attempts has been consumed or until it finds the optimum, which is given beforehand as part of an instance.

I want to know whether this type of loop can be run in parallel or distributed, and if so, whether there is a simple way to implement it. I have zero experience with parallel and distributed computing. The idea is simple: loop, and if the "solution" at the current `attempt` is better than the best one found so far, replace it; if the optimal "solution" has been found, stop; otherwise, continue.

``````julia
function main(instance::BPPInstance, attempts::Int; earlybreak = false)
    # Objective function
    f(x) = sum(λ -> (λ.fullness)^2, x)
    # First attempt:
    # Get a random sorting of the bins, then search the search space, and then compute the score!!!
    bestbins = randomsort_bins(instance)
    climb!(bestbins, instance.item_weights, instance.bin_capacity)
    fbest = f(bestbins)
    bins_count = length(bestbins)
    # Remaining attempts
    for attempt in 2:attempts
        # Return the found configuration early and the attempts made to find it
        if earlybreak && bins_count == instance.optimal_solution
            return bestbins, attempt
        end
        # Get a random sorting of the bins, then search the search space, and then compute the score!!!
        bins = randomsort_bins(instance)
        climb!(bins, instance.item_weights, instance.bin_capacity)
        fbins = f(bins)
        if fbins > fbest
            bestbins = bins
            fbest = fbins
            bins_count = length(bestbins)
        end
    end
    return bestbins, attempts
end
``````

Some time ago I came across Dagger; I wonder whether that package could be used in this case.

First of all: did you profile your code, and are you sure this is the slowest part? Optimizing for performance only makes sense for the parts where most of the time is spent. If you use VSCode, you can run `@profview myfunction()` to get a flame graph of the performance. Bar size indicates time spent, and bars are arranged so that you can see time spent in subfunctions beneath each bar.
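As a side note, if you are not in VSCode, you can get a similar (text-based) picture from Julia's built-in `Profile` standard library. This is a generic sketch, not part of your code: `busy_work` is just a hypothetical stand-in for your `main(instance, attempts)` call.

```julia
using Profile

# Hypothetical stand-in for the function you actually want to profile
function busy_work(n)
    s = 0.0
    for i in 1:n
        s += sqrt(i)
    end
    return s
end

busy_work(10)            # run once first so compilation time isn't profiled
Profile.clear()
@profile busy_work(10^7)
# Flat listing, hottest lines first; the same data @profview visualizes
Profile.print(format=:flat, sortedby=:count)
```

The samples collected here are exactly what `@profview` renders as a flame graph, so either view will tell you whether `climb!` or something else dominates the runtime.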

Parallelization is sadly not magic. As your algorithm looks right now, you can't apply parallelization in a straightforward manner. You'll need to change it (slightly).

Right now your loop iterations share variables (e.g. `bestbins`, `fbest`, …) and maybe depend on each other in some other way through `climb!`. I don't know what `climb!` does, but assuming it does not modify `instance`, the only issue I see is the shared variables; each iteration is essentially independent, right? So the simplest thing you could do is run a couple of `main`s in parallel and at the end take the best solution across them. You lose the early-stopping feature, but that's maybe acceptable (and could be fixed if really needed). In code:

``````julia
function main_parallel(instance, attempts)
    # Objective function
    f(x) = sum(λ -> (λ.fullness)^2, x)