Collecting all solutions in a recursive branching algorithm

Moved this from the New to Julia category to a more fitting section

I’m trying to translate a divide-and-conquer algorithm for a modified subset sum problem from Python. A simplified version might look like this: you are given an array of structs with a mass attribute and a float target. Your task is to return all subsets of the input array whose masses add up to the target, up to some small allowed error.

The model solution is to recursively traverse the entire array, branching on each element: either include it in the solution and lower the target by its mass, or skip it. To find every solution, the naive algorithm simply concatenates the solutions from both branches:

function subset_sum(objects, target, chosen)
  if very_close_to_zero(target)
    return [chosen]
  elseif target < 0 || length(objects) == 0
    return []
  else
    first, rest = objects[1], objects[2:end]
    return [
      subset_sum(rest, target, chosen) ;
      subset_sum(rest, target - first.mass, push(chosen, first))
    ]
  end
end

But I assume this will be very allocation-heavy. Is there a better, more Julian solution? In Python I ended up doing basically this, except passing the accumulator down as a tuple (because of immutability) and appending chosen to a global result array instead of returning it.

Note that although dynamic programming would help in this model problem, my specific task doesn’t mesh well with it because I have to keep track of a lot of additional state: a limited number of skipped objects, each object actually having several possible masses (any of which I can choose), etc. If you’re interested, I’d be glad to be proven wrong in this regard, but it might be better suited for another discussion.

objects[2:end] allocates an array.
push(chosen, first) allocates an array.
This is the classic question of copying vs. backtracking in recursive search. I would go the backtracking route to avoid allocations:
Don’t touch objects; instead keep a variable for the index of the object you are considering. Never allocate a new chosen vector: pass chosen down, do the first recursive call, add the new object to chosen, do the second recursive call, then remove the object from chosen and return the solutions. Use in-place operations like push! and pop! to avoid allocating arrays, and at the beginning, use sizehint! to reserve the maximal size of chosen.

As your function will mutate chosen, you should call it subset_sum! to follow the Julia naming convention.
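If it helps, here is a minimal sketch of that backtracking pattern (the names subset_sum! and explore! are placeholders, and the atol value is an arbitrary choice for illustration):

```julia
# Backtracking sketch: one shared `chosen` vector, mutated in place with
# push!/pop! and restored before returning, so no per-call copies are needed.
function subset_sum!(objects, target)
    solutions = Vector{eltype(objects)}[]
    chosen = eltype(objects)[]
    sizehint!(chosen, length(objects))   # reserve the maximal size up front
    explore!(objects, 1, target, chosen, solutions)
    return solutions
end

function explore!(objects, i, target, chosen, solutions)
    if isapprox(target, 0; atol = 1e-9)  # needs an absolute tolerance near zero
        push!(solutions, copy(chosen))   # snapshot only when a solution is found
    elseif target > 0 && i <= length(objects)
        explore!(objects, i + 1, target, chosen, solutions)            # skip objects[i]
        push!(chosen, objects[i])                                      # take objects[i]
        explore!(objects, i + 1, target - objects[i].mass, chosen, solutions)
        pop!(chosen)                                                   # undo before returning
    end
    return solutions
end
```

The copy happens only when a solution is recorded, so the hot path of the search allocates nothing.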

I also suspect that this allocates every time; you should pass a solution accumulator along your recursive calls, but maybe there is another way…


I’ve modified my code according to your suggestions, thanks! Now it’s something like the following:

function subset_sum(objects, target_mass)
    function go!(index, remaining_mass, chosen, solutions)
        if remaining_mass ≈ 0
            # This copies |chosen| objects
            # That's ok, assuming there won't be too many solutions
            push!(solutions, objects[chosen])
        elseif remaining_mass < 0 || index == length(objects) + 1
            solutions
        else
            go!(index + 1, remaining_mass, chosen, solutions)
            go!(
                index + 1,
                remaining_mass - objects[index].mass,
                push!(chosen, index),
                solutions,
            )
            pop!(chosen)
            solutions
        end
    end

    chosen = []
    sizehint!(chosen, length(objects))
    go!(1, target_mass, chosen, [])
end

Looking at the output of @time, there still might be some unneeded allocations…

> objects40 = map(i -> (mass = i, ), rand(1.0:0.1:5000.0, 40))
> @time Precursor.subset_sum(objects40, 15_000)

24.818297 seconds (504.47 M allocations: 7.517 GiB, 5.86% gc time)
277-element Vector{Any}: [...]

It might also just be all of those pushes and pops. I will investigate this, but I thought I might as well copy it here so that you all have an opportunity to think about it, too.

EDIT: It definitely isn’t the pushes and pops (not that I thought that likely in the first place). When I stop modifying chosen (which of course gives me incorrect solutions), the runtime is very similar, and the allocations are basically the same:

 21.103157 seconds (504.53 M allocations: 7.521 GiB, 6.95% gc time, 0.27% compilation time)

EDIT2: When I strip the chosen and solutions (so that I’m only dry-running the algorithm until I reach the final mass), there’s still a bunch of allocations, although the runtime is a bit quicker.

function subset_sum(objects, target_mass)
    function go!(index, remaining_mass)
        if remaining_mass ≈ 0
            return
        elseif remaining_mass < 0 || index == length(objects) + 1
            return
        else
            go!(index + 1, remaining_mass)
            go!(index + 1, remaining_mass - objects[index].mass)
            return
        end
    end

    go!(1, target_mass)
end

with @time giving

18.510571 seconds (504.52 M allocations: 7.520 GiB, 7.50% gc time, 0.18% compilation time)

Seems like the recursive calls themselves are pretty costly. Converting to a tail-call unfortunately isn’t possible in my case.

EDIT3: Interestingly enough, pulling out the definition of go! into a top-level function cuts the execution time in half.

Note that [] is an empty Vector{Any}, which isn’t efficient. You want to use, say, Float64[] instead if your array will / should hold floating-point values.
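To illustrate the difference: both literals create an empty vector, but the element type determines how values are stored.

```julia
# An untyped empty literal gives Vector{Any}: every element is boxed
# and the compiler can't infer what comes out of it.
a = []
# A typed literal stores Float64 values inline.
b = Float64[]
push!(a, 1.5)
push!(b, 1.5)
eltype(a)  # Any
eltype(b)  # Float64
```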

Thanks for the tip. The array chosen holds integers, and solutions holds arrays of some of the objects from the input array objects, whatever they might be. Does it make sense to specialise the type of solutions as well? (And how do I do it properly?)

By the way, note my latest edit on the previous post. Over 7GB of allocations even though I’m only subtracting some numbers.

EDIT: The performance seems to be roughly the same even with the types of both arrays specified. For the solutions array, I did it like this:

function subset_sum(objects::Objects, target_mass, segments) where Objects
    solutions = Objects[]
    chosen = UInt16[]
    [...]
end

Using the Combinatorics.jl package, you can presumably do something like:

using Combinatorics
function subsets_sum(objects, target, tol)
    results = Vector{eltype(objects)}[]
    for c in combinations(objects)
        if abs(sum(o -> o.mass, c) - target) ≤ tol
            push!(results, c)
        end
    end
    return results
end

(Though this has suboptimal efficiency because it doesn’t do any dynamic programming or pruning of the combinations, but at least the code is short and clear.)

This is equivalent to remaining_mass == 0 because you didn’t specify an absolute tolerance.

See the note in the isapprox documentation: “Note that x ≈ 0 …”.

Thanks. By the way, any idea why the multi-threaded version is only slightly (13 s vs 15 s) faster than the single-threaded one?


function subset_sum(objects :: Objects, target_mass, segments) where Objects
    # solutions are only written to, can be shared among threads
    solutions = Objects[]

    # The number of threads is 8, number of objects 40
    Threads.@threads for beginning = 1:length(objects)
        # chosen is written to and read, has to be unique at least for each thread
        chosen = UInt16[]
        sizehint!(chosen, length(objects))
        # true means that go! has to put the beginning-th object into chosen
        go!(objects, beginning, target_mass, true, chosen, solutions)
    end
    solutions
end
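One thing to be careful about here: push!-ing into a shared Vector from several threads at once is a data race, even if the vector is “only written to”. A minimal sketch of guarding a shared accumulator with a lock (collect_parallel and the squared “work” are made up for illustration, not your actual go!):

```julia
# Sketch: protect the shared `solutions` vector with a lock so that
# concurrent push! calls from different threads don't race.
function collect_parallel(n)
    solutions = Int[]
    lk = ReentrantLock()
    Threads.@threads for beginning in 1:n
        local_result = beginning^2   # stand-in for the work go! would do
        lock(lk) do
            push!(solutions, local_result)  # guarded push! is safe
        end
    end
    return sort!(solutions)
end
```

If the lock becomes a bottleneck, per-task accumulators merged after the parallel loop are a common alternative.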

function subset_sum(objects, target_mass)
    function go!(objects, index, remaining_mass)
        if remaining_mass ≈ 0
            return
        elseif remaining_mass < 0 || index == length(objects) + 1
            return
        else
            go!(objects, index + 1, remaining_mass)
            go!(objects, index + 1, remaining_mass - objects[index])
            return
        end
    end
    go!(objects, 1, target_mass)
end
function go!(objects, index, remaining_mass)
    if remaining_mass ≈ 0
        return
    elseif remaining_mass < 0 || index == length(objects) + 1
        return
    else
        go!(objects, index + 1, remaining_mass)
        go!(objects, index + 1, remaining_mass - objects[index])
        return
    end
end

subset_sum2(objects, target_mass) = go!(objects, 1, target_mass)
objects = [rand() for i in 1:27]
target = 17

@time subset_sum(objects, target)
@time subset_sum2(objects, target)

julia> @time subset_sum(objects, target)
 16.628330 seconds (268.44 M allocations: 4.000 GiB, 3.06% gc time)

julia> @time subset_sum2(objects, target)
  2.857016 seconds

There is indeed a big difference. This is maybe due to this issue, but I’m not sure.


Here is what I came up with: first I tried to add types to the program to better understand it and get rid of Vector{Any}. Then I mostly followed @etienne_dg’s advice. The program

using BenchmarkTools

function push(v::Vector{T}, e::T) where T
  vp = copy(v)
  push!(vp, e)
end

function subset_sum1(objects, target, chosen)
  if target ≈ 0
    return [chosen]
  elseif target < 0 || length(objects) == 0
    return []
  else
    first, rest = objects[1], objects[2:end]
    return [
      subset_sum1(rest, target, chosen) ;
      subset_sum1(rest, target - first.mass, push(chosen, first))
    ]
  end
end

function subset_sum2(objects::Vector{T}, target, chosen::Vector{T}) where T
  if target ≈ 0
    return [chosen]
  elseif target < 0 || length(objects) == 0
    return Vector{T}[]
  else
    first, rest = objects[1], objects[2:end]
    return Vector{T}[
      subset_sum2(rest, target, chosen) ;
      subset_sum2(rest, target - first.mass, push(chosen, first))
    ]
  end
end

function subset_sum3(objects::Vector{T}, target, chosen::Vector{T}, solutions::Vector{Vector{T}}) where T
  if target ≈ 0
    push!(solutions, copy(chosen))
  elseif target > 0 && length(objects) > 0
    first = pop!(objects)
    subset_sum3(objects, target, chosen, solutions)
    push!(chosen, first)
    subset_sum3(objects, target - first.mass, chosen, solutions)
    pop!(chosen)
    push!(objects, first)
  end
end
function subset_sum3(objects::Vector{T}, target, chosen::Vector{T}) where T
  solutions = Vector{T}[]
  subset_sum3(reverse(objects), target, chosen, solutions)
  solutions
end

objects = map(i -> (mass = i, ), rand(1.0:0.1:5000.0, 30))
ans1 = subset_sum1(objects, 15_000, [])
println(length(ans1))
ans2 = subset_sum2(objects, 15_000, eltype(objects)[])
println(length(ans2))
@assert ans2 == ans1
ans3 = subset_sum3(objects, 15_000, eltype(objects)[])
println(length(ans3))
@assert ans3 == ans2

@btime subset_sum1(objects, 15_000, [])
@btime subset_sum2(objects, 15_000, eltype(objects)[])
@btime subset_sum3(objects, 15_000, eltype(objects)[])

objects = map(i -> (mass = i, ), rand(1.0:0.1:5000.0, 40))
@time ans3 = subset_sum3(objects, 15_000, eltype(objects)[])
println(length(ans3))

shows in the console

11
11
11
  1.203 s (35966168 allocations: 3.09 GiB)
  1.187 s (29971807 allocations: 3.00 GiB)
  162.768 ms (20 allocations: 2.56 KiB)
  3.679973 seconds (332 allocations: 58.641 KiB)
317

I don’t have a way to test this atm, but does it give back correct solutions? Say objects is 5 things with masses 5, 10, 15, 20, 25, and the target is 30. If I’m not mistaken, chosen only gets extended, and the changes propagate into all future calls of subset_sum.

I checked it with @assert for small test cases. Please see the posted code.

I finally got to my Mac. Sure, the implementations are consistent with each other, but neither gives the correct results. Consider:

> objects_simple = map(i -> (mass = i, ), 5:5:25)
> subset_sum3(objects_simple, 30, eltype(objects_simple)[])
3-element Vector{Vector{NamedTuple{(:mass,), Tuple{Int64}}}}:
 [(mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)  …  (mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)]
 [(mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)  …  (mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)]
 [(mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)  …  (mass = 25,), (mass = 20,), (mass = 25,), (mass = 15,), (mass = 25,), (mass = 20,), (mass = 10,), (mass = 25,), (mass = 20,), (mass = 15,)]

That’s because everything ends up being in chosen, without getting removed.
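A tiny illustration of why the undo step matters (the values here are made up; think of them as object indices):

```julia
# Without a pop! after the second recursive call, every branch keeps
# appending to the shared vector, so later snapshots contain the whole trail.
chosen = Int[]
push!(chosen, 1)          # enter the branch "take object 1"
push!(chosen, 2)          # enter the nested branch "take object 2"
snapshot = copy(chosen)   # a solution recorded here is [1, 2]
pop!(chosen)              # the undo step: without it, the sibling branch
                          # would still see object 2 in chosen
```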

I didn’t see push defined in your MWE and assumed push! was meant. I imagine it is something along the lines of

function push(v::Vector{T}, e::T) where T
  vp = copy(v)
  push!(vp, e)
end

then? Will edit the post with a corrected version. Done.