Simple Dykstra projection: could it be faster?


I’m writing a simple Dykstra scheme for projecting a vector start onto the equality constraints Ax = b and the inequality constraints x >= 0. It works great, but I wonder if it could be faster?

Here’s the current version:

using Random
N,M = 30,50
A = big.(rand(N,M));
x0 = big.(rand(M));
b = A*x0;
@assert A*x0 == b  # sanity check
start = big.(randn(M));

get_AiAA(A) = A'inv(A*A')
proj_eq(x,A,b,AiAA) = x .- AiAA*(A*x-b)
proj_ineq(x::T) where T = x > T(0) ? x : T(0)
function naive_version(start,A,b)
    AiAA = get_AiAA(A)
    x = deepcopy(start)
    old_x = zero(x)
    y = zero(x)
    p = zero(x)
    q = zero(x)
    # iter = 0
    while !(old_x ≈ x)
        old_x .= x
        y .= proj_eq(x .+ p, A, b, AiAA)
        p .= x .+ p .- y
        x .= proj_ineq.(y .+ q)
        q .= y .+ q .- x
        # iter += 1
    end
    # println("Converged in $iter iterations")
    return x
end
function update!(old_x,x,p,q,ProjP,Projq)
    old_x .= x
    x .= x .+ p                # x now holds x + p
    p .= ProjP*x .- Projq      # p = (x+p) - proj_eq(x+p)
    x .= x .- p .+ q           # x now holds y + q, with y = proj_eq(x+p)
    q .= x
    x[x.<0] .= 0               # x = proj_ineq(y + q)
    q[q.>0] .= 0               # q = (y + q) - x
    return nothing
end
function Dykstra(start,A,b)
    # Deal with the linear projection
    AiAA = A'inv(A*A')
    ProjP = AiAA*A
    Projq = AiAA*b

    # Dykstra :
    x = deepcopy(start)
    old_x, p, q = zero(x), zero(x), zero(x)
    while !(old_x ≈ x)
        update!(old_x, x, p, q, ProjP, Projq)
    end
    return x
end

naive_solution = naive_version(start,A,b);
solution = Dykstra(start,A,b);
using Test
using BenchmarkTools
@test solution ≈ naive_solution
@test all(solution .>=0)
@test A*solution ≈ b
@btime Dykstra($start,$A,$b);

I included a naive version that is known to implement the algorithm correctly, as a reference. On my machine Dykstra takes 2.1 s for these array sizes, but it gets slow for bigger arrays.

It seems to be type-stable. I have concerns about broadcasting: couldn’t it be faster with explicit loops in the update! function?
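For what it’s worth, a loop-based variant could look like the sketch below (update_loops! is a hypothetical name; the middle loop writes out the ProjP*x product by hand so no temporary result vector is allocated for it):

```julia
function update_loops!(old_x, x, p, q, ProjP, Projq)
    @inbounds for i in eachindex(x)
        old_x[i] = x[i]
        x[i] += p[i]                 # x now holds x + p
    end
    @inbounds for i in eachindex(p)  # p .= ProjP*x .- Projq, unrolled
        s = -Projq[i]
        for j in eachindex(x)
            s += ProjP[i, j] * x[j]
        end
        p[i] = s
    end
    @inbounds for i in eachindex(x)
        xi = x[i] - p[i] + q[i]      # xi = y + q
        x[i] = max(xi, zero(xi))     # projection onto x .>= 0
        q[i] = min(xi, zero(xi))     # correction term
    end
    return nothing
end
```

Note that with BigFloats each scalar operation in the inner loop still allocates, so loops alone probably won’t remove the allocation problem, only the intermediate vectors.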

Profiling is your friend for questions like this. Here, it looks like almost all the time is spent in matrix multiplication with big floats, so thinking about how to optimize that, or avoid it altogether, should probably be the first step.
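For example, with the stdlib Profile you can confirm this on an isolated BigFloat matrix product (sizes here are just for illustration):

```julia
using Profile

# Profile a BigFloat matrix product in isolation; the flat report should
# show most samples inside the MPFR arithmetic that backs BigFloat.
A = big.(rand(100, 100))
B = big.(rand(100, 100))
A * B                      # warm up (compile) before profiling
Profile.clear()
@profile A * B
Profile.print(format = :flat, sortedby = :count)
```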

I hear you. Profiling suggests I’m allocating a lot of BigFloats, and hence should move to MutableArithmetics.jl.
However, I have trouble understanding how to change my update! function so that it does not allocate…

It seems MutableArithmetics has some support for in-place matrix multiplication. You could perhaps try that:

julia> using MutableArithmetics, LinearAlgebra

julia> a = big.(rand(300, 300));

julia> c = similar(a);

julia> @btime MutableArithmetics.mul_to!(c, a, a);
  1.930 s (180002 allocations: 8.93 MiB)

julia> @btime mul!(c, a, a);
  3.692 s (108720000 allocations: 5.27 GiB)
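If that works for your case, the buffered product could be folded back into update! roughly as follows. This is an untested sketch: it assumes mul_to! also accepts a matrix–vector pair (worth checking against your MutableArithmetics version), and that buf is allocated once, e.g. as zero(x), outside the loop:

```julia
import MutableArithmetics as MA

# Sketch: keep one preallocated buffer for the ProjP*x product instead of
# allocating a fresh result vector on every iteration.
function update_buffered!(old_x, x, p, q, ProjP, Projq, buf)
    old_x .= x
    x .= x .+ p
    MA.mul_to!(buf, ProjP, x)    # buf = ProjP * x, written in place
    p .= buf .- Projq
    x .= x .- p .+ q
    q .= x
    x[x .< 0] .= 0
    q[q .> 0] .= 0
    return nothing
end
```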

Do you really need the extra precision from BigFloats or is something like a Double64 from DoubleFloats.jl precise enough?
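For reference, Double64 carries roughly 31 significant decimal digits, versus about 77 for the default 256-bit BigFloat; a quick way to compare (sketch assuming DoubleFloats is installed):

```julia
using DoubleFloats

# Compare machine epsilons: smaller eps means more significant digits.
println(eps(Double64))   # about 5e-32  -> ~31 decimal digits
println(eps(BigFloat))   # about 2e-77  -> ~77 digits at the 256-bit default
```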


I need both :slight_smile: That’s why I’m testing with BigFloats and MultiFloats.jl, alongside standard Float64.
