How to solve a set of nonlinear fixed-point equations using Anderson acceleration in NLsolve.jl?

I have a set of two nonlinear fixed-point equations like

x = g1(x, y)
y = g2(x, y)

where x and y are n-dimensional arrays (n = 1, 2, 3), and length(x) and length(y) can be very large, e.g. 128x128x128 for a 3D problem. How can I solve this using the Anderson algorithm in the NLsolve.jl package, which seems to be designed for x = g(x) only?
Thanks!

Note that your problem has precisely the form z = g(z) if you were to consider z := (x, y) and g := (g1, g2).

The only thing you have to do is to concatenate both variables and functions. Assuming x and y have the same size and dimensions, one way to do this would be
z = cat(x, y; dims=ndims(x)+1) # concatenate along a new dimension
And then a similar procedure for your functions.
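For instance, a minimal sketch of such a wrapped function (assuming x and y have the same size) could be

function g(z)
    x = selectdim(z, ndims(z), 1)  # view of the x part, no copy
    y = selectdim(z, ndims(z), 2)  # view of the y part, no copy
    cat(g1(x, y), g2(x, y); dims=ndims(z))
end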

But personally I am not a fan of this, because explicit concatenation with these cat functions is not very efficient, especially for big arrays like yours. It also only works properly if x and y have the same size and type. On top of that, you’d need to throw @views everywhere to avoid further allocations.

So something much nicer would be to use an array of arrays such as
z = [x, y]
g(z) = [g1(z[1], z[2]), g2(z[1], z[2])]

But nlsolve can’t really operate on such containers.

However, it can work on slightly more refined containers, such as the ones in RecursiveArrayTools, which are designed specifically for these tasks.

using RecursiveArrayTools
z = VectorOfArray([x, y])
g(z) = VectorOfArray([g1(z[1], z[2]), g2(z[1], z[2])])
nlsolve(g, z; method=:anderson)

Great! I will try them out and mark this post as the solution if everything goes well. Thanks!

By studying the source code of NLsolve, I am confused about which function I should pass to nlsolve.
It seems that nlsolve expects to solve f(x) = 0. Thus, should I do

f(z) = g(z) - z
nlsolve(f, z; method=:anderson)

instead?

Another issue I have encountered is the way NLsolve defines its AndersonCache object. Line 24 of NLsolve.jl/src/solvers/anderson.jl is

mmax = min(length(x), m)

which is too restrictive for VectorOfArray objects. Here x is a VectorOfArray, so length(x) is the number of AbstractArrays it contains (2 in this example). However, we expect mmax to be allowed to be much larger, because each AbstractArray in the object can itself be very large, e.g. 128x128x128 for 3D problems.
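To illustrate the problem (with the sizes from my original question):

using RecursiveArrayTools
x = rand(128, 128, 128); y = rand(128, 128, 128)
z = VectorOfArray([x, y])
length(z)        # 2 -- the number of component arrays
sum(length, z.u) # 4194304 -- the total number of unknowns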

At present, I have written my own constructor for AndersonCache to allow any m.
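Roughly along these lines (a sketch only: the field names and their order are inferred from the usage in my implementation further down, and the real internals may differ between NLsolve versions):

import NLsolve
using RecursiveArrayTools: AbstractVectorOfArray

function my_anderson_cache(x::AbstractVectorOfArray, m::Integer)
    T = eltype(x)
    n = sum(length, x.u)  # total number of unknowns, not the number of component arrays
    mmax = min(n, m)      # the cap that length(x) was presumably meant to enforce
    Δgs = [deepcopy(x) for _ in 1:mmax]  # history of g-differences
    γs = Vector{T}(undef, mmax)          # least-squares coefficients
    Q = Matrix{T}(undef, n, mmax)        # QR factors of the f-differences
    R = Matrix{T}(undef, mmax, mmax)
    NLsolve.AndersonCache(deepcopy(x), deepcopy(x),  # x, g
                          deepcopy(x), deepcopy(x),  # fxold, gold
                          Δgs, γs, Q, R)
end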

A fixed-point equation z = g(z) can always be mapped to a root-finding problem f(z) = g(z) - z = 0 by precisely that recipe!
Note that NLsolve.jl has a wrapper for this: instead of calling nlsolve you call fixedpoint, and this is taken care of for you: https://github.com/JuliaNLSolvers/NLsolve.jl/blob/master/src/nlsolve/fixedpoint.jl#L30
But personally I would keep using the standard nlsolve function call: being more explicit helps in the long run.
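For example, with g and z as defined earlier in this thread, both call styles would look like this (a sketch; I am assuming the m keyword, which sets the Anderson memory depth):

f(z) = g(z) - z                          # residual form for nlsolve
nlsolve(f, z; method=:anderson, m=5)

fixedpoint(g, z; method=:anderson, m=5)  # the wrapper forms the residual for you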

Also note that m is NOT related to the size of your solution z (without that min operation you could end up with an underdetermined system!). You can find an explanation of what m does here: Anderson acceleration - Wikipedia

So if my fixed-point equation is x = g(x), I should pass f(x) = g(x) - x to nlsolve, right?

I know exactly what m does in Anderson mixing. The issue here is that Base.length as defined for VectorOfArray only returns the number of arrays it contains. Thus, if I use the AndersonCache constructor provided by NLsolve, m is restricted to at most 2 (it always chooses the minimum of the given m and length(VectorOfArray)). I wrote my own version of Anderson mixing in C++ long ago, and there I can use m as large as 100 to accelerate convergence.

I marked this post as the solution, as it provides a very good approach for representing a set of fixed-point equations involving large high-dimensional data. However, this approach cannot be applied directly to NLsolve's Anderson solver, which handles VectorOfArray objects in a way that caps m at length(VectorOfArray) (the number of component arrays) rather than the total length of the multi-dimensional data.

I ended up rewriting my own version of Anderson mixing, closely following Walker's pseudocode (i.e. its treatment of the β relaxation). Since Q*R*γ results in a flattened vector, we have to figure out a way to copy this long vector back into a VectorOfArray{MyType} object. At present, I overload Base.copyto! to accomplish the task, but it is not universal.
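For reference, the overload is essentially this (a sketch, assuming each component array absorbs a contiguous chunk of the flat vector in memory order):

import Base: copyto!

function copyto!(dest::VectorOfArray, src::AbstractVector)
    offset = 0
    for a in dest.u  # walk the component arrays in storage order
        n = length(a)
        copyto!(vec(a), view(src, offset+1:offset+n))
        offset += n
    end
    return dest
end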

Thanks for your input, @FMeirinhos!

And here is my implementation (initial_x has to be an AbstractVectorOfArray):
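For completeness, these are the imports it needs, plus the small vecnorm1 helper it calls (assuming vecnorm1 is the entrywise 1-norm, as the name suggests):

using LinearAlgebra: UpperTriangular, cond, ldiv!, mul!, norm
using NLSolversBase: NonDifferentiable, OnceDifferentiable, value, value!!
using RecursiveArrayTools: AbstractVectorOfArray, VectorOfArray
import NLsolve

vecnorm1(a) = norm(vec(a), 1)  # assumed definition: entrywise 1-norm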

function anderson_(df::Union{NonDifferentiable, OnceDifferentiable},
                            initial_x::AbstractVectorOfArray,
                            xtol::Real,
                            ftol::Real,
                            iterations::Integer,
                            store_trace::Bool,
                            show_trace::Bool,
                            extended_trace::Bool,
                            β::Real,
                            aa_start::Integer,
                            droptol::Real,
                            cache::NLsolve.AndersonCache)
    @. cache.x = initial_x
    tr = NLsolve.SolverTrace()
    tracing = store_trace || show_trace || extended_trace

    aa_iteration = cache.γs !== nothing
    x_converged, f_converged, converged = false, false, false
    m = aa_iteration ? length(cache.γs) : 0
    m_eff = 0  # effective history size (initialized unconditionally so the m_eff check below is always defined)

    # Preallocate Δf = f_{k+1}(x) - f_{k}(x)
    Δf = deepcopy(cache.fxold)

    iter = 0
    while iter < iterations
        # evaluate function: f(x) = g(x) - x
        value!!(df, cache.x)
        # Give a name to value(df). It will not allocate.
        fx = value(df)

        # check that all values are finite
        NLsolve.check_isfinite(fx)

        # Compute g(x) = x + f(x)
        @. cache.g = cache.x + fx

        # mean relative residual over the component arrays
        res = zero(eltype(initial_x))
        for i in 1:length(initial_x)
            res += vecnorm1(fx[i]) / vecnorm1(cache.x[i])
        end
        res /= length(initial_x)

        # report progress every 100 iterations
        if iter % 100 == 0
            @show iter, res
        end

        # check convergence: relative residual below ftol
        f_converged = res < ftol
        converged = x_converged || f_converged
        if any(isnan, cache.x) || any(isnan, fx)
            break
        end
        if converged
            @show iter, res
            break
        end

        # perform Anderson acceleration
        if iter > aa_start && m > 0
            # increase size of history
            m_eff += 1
            @. Δf = fx - cache.fxold
            if m_eff < (m+1)
                @. cache.Δgs[m_eff] = cache.g - cache.gold
            else
                # circularly shift differences of g
                Δg1 = cache.Δgs[1]
                for i in 1:(m-1)
                    cache.Δgs[i] = cache.Δgs[i+1]
                end
                cache.Δgs[m] = Δg1
                @. cache.Δgs[m] = cache.g - cache.gold
            end
        end

        @. cache.fxold = fx
        @. cache.gold = cache.g

        if m_eff == 0 || m == 0
            # no history yet (or acceleration disabled): plain, possibly β-damped, fixed-point step
            if 0 < β ≤ 1.0
                @. cache.x += β * fx
            else
                @. cache.x = cache.g
            end
        else
            if m_eff == 1
                NLsolve.qradd!(cache.Q, cache.R, vec(Δf), m_eff)
            else
                if m_eff > m
                    # delete left-most column of QR decomposition
                    NLsolve.qrdelete!(cache.Q, cache.R, m)
                    m_eff -= 1
                end
                NLsolve.qradd!(cache.Q, cache.R, vec(Δf), m_eff)
            end

            # define current Q and R matrices
            Q, R = view(cache.Q, :, 1:m_eff), UpperTriangular(view(cache.R, 1:m_eff, 1:m_eff))

            # check condition
            if droptol > 0
                while cond(R) > droptol && m_eff > 1
                    NLsolve.qrdelete!(cache.Q, cache.R, m_eff)
                    m_eff -= 1
                    Q, R = view(cache.Q, :, 1:m_eff), UpperTriangular(view(cache.R, 1:m_eff, 1:m_eff))
                end
            end

            # solve least squares problem
            γs = view(cache.γs, 1:m_eff)
            ldiv!(R, mul!(γs, Q', vec(fx)))

            # update next iterate
            for i in 1:m_eff
                @. cache.g -= cache.γs[i] * cache.Δgs[i]
            end
            @. cache.x = cache.g
            if 0 < β < 1
                copyto!(Δf, Q * R * γs)
                @. cache.x -= (1-β) * (fx - Δf)
            end
        end
        iter += 1
    end

    return NLsolve.SolverResults("Anderson m=$m β=$β aa_start=$aa_start droptol=$droptol",
                        initial_x, deepcopy(cache.x), norm(value(df), Inf),
                        iter, x_converged, xtol, f_converged, ftol, tr,
                        first(df.f_calls), 0)
end