Speed up solution of large system of equations

I have a system of equations:

\begin{aligned} & S^m_{a_m, e_m}-\mu^{0m}_{a_m, e_m}=\sum_{a_m=1}^{46} {\Pi}_{a_m, e_m, a_f, e_f}\sqrt{S^m_{a_m, e_m} S^f_{a_f, e_f}} \prod_{k=0}^{T(a_m, a_f)-1}\left(\frac{\mu^{0m}_{a_m+k, e_m} \mu^{0f}_{a_f+k, e_f}}{S^m_{a_m+k, e_m} S^f_{a_f+k, e_f}}\right)^{(1 / 2)(\beta(1-\delta))^k}, \\ & S^f_{a_f, e_f}-\mu^{0f}_{a_f, e_f}=\sum_{a_f=1}^{46} {\Pi}_{a_m, e_m, a_f, e_f}\sqrt{S^m_{a_m, e_m} S^f_{a_f, e_f}} \prod_{k=0}^{T(a_m, a_f)-1}\left(\frac{\mu^{0m}_{a_m+k, e_m} \mu^{0f}_{a_f+k, e_f}}{S^m_{a_m+k, e_m} S^f_{a_f+k, e_f}}\right)^{(1 / 2)(\beta(1-\delta))^k}. \end{aligned}

for a_m, a_f \in \{1, \ldots, 46\} and e_m, e_f \in \{1, 2\}, where T(a_m, a_f) = 46 - \max\{a_m, a_f\} + 1. S^m, S^f, \Pi, \mu^{0m}, \mu^{0f} are matrices. I need to solve that system for each of \mu^{0m}_{a_m, e_m}, \mu^{0f}_{a_f, e_f}.

I have a script that solves that system using a generalization of the IPFP algorithm:

using NLsolve
using LinearAlgebra

const Z = 46
const Ī“ = 0.1
const Ī² = 0.95

T(am, af) = Z - max(am, af) + 1

const IDX_ae = CartesianIndices((1:46, 1:2))

function insideproduct_num(am, em, af, ef; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Ī¼0m), eltype(Ī¼0f)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Ī¼0m[am + k, em] * Ī¼0f[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function insideproduct_den(am, em, af, ef; Sm=Sm, Sf=Sf)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Sm), eltype(Sf)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Sm[am + k, em] * Sf[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf, Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
    for idx in CartesianIndices(auxmat)
        am, em, af, ef = Tuple(idx)
        auxmat[am, em, af, ef] = Ī [am, em, af, ef] * sqrt(Sm[am, em]*Sf[af, ef]) /
                insideproduct_den(am, em, af, ef; Sm=Sm, Sf=Sf)
    end
    return auxmat
end

@views function MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f)
    for idx_ae in IDX_ae
        am, em = Tuple(idx_ae)
        s = zero(eltype(Ī¼0m))
        for idx_fi in IDX_ae
            s += auxmat[am, em, idx_fi[1], idx_fi[2]] * insideproduct_num(am, em, idx_fi[1], idx_fi[2]; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[am, em] = Sm[am, em] - Ī¼0m[am, em] - s
    end
    return res
end

@views function MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m)
    for idx_ae in IDX_ae
        af, ef = Tuple(idx_ae)
        s = zero(eltype(Ī¼0f))
        for idx_mi in IDX_ae
            s += auxmat[idx_mi[1], idx_mi[2], af, ef] * insideproduct_num(idx_mi[1], idx_mi[2], af, ef; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[af, ef] = Sf[af, ef] - Ī¼0f[af, ef] - s
    end
    return res
end

function ipfp(Ī , Sm, Sf; initĪ¼0=initĪ¼0, auxmat=auxmat, tol=1, method=:trust_region, autodiff=:forward, iterations=1_000,
        innerxtol=1, innerftol=1)
    i = 0
    Ī¼0f1 = Sf
    Ī¼0m = initĪ¼0
    d = Inf
    while d > tol && i <= iterations
        fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf, Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f1)
        solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0m = solM.zero
        solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0f2 = solF.zero
        d = norm(Ī¼0f1 - Ī¼0f2, Inf)
        Ī¼0f1 = Ī¼0f2
        i += 1
    end
    println("i = ", i, ", d = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end

Initialize with

Ī  = 5 .* rand(46, 2, 46, 2)
Sm = 5 .* rand(46, 2)
Sf = 5 .* rand(46, 2)
Ī¼0m = similar(Sm)
Ī¼0f = similar(Sf)
auxmat = similar(Ī )
initĪ¼0 = 5 .* ones(size(Sm))

Now run it

ipfp(Ī , Sm, Sf; initĪ¼0= initĪ¼0)

I would like to speed that up as much as possible. Right now it takes over 2 seconds to solve this toy example. Are there any obvious speed ups Iā€™m missing? Perhaps multithreading?
Note: If I scale the matrices by a factor larger than 5, the algorithm slows down even more.

So in situations like this I always check three things to see what might be slowing the code down:

  • @code_warntype to see if there are problems with identifying variable types
  • @benchmark from the BenchmarkTools.jl package to look at allocations
  • @profview to see where the function is spending most of its time.

When I ran @code_warntype ipfp(Ī , Sm, Sf; initĪ¼0= initĪ¼0) I noticed that auxmat came back as type Any.

I think the issue is you have to pass the variables down through each function. When you write:

function ipfp(Ī , Sm, Sf; initĪ¼0=initĪ¼0, auxmat=auxmat) #plus other keywords
...
end

The function ipfp doesnā€™t know what auxmat is because it hasnā€™t been defined yet. So when you call it without giving it auxmat, it has to go looking for the variable in the global scope.

You could instead write:

function ipfp(Ī , Sm, Sf; initĪ¼0=initĪ¼0, auxmat) #no default value for auxmat
...
end

and call the function as:

ipfp(Ī , Sm, Sf; initĪ¼0=initĪ¼0, auxmat=auxmat)

#This works too
#Julia is cool and will match keywords with variable names
ipfp(Ī , Sm, Sf; initĪ¼0, auxmat) 

This will have to be fixed for the other function definitions as well.

3 Likes

Thank you for that. I wasnā€™t aware of that problem. I still get some Any types. I suspect the problem is that Julia doesnā€™t know the type of the object that returns the zero of the system inside ipfp. Any suggestions for that?

using NLsolve
using LinearAlgebra
using Random

Random.seed!(1293804)

const Z = 46
const Ī“ = 0.1
const Ī² = 0.95

T(am, af) = Z - max(am, af) + 1

const IDX_ae = CartesianIndices((1:46, 1:2))

function insideproduct_num(am, em, af, ef; Ī¼0m, Ī¼0f)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Ī¼0m), eltype(Ī¼0f)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Ī¼0m[am + k, em] * Ī¼0f[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function insideproduct_den(am, em, af, ef; Sm, Sf)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Sm), eltype(Sf)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Sm[am + k, em] * Sf[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function fillauxmat!(auxmat; Ī , Sm, Sf)
    for idx in CartesianIndices(auxmat)
        am, em, af, ef = Tuple(idx)
        auxmat[am, em, af, ef] = Ī [am, em, af, ef] * sqrt(Sm[am, em]*Sf[af, ef]) /
                insideproduct_den(am, em, af, ef; Sm=Sm, Sf=Sf)
    end
    return auxmat
end

@views function MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f)
    for idx_ae in IDX_ae
        am, em = Tuple(idx_ae)
        s = zero(eltype(Ī¼0m))
        for idx_fi in IDX_ae
            s += auxmat[am, em, idx_fi[1], idx_fi[2]] * insideproduct_num(am, em, idx_fi[1], idx_fi[2]; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[am, em] = Sm[am, em] - Ī¼0m[am, em] - s
    end
    return res
end

@views function MMsysF!(res, Ī¼0f; auxmat, Sf, Ī¼0m)
    for idx_ae in IDX_ae
        af, ef = Tuple(idx_ae)
        s = zero(eltype(Ī¼0f))
        for idx_mi in IDX_ae
            s += auxmat[idx_mi[1], idx_mi[2], af, ef] * insideproduct_num(idx_mi[1], idx_mi[2], af, ef; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[af, ef] = Sf[af, ef] - Ī¼0f[af, ef] - s
    end
    return res
end

function ipfp(Ī , Sm, Sf; initĪ¼0, auxmat, tol=1e-3, method=:trust_region, autodiff=:forward, iterations=1_000,
        innerxtol=1e-3, innerftol=1e-3)
    i = 0
    Ī¼0f1 = Sf
    Ī¼0m = initĪ¼0
    d = Inf
    fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf)
    while d > tol && i <= iterations
        solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0m = solM.zero
        solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0f2 = solF.zero
        d = norm(Ī¼0f1 - Ī¼0f2, Inf)
        Ī¼0f1 = Ī¼0f2
        i += 1
    end
    println("i = ", i, ", d = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end

Here is a slightly better version that doesnā€™t given Any types for the output:

function ipfp(Ī , Sm, Sf; initĪ¼0, auxmat, tol=1, method=:trust_region, autodiff=:forward, iterations=1_000, innerxtol=1, innerftol=1)
    i = 0
    Ī¼0f1 = Sf   #maybe Ī¼0f1 = copy(Sf) instead?
    Ī¼0f2 = Sf
    Ī¼0m = initĪ¼0
    d = Inf
    while d > tol && i <= iterations
        fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf, Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f1)
        solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0m .= solM.zero
        
        solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0f2 .= solF.zero
        d = norm(Ī¼0f1 - Ī¼0f2, Inf)
        Ī¼0f1 .= Ī¼0f2
        i += 1
    end
    println("i = ", i, ", d = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end

I think there is some confusion when you assign one variable to another. When you write Ī¼0f1 = Sf you have two variables pointing to the same data in memory (updating Ī¼0f1 will also update Sf). If you donā€™t want Sf to change you can write Ī¼0f1 = copy(Sf). Also here I use the .= notation which takes all the values on the right and assigns them to the same positions on the left (assuming the right and left variables are already defined).

Thereā€™s still something weird happening with the @code_warntype output. It looks like a temporary variable is being defined or something?

@code_warntype ipfp(Ī , Sm, Sf; initĪ¼0, auxmat)
...
Locals
  @_7::Any
...

I think it has something to do with the nlsolve call but Iā€™m not sure.

2 Likes

If you profile, whatā€™s taking all of the time? I presume setting up multiple nonlinear solves is not the way to go. Also, generally for large problems using NonlinearSolve and specializing the linear solver towards something for large (sparse) Jacobians is required for anything large.

1 Like

@profview doesnā€™t seem to be working, the pane shows up blank. I have tried using NonlinearSolve, but I canā€™t figure out how to specify the problem :frowning:

While it would be some extra work, Newton-Krylov might help you. Thereā€™s some code for that here.

1 Like

What did you try? Itā€™s almost exactly the same as NLsolve in the high level interface, you just add p (which you can then reuse to remake).

1 Like

This used to be a bug in the VSCode extension but it was fixed recently. Upgrading the extension should suffice, but if not, does the flame graph appear when you re-run the profiling command a second time?

1 Like

I wanted to reuse the work I have done. In particular the functions MMsysM! and MMsysF!.
Those functions are called as MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f), where the kw arguments are basically parameters, however auxmat and Sm donā€™t change for a particular problem.
I did this:

probM = NonlinearProblem{isinplace}((res, Ī¼0m, Ī¼0f) -> MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f), initĪ¼0, Ī¼0f)

I am not sure whether the mutating argument is supposed to be the first one. Also, I am not sure how to change the parameters after probM is defined. I couldnā€™t find that in the documentation.

I didnā€™t know I had to run it twice. I was giving up too easily. How can I export the graph so I can upload it here? (And how do I read interpret it). Here is a link to the html file.

You shouldnā€™t have to run it twice, thatā€™s why itā€™s a bug :wink:

Iā€™m not sure how you did it, but it looks like you managed just fine by linking to the HTML. Itā€™s much more useful than a static screenshot anyway because itā€™s interactive.
And in VSCode itā€™s even nicer because when you control + click the profile it leads you to the relevant line in your code.

https://www.julia-vscode.org/docs/stable/userguide/profiler/

Roughly speaking, you want the tiles to be:

  • mostly blue, because yellow = allocations (your code uses a lot of memory) and red = runtime dispatch (your code struggles to infer types)
  • related to the functions where you expect to be spending time (thatā€™s problem-specific)

At first glance, your profile doesnā€™t scream ā€œunderoptimizedā€ to me, because most of the time is spent in mathematical operations, and the red tiles are related to autodiff (thus sometimes unavoidable). But I donā€™t know the details.

To go further, check out this blog post, which is the best resource I know on optimizing Julia code. It has a profiling section and much more:

https://viralinstruction.com/posts/optimise/#how_to_optimise_julia_code_a_practical_guide

1 Like

It seems the fix for the bug hasnā€™t found its way into a release. I have version 1.47.2. I didnā€™t want to install a pre-release version. Thanks for the resources.

1 Like

It looks like just Ī¼0f is your parameters? Then you can just enclose the others

probM = NonlinearProblem{isinplace}((res, Ī¼0f) -> MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f), initĪ¼0, Ī¼0f)
1 Like

I donā€™t need to specify the arguments? In the example in the documentation f(u, p), u is the input of the function, and p the parameters. That gets translated to probN = NonlinearProblem(f, u0, p).
In my case, Ī¼0m is the input, Ī¼0f the parameters, but res is the array where the residuals of the system are stored. Just to clarify, writing isinplace takes care of all that so that I can specify the function as you did?

Do you have an MWE? Iā€™m trying to run the following:

using NLsolve
using LinearAlgebra
using Random

Random.seed!(1293804)

const Z = 46
const Ī“ = 0.1
const Ī² = 0.95

T(am, af) = Z - max(am, af) + 1

const IDX_ae = CartesianIndices((1:46, 1:2))

function insideproduct_num(am, em, af, ef; Ī¼0m, Ī¼0f)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Ī¼0m), eltype(Ī¼0f)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Ī¼0m[am + k, em] * Ī¼0f[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function insideproduct_den(am, em, af, ef; Sm, Sf)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Sm), eltype(Sf)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Sm[am + k, em] * Sf[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function fillauxmat!(auxmat; Ī , Sm, Sf)
    for idx in CartesianIndices(auxmat)
        am, em, af, ef = Tuple(idx)
        auxmat[am, em, af, ef] = Ī [am, em, af, ef] * sqrt(Sm[am, em]*Sf[af, ef]) /
                insideproduct_den(am, em, af, ef; Sm=Sm, Sf=Sf)
    end
    return auxmat
end

@views function MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f)
    for idx_ae in IDX_ae
        am, em = Tuple(idx_ae)
        s = zero(eltype(Ī¼0m))
        for idx_fi in IDX_ae
            s += auxmat[am, em, idx_fi[1], idx_fi[2]] * insideproduct_num(am, em, idx_fi[1], idx_fi[2]; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[am, em] = Sm[am, em] - Ī¼0m[am, em] - s
    end
    return res
end

@views function MMsysF!(res, Ī¼0f; auxmat, Sf, Ī¼0m)
    for idx_ae in IDX_ae
        af, ef = Tuple(idx_ae)
        s = zero(eltype(Ī¼0f))
        for idx_mi in IDX_ae
            s += auxmat[idx_mi[1], idx_mi[2], af, ef] * insideproduct_num(idx_mi[1], idx_mi[2], af, ef; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[af, ef] = Sf[af, ef] - Ī¼0f[af, ef] - s
    end
    return res
end

function ipfp(Ī , Sm, Sf; initĪ¼0, auxmat, tol=1e-3, method=:trust_region, autodiff=:forward, iterations=1_000,
        innerxtol=1e-3, innerftol=1e-3)
    i = 0
    Ī¼0f1 = Sf
    Ī¼0m = initĪ¼0
    d = Inf
    fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf)
    while d > tol && i <= iterations
        solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0m = solM.zero
        solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0f2 = solF.zero
        d = norm(Ī¼0f1 - Ī¼0f2, Inf)
        Ī¼0f1 = Ī¼0f2
        i += 1
    end
    println("i = ", i, ", d = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end


Ī  = 5 .* rand(46, 2, 46, 2)
Sm = 5 .* rand(46, 2)
Sf = 5 .* rand(46, 2)
Ī¼0m = similar(Sm)
Ī¼0f = similar(Sf)
auxmat = similar(Ī )
initĪ¼0 = 5 .* ones(size(Sm))

ipfp(Ī , Sm, Sf; initĪ¼0, auxmat) 

but it just runs forever.

1 Like

Sorry about that. This should work:

using NLsolve
using LinearAlgebra
using Random

Random.seed!(1293804)

const Z = 46
const Ī“ = 0.1
const Ī² = 0.95

T(am, af) = Z - max(am, af) + 1

const IDX_ae = CartesianIndices((1:46, 1:2))

function insideproduct_num(am, em, af, ef; Ī¼0m, Ī¼0f)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Ī¼0m), eltype(Ī¼0f)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(abs.(Ī¼0m[am + k, em] * Ī¼0f[af + k, ef]))
        power *= exponent
    end
    return exp(p)
end

function insideproduct_den(am, em, af, ef; Sm, Sf)
    exponent = Ī²*(1 - Ī“)
    power = 1.0
    p = zero(promote_type(eltype(Sm), eltype(Sf)))
    for k in 0:T(am, af)-1
        p += 0.5 * power * log(Sm[am + k, em] * Sf[af + k, ef])
        power *= exponent
    end
    return exp(p)
end

function fillauxmat!(auxmat; Ī =Ī , Sm, Sf)
    for idx in CartesianIndices(auxmat)
        am, em, af, ef = Tuple(idx)
        auxmat[am, em, af, ef] = Ī [am, em, af, ef] * sqrt(Sm[am, em]*Sf[af, ef]) /
                insideproduct_den(am, em, af, ef; Sm=Sm, Sf=Sf)
    end
    return auxmat
end

function MMsysM!(res, Ī¼0m; auxmat, Sm, Ī¼0f)
    for idx_ae in IDX_ae
        am, em = Tuple(idx_ae)
        s = zero(eltype(Ī¼0m))
        for idx_fi in IDX_ae
            s += auxmat[am, em, idx_fi[1], idx_fi[2]] * insideproduct_num(am, em, idx_fi[1], idx_fi[2]; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[am, em] = Sm[am, em] - Ī¼0m[am, em] - s
    end
    return res
end

function MMsysF!(res, Ī¼0f; auxmat, Sf, Ī¼0m)
    for idx_ae in IDX_ae
        af, ef = Tuple(idx_ae)
        s = zero(eltype(Ī¼0f))
        for idx_mi in IDX_ae
            s += auxmat[idx_mi[1], idx_mi[2], af, ef] * insideproduct_num(idx_mi[1], idx_mi[2], af, ef; Ī¼0m=Ī¼0m, Ī¼0f=Ī¼0f)
        end
        res[af, ef] = Sf[af, ef] - Ī¼0f[af, ef] - s
    end
    return res
end

function ipfp(Ī , Sm, Sf; initĪ¼0, auxmat, tol=1, method=:trust_region, autodiff=:forward, iterations=1_000,
        innerxtol=1, innerftol=1)
    i = 0
    Ī¼0f1 = copy(Sf)
    Ī¼0f2 = copy(Sf)
    Ī¼0m = initĪ¼0
    d = Inf
    fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf)
    while d > tol && i <= iterations
        solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0m .= solM.zero
        solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                    xtol=innerxtol, ftol=innerftol, iterations=iterations)
        Ī¼0f2 .= solF.zero
        d = norm(Ī¼0f1 - Ī¼0f2, Inf)
        Ī¼0f1 .= Ī¼0f2
        i += 1
    end
    print("iter = ", i, ", tol = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end

Ī 1 = rand(46, 2, 46, 2)
Sm1 = rand(46, 2)
Sf1 = rand(46, 2)
Ī¼0m = similar(Sm1)
Ī¼0f = similar(Sf1)
auxmat = zeros(size(Ī 1))
initĪ¼0 = 1 .* ones(size(Sm1))
res1 = similar(initĪ¼0)

@time sol1 = ipfp(Ī 1, Sm1, Sf1; initĪ¼0=initĪ¼0, auxmat=auxmat, tol=1e-4)

Iā€™m trying to test the somewhat full system, but with toy inputs to assess speed.

Two further questions:

  1. I noticed that solving the system slows down for ā€œlargerā€ values of S_m and S_f. For example:
Sm5 = 5 .* Sm1
Sf5 = 5 .* Sf1

Since the system is homogeneous of degree zero in S_m, S_f, \mu^{0m}, \mu^{0f}, is it ok to scale down the system, solve it (presumably quicker) and then scale it back up? I note that all those variables must be in the same units.

  1. The way this generalization of IPFP works is by sequentially solving subsystems until convergence. I thought of having the tolerance of solving those subsystems depend on how close the global algorithm is from convergence. That way, when the global algorithm is far from convergence, the subsystems do not need to be solved to high precision (and so will solve quickly), but when the global algorithm is close to convergence, then subsystems will be solved to higher precision. I modified the ipfp function to:
function ipfp_ad(Ī , Sm, Sf; initĪ¼0, auxmat, tol=1, method=:trust_region, autodiff=:forward, iterations=1_000)
    i = 0
    Ī¼0f1 = copy(Sf)
    Ī¼0f2 = copy(Sf)
    Ī¼0m = initĪ¼0
    d = Inf
    fillauxmat!(auxmat; Ī =Ī , Sm=Sm, Sf=Sf)
    while d > tol && i <= iterations
        if d > 1
            solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                        xtol=1, ftol=1, iterations=iterations)
            Ī¼0m .= solM.zero
            solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                        xtol=1, ftol=1, iterations=iterations)
            Ī¼0f2 .= solF.zero
            d = norm(Ī¼0f1 - Ī¼0f2, Inf)
            Ī¼0f1 .= Ī¼0f2
        else
            solM = NLsolve.nlsolve((res, Ī¼0m) -> MMsysM!(res, Ī¼0m; auxmat=auxmat, Sm=Sm, Ī¼0f=Ī¼0f1), initĪ¼0; method=method, autodiff=autodiff,
                        xtol=d, ftol=d, iterations=iterations)
            Ī¼0m .= solM.zero
            solF = NLsolve.nlsolve((res, Ī¼0f) -> MMsysF!(res, Ī¼0f; auxmat=auxmat, Sf=Sf, Ī¼0m=Ī¼0m), initĪ¼0; method=method, autodiff=autodiff,
                        xtol=d, ftol=d, iterations=iterations)
            Ī¼0f2 .= solF.zero
            d = norm(Ī¼0f1 - Ī¼0f2, Inf)
            Ī¼0f1 .= Ī¼0f2
        end
        i += 1
    end
    print("iter = ", i, ", tol = ", d)
    return abs.(Ī¼0m), abs.(Ī¼0f1)
end

What do people think of those 2 things?

anyone?