Hi!, im trying to solve for x ∈ R^n
the following equation:
A*x .* x .+ x .-1 .= 0
or in matrix form:
Diagonal(A*x)*x + x - ones(n) = zeros(n)
One way to solve this is by using succesive substitution with the following recurrent relation:
xnew .= 1./(1 .+ A*xold)
and starting from ones(n)
seems to converge (with damping in each iteration), but the amount of iterations depends on the values of A. for example:
A1 = [
0.0 21.0568 0.0 22.5014
42.1135 0.0 22.5014 0.0
0.0 16.876 0.0 21.7082
33.752 0.0 21.7082 0.0]
x1 = [ 0.32360813340172656
0.04447193335303914
0.34921520186098937
0.05127345196546654]
needs more iterations to converge than
A2 = [
0.0 0.0307809 0.0 0.0303566
0.0615618 0.0 0.0303566 0.0
0.0 0.0227675 0.0 0.0283367
0.0455349 0.0 0.0283367 0.0]
x2 = [ 0.9463622729313296
0.9197583685871784
0.9547270052148209
0.9344516381745205]
my question is mainly if there are alternative ways to solve this problem. Any help in providing better initial points, new methods or exact solutions (if possible) would be greatly appreciated.
Have you tried Newton’s method? ForwardDiff can easily find the Jacobian of this function.
1 Like
One problem i have with just throwing a solver is that the routine that solves that problem needs to be differenciated against. (up to 4 times in some cases) so any additional ForwardDiff.derivative puts additional pressure over the higher level functions that uses the core routine. (NLSolvers.jl seems to support being inside a function that needs to be differenciated in some cases)
Also, there are some instability problems with the approach. there are at least two solutions to that problem, and we are only insterested on the positive one, when using a Jacobian based approach, the solution keeps jumping around for some values of A.
If you are differentiating a function of the result of a nonlinear solve, you should probably implement the vector–Jacobian “pullback” analytically via an adjoint method. The good news is that this is extremely efficient — the vJp for the nonlinear result x
can be computed by a single Jacobian solve from your nonlinear system, equivalent to a single step of Newton’s method. See section 3 of these course notes, or the slides from lecture 4 of our matrix calculus course.
If you don’t do a manual adjoint vJp for x
, then the danger is an AD tool will almost certainly try to backpropagate through all the steps of your nonlinear-solver iteration, which is expensive in both time and memory (it will need to save all the steps). Basically, AD will try to exactly differentiate the error in your nonlinear solver iteration, which wastes a huge amount of effort.
(Note also that it is quite easy to write an analytical formula for the Jacobian of your nonlinear equations in this case — this is a simple application of the techniques in our matrix-calculus course. This is especially useful if your matrix A has some special structure, e.g. if it is sparse.)
2 Likes
oh, sorry, yes, i meant Diagonal(A*x)*x
In particular, for your function
f(x, A) = (A*x) .* x .+ x .- 1
the analytical Jacobian matrix with respect to x
is
f′(x, A) = Diagonal(A*x .+ 1) + Diagonal(x) * A
and for A = rand(4,4)
and an initial x = rand(4)
, a Newton iteration converges quite quickly:
julia> for i = 1:10
@show norm(f(x, A))
x = x - f′(x, A) \ f(x, A)
end
norm(f(x, A)) = 1.080025816257548
norm(f(x, A)) = 0.2505287834319997
norm(f(x, A)) = 0.006815320333271257
norm(f(x, A)) = 6.5781805708859994e-6
norm(f(x, A)) = 6.918787974963443e-12
norm(f(x, A)) = 0.0
norm(f(x, A)) = 0.0
norm(f(x, A)) = 0.0
norm(f(x, A)) = 0.0
norm(f(x, A)) = 0.0
On your original A1
and A2
examples above, it converges to machine precision in 7 and 3 iterations, respectively, starting from x = ones(4)
.
10 Likes