Symbolics + Gröbner basis

I’m trying to find the roots of some multivariable polynomials with parameters. Just to illustrate the problem, suppose I have 2 equations, say:

0 = x^2y + 2xy - a \\ 0 = xy - xy^3

So I try to do the following:

using Symbolics
using Groebner
@variables x, y

eq1 = x^2*y + 2*x*y - a
eq2 = x*y - x*y^3

sys = [eq1, eq2]

Symbolics.groebner_basis(sys)

When I run this code, Symbolics complains that I have not defined a.

Fine, I understand that. But if I add a in the @variables list, how will the algorithm know that I don’t want to solve wrt. a, in other words: that it is x and y I want to find?

[In my case, I have 20 unknowns and 20 equations… if I add what are really parameters as @variables, the function spits out a vector of 168 elements.]

1 Like

The theory of Gröbner bases doesn’t have the concept of variables and constants. If you just wanted to find the roots of some multivariable polynomials with parameters, use symbolic_solve instead and pass the variables you want to solve (which implicitly utilizes Gröbner bases):

using Symbolics
using Groebner
@variables x, y, a

eq1 = x^2*y + 2*x*y - a
eq2 = x*y - x*y^3

sys = [eq1, eq2]

symbolic_solve(sys, [x, y])
4 Likes

I don’t know the specific answer, but as as far as I understand, the answer is basically by specifying a monomial ordering explicitly. See Groebner.jl. You may need to use it directly, instead of Symbolics.jl, not sure.

There’s also another package that depends on Groebner.jl, ParamPunPam.jl. It aims to support specifying parameters, however as far as I understand it assumes the value of a parameter is never algebraic.

1 Like

Hm… with my 20 equations and 20 variables, symbolic_solve is now into its 18 minutes to find the solution. I use a laptop with the top 2024 AMD processor with 12 cores (don’t know if symbolic_solve can utilize the cores, though).

  • How large systems is it possible to solve? Is there a limit?
  • What is a realistic computational time?

Even your example took about 1 minute to solve on my 10-year-old computer. I’m not sure how long it’s going to be, but I know that the time complexity of computing Gröbner bases is very very high (Buchberger’s algorithm), although there are faster algorithms like F5 algorithm.

2 Likes

The run time also greatly depends on the chosen monomial ordering.

1 Like

OK – I had to stop the symbolic_solve() algorithm… it had spent more than 12.5 hours without finding the solution during the night.

One pass of running Symbolics.groebner_basis() takes less than 7 seconds on my laptop for the 20 equations with 20 unknowns + 10 parameters.

My “toy” example of 2 unknowns and 1 parameter took 13 seconds to solve on my laptop using symbolic_solve.


In my case, I am ok with the solution consisting in a sequence of monovariable polynomials that I can solve using a root-finding algorith, and then substituting in the solutions into the next polynomial.

Perhaps symbolic_solve() attempts to actually also solve the polynomials and find explicit solutions?


As mentioned, the Symbolics.groebner_basis() algorithm spits back 168 bases within 7 seconds. I will try to figure out what monomial ordering is.

Hi @BLI ,

To complement a bit on previous comments:

How large systems is it possible to solve? Is there a limit?

This depends on the structure of the system. One rule of thumb is the Bézout bound.

With 20 equations in 20 variables, say each equation is of total degree 2, there are at most 2^20 solutions (d^n). The solving algorithm needs to compute and represent those in some way.

For example, solving random systems in 30 variables of degree 2 is hopeless with Groebner bases.

What is a realistic computational time?

Roughly speaking, incrementing the number of variables by one doubles the running time and memory.

One pass of running Symbolics.groebner_basis() takes less than 7 seconds on my laptop for the 20 equations with 20 unknowns + 10 parameters.

This could mean that this system has some nice structure, and the rule I gave above turns out too pessimistic.

if I add what are really parameters as @variables , the function spits out a vector of 168 elements.

Perhaps symbolic_solve() attempts to actually also solve the polynomials and find explicit solutions?

Yes, I think so. Usually, a GB is not enough for getting a useful representation of solutions (i.e., there may be extra steps to pass from a GB with 168 elements to solutions in some form).

OK – I had to stop the symbolic_solve() algorithm… it had spent more than 12.5 hours without finding the solution during the night.

Since the general solver seems stuck, I would perhaps proceed by analysing the system manually: checking the number of solutions and the dimension of the variety from the GB, obtaining useful relations from the GB to simplify the input system, etc.

Feel free to send the system to me (asdemin_2@edu.hse.ru) or post it here, I can have a look.

2 Likes

OK – I’ll post it here. It is not a revolutionary problem…

using Symbolics
using Groebner

# Variables, i.e., unknowns
@variables V_L, V_g, n_H2O_L, n_H2O_g, n_1_L, n_2_L, n_1_g, n_2_g, n_L, n_g
@variables p, p_H2O, p_1, p_2, y_H2O, y_1, y_2, x_H2O, x_1, x_2
#
# Parameters, i.e., known values
@variables n_H2O, n_1, n_2, V, p_H2O_sat, H_1, H_2, Vt_L, R, T

eq1 = n_H2O_L + n_H2O_g - n_H2O
eq2 = n_1_L + n_1_g - n_1 
eq3 = n_2_L + n_2_g - n_2 
eq4 = n_H2O_g + n_1_g + n_2_g - n_g 
eq5 = n_H2O_L + n_1_L + n_2_L - n_L 
eq6 = p_H2O + p_1 + p_2 - p 
eq7 = p_H2O*V_g - n_H2O_g*R*T 
eq8 = p_1*V_g - n_1_g*R*T 
eq9 = p_2*V_g - n_2_g*R*T 
eq10 = x_H2O*n_L - n_H2O_L 
eq11 = x_1*n_L - n_1_L 
eq12 = x_2*n_L - n_2_L 
eq13 = y_H2O*n_g - n_H2O_g 
eq14 = y_1*n_g - n_1_g 
eq15 = y_2*n_g - n_2_g 
eq16 = p_H2O - x_H2O*p_H2O_sat 
eq17 = p_1 - x_1*H_1 
eq18 = p_2 - x_2*H_2 
eq19 = V_L - n_H2O_L*Vt_L
eq20 = V_L + V_g - V

eqs_VT_flash = [eq1, eq2, eq3, eq4, eq5, eq6, eq7, eq8, eq9, eq10, eq11, eq12, eq13, eq14, eq15, eq16, eq17, eq18, eq19, eq20]

vars_VT_flash = [V_L, V_g, n_H2O_L, n_H2O_g, n_1_L, n_2_L, n_1_g, n_2_g, n_L, n_g, p, p_H2O, p_1, p_2, y_H2O, y_1, y_2, x_H2O, x_1, x_2]

# sol_VT_flash = symbolic_solve(eqs_VT_flash, vars_VT_flash)

sol_VT_flash = Symbolics.groebner_basis(eqs_VT_flash)

Essentially, it is an equilibrium thermodynamics problem, a so-called VT Flash problem.

The “parameters” are known values. What I am looking for is the solution of the unknowns. If I could find what I call n_H2O_L (amount of water in the liquid phase) and n_1_g, n_2_g (amount of components 1 and 2 in the gas phase), it is trivial to find the rest.

1 Like

This should end up down the symbolic linear solver path and not hit the Groebner stuff at all? That’s the real bug here.

1 Like

I’m not claiming a bug. Perhaps it ends up with a 5th order polynomial or higher that cannot be solved to an explicit solution. But it would be great if solve_symbolic could return something in less time. I wouldn’t mind if it was in the form of a monovariable polynomial that needs to be solved numerically (i.e., an implicit expression) + additional polynomials that become monovariable by successively inserting the solution to a prior polynomial.

If it is a bug, or a short-coming, great if my example can inspire further development.

I’m claiming a bug. It should prove it’s linear and short circuit to the symbolic linear solver which is much faster.

Is it linear? For example, eq7 is quadratic in p_H2O, V_g, which are @variables.

Oh I thought that was one of the parameters not differentiated :sweat_smile:. The naming scheme is a bit rough.

With some simplifying assumptions, it can be made linear. But I am trying to avoid those assumptions.

Sorry about that :slight_smile: . n is amount (mol), p is pressure, V is volume, x is liquid phase mole fraction, y is gas phase mole fraction. Subscripts are H2O (water), 1, 2 (two different light gases). “Superscripts” (the second _ are L for liquid, and g for gas. T is absolute temperature, R is gas constant, etc.

What is known at the outset in VT (volume-temperature) “flash” is total volume V, temperature T, and total amounts of the various components (n_H2O, n_1, n_2). I want to know how the components split in the liquid and gas phase.

Finally, H_1, H_2 (Henry’s constants), and p_H2O_sat (water saturation pressure), and Vt_t_L (liquid molar volume) are known functions of temperature.

@BLI thanks for sharing the equations.

Here is how I would get a univariate polynomial in n_H2O_L that depends on parameters:

using Nemo, Groebner

# 1. Define parameters and variables
_r, (n_H2O, n_1, n_2, V, p_H2O_sat, H_1, H_2, Vt_L, R, T) = Nemo.QQ[:n_H2O, :n_1, :n_2, :V, :p_H2O_sat, :H_1, :H_2, :Vt_L, :R, :T]
_R, (V_L, V_g, n_H2O_L, n_H2O_g, n_1_L, n_2_L, n_1_g, n_2_g, n_L, n_g, p, p_H2O, p_1, p_2, y_H2O, y_1, y_2, x_H2O, x_1, x_2) = fraction_field(_r)[:V_L, :V_g, :n_H2O_L, :n_H2O_g, :n_1_L, :n_2_L, :n_1_g, :n_2_g, :n_L, :n_g, :p, :p_H2O, :p_1, :p_2, :y_H2O, :y_1, :y_2, :x_H2O, :x_1, :x_2]

# 2. Define equations
eq1 = n_H2O_L + n_H2O_g - n_H2O
eq2 = n_1_L + n_1_g - n_1 
eq3 = n_2_L + n_2_g - n_2 
eq4 = n_H2O_g + n_1_g + n_2_g - n_g 
eq5 = n_H2O_L + n_1_L + n_2_L - n_L 
eq6 = p_H2O + p_1 + p_2 - p 
eq7 = p_H2O*V_g - n_H2O_g*R*T 
eq8 = p_1*V_g - n_1_g*R*T 
eq9 = p_2*V_g - n_2_g*R*T 
eq10 = x_H2O*n_L - n_H2O_L 
eq11 = x_1*n_L - n_1_L 
eq12 = x_2*n_L - n_2_L 
eq13 = y_H2O*n_g - n_H2O_g 
eq14 = y_1*n_g - n_1_g 
eq15 = y_2*n_g - n_2_g 
eq16 = p_H2O - x_H2O*p_H2O_sat 
eq17 = p_1 - x_1*H_1 
eq18 = p_2 - x_2*H_2 
eq19 = V_L - n_H2O_L*Vt_L
eq20 = V_L + V_g - V

eqs_VT_flash = [eq1, eq2, eq3, eq4, eq5, eq6, eq7, eq8, eq9, eq10, eq11, eq12, eq13, eq14, eq15, eq16, eq17, eq18, eq19, eq20]
vars_VT_flash = [V_L, V_g, n_H2O_L, n_H2O_g, n_1_L, n_2_L, n_1_g, n_2_g, n_L, n_g, p, p_H2O, p_1, p_2, y_H2O, y_1, y_2, x_H2O, x_1, x_2]

# 3. Make these variables the smallest
interesting_vars = [n_H2O_L]

# 4. Compute a GB, allowing division by parameters
using ParamPunPam

gb = paramgb(
    eqs_VT_flash, 
    # vars > interesting_vars > params
    ordering=DegRevLex(setdiff(vars_VT_flash, interesting_vars)) * DegRevLex(interesting_vars)
)

# 5. Inspect. Note: gb[1] is univariate
map(vars, gb)

#= Output
20-element Vector{Vector{AbstractAlgebra.Generic.MPoly{AbstractAlgebra.Generic.FracFieldElem{QQMPolyRingElem}}}}:
 [n_H2O_L]            
 [n_H2O_L, x_2]
 [n_H2O_L, x_1]
 [n_H2O_L, x_H2O]
 [n_H2O_L, y_2]
 [n_H2O_L, y_1]
 [n_H2O_L, y_H2O]
 [n_H2O_L, p_2]
 [n_H2O_L, p_1]
 [n_H2O_L, p_H2O]
 [n_H2O_L, p]
 [n_H2O_L, n_g]
 [n_H2O_L, n_L]
 [n_H2O_L, n_2_g]
 [n_H2O_L, n_1_g]
 [n_H2O_L, n_2_L]
 [n_H2O_L, n_1_L]
 [n_H2O_L, n_H2O_g]
 [V_g, n_H2O_L]
 [V_L, n_H2O_L]
=#

gb[1]

#= Output
n_H2O_L^4 + (n_H2O*p_H2O_sat^2*H_1*Vt_L + n_H2O*p_H2O_sat^2*H_2*Vt_L + n_H2O*p_H2O_sat^2*R*T - 2*n_H2O*p_H2O_sat*H_1*H_2*Vt_L - 2*n_H2O*p_H2O_sat*H_1*R*T - 2*n_H2O*p_H2O_sat*H_2*R*T + 3*n_H2O*H_1*H_2*R*T - n_1*p_H2O_sat^2*R*T + n_1*p_H2O_sat*H_2*R*T - n_2*p_H2O_sat^2*R*T + n_2*p_H2O_sat*H_1*R*T - V*p_H2O_sat^3 + V*p_H2O_sat^2*H_1 + V*p_H2O_sat^2*H_2 - V*p_H2O_sat*H_1*H_2)//(p_H2O_sat^3*Vt_L - p_H2O_sat^2*H_1*Vt_L - p_H2O_sat^2*H_2*Vt_L - p_H2O_sat^2*R*T + p_H2O_sat*H_1*H_2*Vt_L + p_H2O_sat*H_1*R*T + p_H2O_sat*H_2*R*T - H_1*H_2*R*T)*n_H2O_L^3 + (n_H2O^2*p_H2O_sat*H_1*H_2*Vt_L + n_H2O^2*p_H2O_sat*H_1*R*T + n_H2O^2*p_H2O_sat*H_2*R*T - 3*n_H2O^2*H_1*H_2*R*T + n_H2O*n_1*p_H2O_sat^2*R*T - 2*n_H2O*n_1*p_H2O_sat*H_2*R*T + n_H2O*n_2*p_H2O_sat^2*R*T - 2*n_H2O*n_2*p_H2O_sat*H_1*R*T - n_H2O*V*p_H2O_sat^2*H_1 - n_H2O*V*p_H2O_sat^2*H_2 + 2*n_H2O*V*p_H2O_sat*H_1*H_2)//(p_H2O_sat^3*Vt_L - p_H2O_sat^2*H_1*Vt_L - p_H2O_sat^2*H_2*Vt_L - p_H2O_sat^2*R*T + p_H2O_sat*H_1*H_2*Vt_L + p_H2O_sat*H_1*R*T + p_H2O_sat*H_2*R*T - H_1*H_2*R*T)*n_H2O_L^2 + (n_H2O^3*H_1*H_2*R*T + n_H2O^2*n_1*p_H2O_sat*H_2*R*T + n_H2O^2*n_2*p_H2O_sat*H_1*R*T - n_H2O^2*V*p_H2O_sat*H_1*H_2)//(p_H2O_sat^3*Vt_L - p_H2O_sat^2*H_1*Vt_L - p_H2O_sat^2*H_2*Vt_L - p_H2O_sat^2*R*T + p_H2O_sat*H_1*H_2*Vt_L + p_H2O_sat*H_1*R*T + p_H2O_sat*H_2*R*T - H_1*H_2*R*T)*n_H2O_L
=#

Some comments:

  • The ordering of indeterminates is: vars > interesting vars > params, + allowing division by params. This can be modeled differently, but I guess this approach works here.
  • In 3., mark several variables as interesting to uncover relations between them (for example, [n_H2O_L, n_1_g, n_2_g]).
  • I use ParamPunPam.paramgb for computing GB, which @nsajko mentioned.
  • In 4., swap paramgb for groebner if guaranteed result is needed (paramgb is probabilistic).
  • There are 4 solutions, which agrees with the degree of univariate polynomial:
julia> length(quotient_basis(map(identity, gb)))
4
  • From the univariate polynomial, one solution is n_H2O_L = 0 (I wonder if this has physical meaning).
2 Likes

I don’t think the value of vars is defined?

1 Like

It comes from Nemo.jl.

2 Likes

Thanks a lot!!

I’ll need some time to digest it. Seems like a 4th order polynomial in n_H2O_L, right? I’ll have to check how the other variables relate to n_H2O_L.

Of course, a fourth order polynomial admits an explicit solution, so I guess in principle, symbolic_solve could find the solution. But perhaps it is necessary to specify lex ordering, etc. to find the solution.

I tried to scale down the problem to water + one other gas and toss it to symbolic_solve, but still no solution after 7.5 hours. In real life problems involving water electrolysis (as an example), there could probably be water + 4 gases (I included water + 2 gases): hydrogen, oxygen, nitrogen, argon. Possibly more. At some stage, it would probably be better to simply solve it as a numeric problem with iteration on implicit equations.

1 Like