CAS Best Practices

But subtypes should not override the defaults (or “fallbacks”) of their supertype “willy-nilly”. There should be some kind of consistency with the fallback, e.g.:

julia> typeof(1:4)
UnitRange{Int64}

julia> typeof(1:4) <: AbstractArray
true

julia> (1:4) .+ 5
6:9

julia> collect(1:4) .+ 5
4-element Vector{Int64}:
 6
 7
 8
 9

julia> collect((1:4) .+ 5) == collect(1:4) .+ 5  # consistency
true

If broadcasted addition on UnitRange{Int64} were defined in such a way that the “# consistency” property did not hold, that would be bad for generic programming!

Does that badness apply in this CAS setting? What consistency should exist across rules that are defined on types that have a subtype relationship among them? I am not sure. That’s what the question is.

1 Like

That’s a good question. I would assume that for number types you’d want to have those simplifications by default, and choose on a subset of AbstractFloats to disable them. I could see someone going the other way and having @variables x default to SymbolicReal <: Real and define those rules on SymbolicReal. It’s worth a discussion on the SymbolicUtils.jl repo. Open an issue.

2 Likes

There is no problem whatsoever with defining the square root function (sqrt or \sqrt{}) to be the principal square root. This is the standard approach, as described in the Wikipedia article on square roots and as implemented in Julia, Python, Mathematica, Maple, Maxima, etc. Insisting that we define \sqrt{9} to be \pm3 is a pathway to confusion.

I would think that if you want to show that \sqrt{x^2} and x are unequal as expressions, you’d want to point out the example of x = -3. That way the argument does not depend on a non-standard interpretation of the square root symbol.

2 Likes

From a conventional programming language point of view, yes.

From a logico-symbolic point of view, where y = sqrt(x) means that x == y^2, y = sqrt(9) means that 9 == y^2, and asking Maple to solve(y^2 = 9, y) gives (3, -3).

A good deal of the back-and-forth is friction between Numerics and Symbolics (both are math).

We are working toward a best (or a very good) case implementation approach that allows us to glean important benefits that each perspective brings, sometimes one better than the other and vice versa.

My point is exactly that the square root function is, in all common usage, distinct from the solve function. If we insist on conflating the two, how do we interpret an expression like \sqrt{2} + \sqrt{3}? It’s got to be a four-element set, I suppose. Here we’re already verging on chaos, wishing we had a concise way to say “the positive square root” when we want to.

I’m comfortable granting the point that there are many interesting loci of interplay between the demands of symbolic and numeric computation, some of which may require creative thinking about how common concepts need to be treated differently. But I remain skeptical about the specific proposal of abandoning the square root function (which is implied by an assertion like \sqrt{9} = \pm 3).

4 Likes

We are not going to abandon any of these print(names(Base.Math)). I understand how that impression may have been conveyed, inadvertently. What is being considered is how to respect the numerical math functions without abandoning the symbolic/theoretical math functions that may carry the same names in ordinary discourse (hence the tangling of notions). It is my guess that we will find a way to distinguish the desired context (symbolic, numeric, some of each) for any occurrence of the named function – there are multiple avenues for this. The simplest way to distinguish the Julia numerical square_root from the more constructively mathematical symbolic square_root would be to use the name from Base.Math as the numerical function and to use its titlecase spelling as the symbolic function: sqrt(2) !== Sqrt(2). That leaves out a mixed version (maybe Sqrt is mixed, and purely symbolic is a context, as with let and do blocks).

see next post by @DNF for one of the more familiar approaches within Julia

Doesn’t that run a bit counter to the general Julia philosophy? Could one instead leverage multiple dispatch with sqrt(2) vs sqrt(Symbolic(2)) (just making up names here)?

Disclaimer: I know nothing about CASs in general, and hardly anything about Symbolics.jl, so this is a question rather than a suggestion.
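The dispatch-based idea in the question could be sketched roughly as follows. Symbolic is a made-up placeholder name here (as in the post), not the actual Symbolics.jl API:

```julia
# Hypothetical sketch: plain numbers keep Base's numeric sqrt, while a
# wrapper type routes the same function name to unevaluated symbolic
# behavior via multiple dispatch.
struct Symbolic{T}
    ex::T
end

Base.sqrt(x::Symbolic) = Symbolic(Expr(:call, :sqrt, x.ex))

sqrt(2.0)          # numeric square root, as usual
sqrt(Symbolic(2))  # Symbolic(:(sqrt(2))), left unevaluated
```

One function name, two behaviors, selected by the argument type rather than by spelling.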

Of course. I just wanted to clarify that these two things (the numerical evaluation and the symbolic handling) are not required to be one thing (although synthesis is good if it works).

The principal square root function is a logically, mathematically, symbolically and numerically perfectly fine function. Using the names sqrt/Sqrt to denote anything else is a mistake. Not because other notions of square root aren’t useful too, but because this convention is already firmly established. If you want a function that returns a set, call it something else (all_square_roots(x) or whatever).

The more general issue here is that computer algebra systems historically are very confused about what it means to solve something. Does solve() return some particular solution, the set of solutions, a set containing some of the solutions, a superset of the solutions (possibly containing spurious solutions), real solutions, complex solutions, or something random depending on context or the phase of the moon? I’ve seen all combinations.

This confusion essentially goes away if you build symbolic sets with well-typed quantifiers into the system. If you want to solve x^2 = y, write down S = \{x : x \in \mathbb{C} \land x^2 = y \} or S = \{x : x \in \mathbb{R} \land x \ge 0 \land x^2 = y \} or something else, depending on what you mean. Now various notions of solving can be reduced to computations on the set S: checking its cardinality, extracting an arbitrary element, iterating over the elements, computing some superset, taking the intersection with another set, etc. To keep things convenient for users, you may want unique_solution(), any_solution() and all_solutions() functions or something similar.
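A rough sketch of that set-based approach (all names here are made up, and a finite candidate superset stands in for a real symbolic set):

```julia
# Hypothetical sketch: represent the solution set by its membership
# predicate, then reduce the various notions of "solving" to set queries.
struct SymSet
    pred::Function       # membership predicate, e.g. x -> x >= 0 && x^2 == 9
    candidates::Vector   # a finite superset to search (a stand-in for ℂ)
end

all_solutions(S::SymSet) = filter(S.pred, S.candidates)
any_solution(S::SymSet)  = first(all_solutions(S))
function unique_solution(S::SymSet)
    sols = all_solutions(S)
    length(sols) == 1 || error("no unique solution")
    return only(sols)
end

# S = {x : x ∈ ℝ ∧ x ≥ 0 ∧ x^2 = 9}
S = SymSet(x -> x >= 0 && x^2 == 9, [-3.0, 0.0, 1.0, 3.0])
```

Cardinality checks, iteration, intersection, and so on all become ordinary operations on S rather than ad hoc solver behaviors.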

There is actually nothing inherently wrong with simplifying sqrt(x^2) → x or solving x^2 = y → x = sqrt(y) as purely formal operations on formal symbolic expressions. You just have to remember that these are formal operations which don’t commute with evaluation for x \in \mathbb{R}, say. When we do mathematics by hand, we often ignore conditions and check that the result makes sense in the end. The traditional CAS design approach mimics human operation but leaves the final checking to the user. This is clearly very useful in many cases, but it makes it seriously difficult to write correct complex programs that compose operations (where the user can’t realistically check each step). I’m convinced that it’s possible to do better; there’s no need to throw out the classical heuristic symbolic manipulation completely, but it should be a complement to and not a substitute for correct domain-aware rewriting. There are some parallels with floating-point versus interval arithmetic here (including the fact that floating-point solutions much like heuristic symbolic solutions sometimes can be checked a posteriori!).

I also completely agree with Oscar Benjamin that automatic evaluation is a mistake. Simple, unevaluated expressions by default are a win-win: easier to implement, easier and more versatile to use (the user can decide precisely which transformations to apply), and more efficient (symbolic expressions are a terrible data structure for algebraic computation). Instead of performing clever and invariably buggy automatic “canonicalization” on expressions, a far better solution is to convert to specialized data structures internally for computations (e.g. polynomial types for expanding or factoring).

28 Likes

Thank you @Fredrik_Johansson. Your comments are clear and on point.

2 Likes

Yes, that’s why in the tracer it’s planned to not do automatic evaluation, as described in Symbolic sqrt(2) and sin(pi) - #21 by ChrisRackauckas, with dispatches for alternative constructor-based simplifications all being type based.

That’s a good idea.

That is what we do. Sounds like Symbolics.jl is on the right path then, thanks.

6 Likes

One has to be rather careful, and I don’t know what the Julia philosophy dictates.

As has been pointed out, the notion of principal square root of a non-negative real number is just fine. A very careful CAS might allow you to create and manipulate a “rootof” expression, e.g. RootOf(y^2=x,y) . This represents one of the square roots of x. You might even distinguish the two roots by an index, RootOf(y^2=x,1) and RootOf(y^2=x,2). Though which one is the “positive” or “principal” root is not identifiable. You do know that their sum is 0 and their product is x.

If sqrt() is a name for a program that operates only on non-negative reals, then identifying it as the positive real principal square root is certainly OK. It has a certain relationship with the square-root concept in algebra, or in a CAS, but it seems to me that they are not the same.

So some of the comments are, in my view, right on track. Here’s some food for thought.

If z = x^2 + y^2 - 2xy, what do you want for square_root(z)?
It could be x-y, y-x, abs(x-y), RootOf(w^2=z,w), or a set {x-y, y-x}. There are various arguments for each result. Maxima gives you something of a choice. Leaving it as RootOf, “unevaluated”, might seem to be the most correct, but it makes further computation difficult and potentially ambiguous.

Oh, it is also possible sometimes to distinguish RootOfs by intervals. e.g. RootOf(y^2=9,y, inside [-inf,0]) which would presumably be unambiguously -3.
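A minimal sketch of such an interval-tagged RootOf (hypothetical, not an existing API):

```julia
# Hypothetical RootOf expression carrying an interval that selects
# which root is meant.
struct RootOf
    poly::Expr                        # e.g. :(y^2 - 9)
    var::Symbol
    interval::Tuple{Float64,Float64}  # the root is the one inside this range
end

# RootOf(y^2 = 9, y, inside [-Inf, 0]): the candidates are ±3,
# and the interval unambiguously picks -3.
r = RootOf(:(y^2 - 9), :y, (-Inf, 0.0))
candidates = [-3.0, 3.0]
selected = only(filter(x -> r.interval[1] <= x <= r.interval[2], candidates))
```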

Have fun.
RJF

5 Likes

Another fundamental question in the interpretation of a “solution” is how a solver handles equations parametrised by symbols e.g.:

julia> @variables a, b, x
(a, b, x)

julia> Symbolics.solve_for([a*x ~ b], [x])
1-element Array{Num,1}:
 b*(a^-1)

Most CAS will return something similar but what if a is zero?

More generally given a situation where solutions exist conditionally depending on the unknowable values of symbolic parameters how do you define formally what the interpretation should be on the returned result? Does it apply for “most” values of a or would the solver only return something if it was known to be correct for all values of a? What if an expression for the solution is correct for a < 0 but no solution exists for a > 0?

Different users want different things in different situations, so there isn’t a one-size-fits-all approach. Many users would be extremely disappointed if solve were unable to get b/a from such a simple equation, but in some contexts (especially for internal use) something more robust is needed. The next level up is to have a way to indicate any assumptions that were made in deriving the returned result, so what you return is

>>> solve([a*x ~ b], [x])
b/a   provided   a != 0

At minimum the primitive solvers on which others are based need to have this capability so that robust higher-level solvers can be built on top. A friendly user API like solve might not do this but for power-users and internal use this is needed.
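A sketch of what such a proviso-carrying primitive might return (the names and types here are hypothetical):

```julia
# Hypothetical proviso-carrying result: the primitive solver returns the
# solution together with the assumption under which it is valid.
struct Conditional
    value::Expr      # e.g. :(b / a)
    proviso::Expr    # e.g. :(a != 0)
end

# a*x = b  =>  x = b/a  provided  a != 0
solve_linear(a::Symbol, b::Symbol) = Conditional(:($b / $a), :($a != 0))

sol = solve_linear(:a, :b)
```

A friendly top-level solve could strip the proviso for casual use, while robust higher-level solvers would propagate it.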

What most people, as “users”, want is to get the solution case that corresponds to the generic case for the parameters, except when it comes to determining that no solution exists, e.g.:

In [5]: linsolve_cond([Eq(x + y, 1), Eq(x + y, a), Eq(x, 1)], [x, y])
Out[5]: 
⎧{(1, 0)}  for a - 1 = 0
⎨                       
⎩   ∅        otherwise

Here the solution is generically (for almost all values of a) the empty set but many people would consider it a bug to return that: why can’t the solver figure out that a=1?

Another approach is to try to consider all possible cases but it doesn’t scale well. I made a sketch of this for SymPy here:

With that you get:

In [3]: linsolve_cond([Eq(a*x, b)], [x])
Out[3]: 
⎧⎧⎛b ⎞⎫           
⎪⎨⎜─,⎟⎬  for a ≠ 0
⎪⎩⎝a ⎠⎭           
⎪                 
⎨  ∅     for b ≠ 0
⎪                 
⎪   1             
⎪  ℂ     otherwise
⎩  

The problem is that the number of cases grows exponentially in the number of symbols:

In [2]: a, b, c, d, e, f, x, y = symbols('a:f, x:y')

In [3]: linsolve_cond([Eq(a*x + b*y, e), Eq(c*x + d*y, f)], [x, y])
Out[3]: 
⎧⎧⎛  b⋅(a⋅f - c⋅e)               ⎞⎫                                                                                                                                                                                                                        
⎪⎪⎜- ───────────── + e           ⎟⎪                                                                                                                                                                                                                        
⎪⎨⎜    a⋅d - b⋅c        a⋅f - c⋅e⎟⎬                                                                                                                                                                                                                        
⎪⎪⎜───────────────────, ─────────⎟⎪                                                                                                for a ≠ 0 ∧ a⋅d - b⋅c ≠ 0                                                                                               
⎪⎩⎝         a           a⋅d - b⋅c⎠⎭                                                                                                                                                                                                                        
⎪                                                                                                                                                                                                                                                          
⎪                ∅                   for (a⋅d - b⋅c = 0 ∧ a ≠ 0 ∧ a⋅f - c⋅e ≠ 0) ∨ (a = 0 ∧ b = 0 ∧ c ≠ 0 ∧ e ≠ 0) ∨ (a = 0 ∧ c = 0 ∧ b ≠ 0 ∧ b⋅f - d⋅e ≠ 0) ∨ (a = 0 ∧ b = 0 ∧ c = 0 ∧ d = 0 ∧ ¬(e = 0 ∧ f = 0)) ∨ (a = 0 ∧ b = 0 ∧ c = 0 ∧ d ≠ 0 ∧ e ≠ 0)
⎪                                                                                                                                                                                                                                                          
⎪    ⎧⎛-b⋅τ₀ + e    ⎞ │       ⎫                                                                                                                                                                                                                            
⎪    ⎨⎜─────────, τ₀⎟ │ τ₀ ∊ ℂ⎬                                                                                            for a⋅d - b⋅c = 0 ∧ a⋅f - c⋅e = 0 ∧ a ≠ 0                                                                                       
⎪    ⎩⎝    a        ⎠ │       ⎭                                                                                                                                                                                                                            
⎪                                                                                                                                                                                                                                                          
⎪        ⎧⎛    e⎞ │       ⎫                                                                                                                                                                                                                                
⎪        ⎨⎜τ₀, ─⎟ │ τ₀ ∊ ℂ⎬                                                                                                for a = 0 ∧ c = 0 ∧ b⋅f - d⋅e = 0 ∧ b ≠ 0                                                                                       
⎪        ⎩⎝    b⎠ │       ⎭                                                                                                                                                                                                                                
⎪                                                                                                                                                                                                                                                          
⎨        ⎧⎛    f⎞ │       ⎫                                                                                                                                                                                                                                
⎪        ⎨⎜τ₀, ─⎟ │ τ₀ ∊ ℂ⎬                                                                                                for a = 0 ∧ b = 0 ∧ c = 0 ∧ e = 0 ∧ d ≠ 0                                                                                       
⎪        ⎩⎝    d⎠ │       ⎭                                                                                                                                                                                                                                
⎪                                                                                                                                                                                                                                                          
⎪    ⎧⎛-d⋅τ₀ + f    ⎞ │       ⎫                                                                                                                                                                                                                            
⎪    ⎨⎜─────────, τ₀⎟ │ τ₀ ∊ ℂ⎬                                                                                                for a = 0 ∧ b = 0 ∧ e = 0 ∧ c ≠ 0                                                                                           
⎪    ⎩⎝    c        ⎠ │       ⎭                                                                                                                                                                                                                            
⎪                                                                                                                                                                                                                                                          
⎪          ⎧⎛    d⋅e   ⎞⎫                                                                                                                                                                                                                                  
⎪          ⎪⎜f - ───   ⎟⎪                                                                                                                                                                                                                                  
⎪          ⎨⎜     b   e⎟⎬                                                                                                                                                                                                                                  
⎪          ⎪⎜───────, ─⎟⎪                                                                                                          for a = 0 ∧ b ≠ 0 ∧ c ≠ 0                                                                                               
⎪          ⎩⎝   c     b⎠⎭                                                                                                                                                                                                                                  
⎪                                                                                                                                                                                                                                                          
⎪                 2                                                                                                                                                                                                                                        
⎪                ℂ                                                                                                     for a = 0 ∧ b = 0 ∧ c = 0 ∧ d = 0 ∧ e = 0 ∧ f = 0                                                                                   
⎩  

Although the number of cases explodes, it is possible to implement a solver this way lazily, so that if you know what kind of cases you want, they can be extracted efficiently without computing all of the others. Enumerating all cases is impractical, but there needs to be some way to know what assumptions were made in deriving the solution.

Attaching validity conditions to what is returned by solvers needs to be an API and implementation consideration from the outset and it needs to start with the lowest level primitive routines. This is not really a feature that can be added to an implementation retrospectively.

Either way the most important thing is to define (and document!) in formal terms what the solver does and how the returned solutions should be interpreted. That needs to be clear from the outset or the implementation will end up choosing based on the phases of the moon as Fredrik says.

8 Likes

The problem cited above about the proliferation of cases is, I think, only one aspect of the issue. True, you get more cases (we called them provisos… XXX provided YYY), and their numbers can grow exponentially as solutions bifurcate previous provisos. We used this to encode material from tables of integrals in a program Tilu.

The adjacent issue I see is to simplify and combine the provisos to a minimal set, and should you be so lucky as to conclude something like

y=1 provided a >0
and
y=1 provided a<=0

then computationally maybe conclude that y=1 [do we have to specify … provided a is real??]
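The merging step could be sketched like this (purely illustrative; deciding exhaustiveness for real needs an assumption system):

```julia
# Hypothetical sketch: branches that agree on the value, under conditions
# that together exhaust the (real) domain, collapse to one answer.
branches = [(:(y = 1), :(a > 0)),
            (:(y = 1), :(a <= 0))]

values = unique(first.(branches))
# Here we only check that all branches agree on the value; deciding that
# a > 0 and a <= 0 exhaust the domain is exactly the "provided a is real"
# question raised above.
merged = length(values) == 1 ? only(values) : branches
```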

I think it is useful to talk about formal computations in a domain which allow (say) cancelling of polynomials gcd in A/B; on the other hand, certain manipulations that are less well structured and are true “generically” but false “generally” are troublesome. Simple examples of these are sometimes posed as puzzles: Find the flaw in the proof … (where the proof concludes that 1=2, say)
Protesting that a solution has (perhaps) an infinite set of places where it is false contrasts with the certainty that “but the program came up with this solution”. Mathematica does a fair amount of this; I don’t know about SymPy.

There are some tools in Mathematica that are more careful than Solve. I think Reduce and Eliminate are available. I expect that Mathematica’s further manipulation of the results of these programs falls short of the ideal, but I haven’t tried mucking about with their results.

The idea of leaving things “unevaluated” is pretty much inherent in Macsyma/Maxima as a kind of default: when you can’t do an integration, leave the result as “integrate(…)”.
Leaving a result as a program, unevaluated, by default, seems a bit cruel to the user, e.g.

a+b becomes … if a is a number and b is a number then if a+b is a representable number, then add(a,b).
a/b becomes … if b is not 0 or NaN …

and sqrt(x) checks that x is real, non-negative …

It works better if your business is not “computer algebra” but “generating code snippets”.

sorry if I’ve misrepresented what I’ve read but not studied in detail here.
Have fun.

Is there a way to get a handle on this distinction between “generally” and “generically”? I think what you’re referring to is e.g. the rule that
b * a / b == a

with or without (respectively) checking that
b ≠ 0.
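Concretely, in floating point the rewrite holds generically but fails at the excluded point:

```julia
# The rewrite b*a/b -> a holds for b ≠ 0 but fails at b = 0,
# where the unrewritten form produces NaN.
simplified(a, b)   = a           # the rewritten form
unsimplified(a, b) = b * a / b   # the original form

unsimplified(2.0, 3.0) == simplified(2.0, 3.0)  # true: the generic case
unsimplified(2.0, 0.0)                          # NaN: 0*2/0 = 0/0
```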

See, for example, the wikipedia article about the field of rational functions

for a discussion which allows that one can consider the equivalence class of functions including
R=(f(x)*g(x)) / g(x) as well as f(x), by “cancelling” g(x). Even if g(x) might, for some values of x, evaluate to zero, R is computationally equivalent to f(x).

There’s a much broader scope of expressions that might come out of “solve” but are difficult to justify except if you are willing to avert your gaze and say …
( https://www.youtube.com/watch?v=vJXU7EVXs2A )

It certainly doesn’t feel right to say they are “computationally” equivalent. They indeed express different computations and are not extensionally nor intensionally equal. But I get they are equivalent possibly up to removable singularities.

The article warns about inadvertently losing removable singularities.

The rational function f(x)={\tfrac{x}{x}} is equal to 1 for all x except 0, where there is a removable singularity. The sum, product, or quotient (excepting division by the zero polynomial) of two rational functions is itself a rational function. However, the process of reduction to standard form may inadvertently result in the removal of such singularities unless care is taken. Using the definition of rational functions as equivalence classes gets around this, since x / x is equivalent to 1/1.

My original question was about “general” vs “generic”, but I didn’t understand the dichotomy here. Perhaps it’s that “equivalence up to removable singularity” is more “general” and less “generic” than “equivalence, and strictly so”.

My view is that this rational-field division by maybe-zero is “small potatoes”. There is probably a mathematical excuse buried in the literature for the cancellation of a potentially-zero denominator. Maybe in “valuation” theory. Not something I am willing to spend time studying.

What is potentially far more annoying are situations like this. We can agree that limit(1/x, x->0) is something. Call it infinity. Maybe there’s +,-, complex infinity. But it is something. Call it q. Now for an ordinary symbol x that might possibly have a value q, we would like x-x to be 0. 2*x to be different from x unless x==0, etc etc. This is not true for inf. So we have a problem. Maybe don’t allow limit? Maybe don’t allow operations on something that might later be assigned the value inf? Come up with an arithmetic that allows infinities and is consistent with the rest of the system? (There are arguments that interval arithmetic can do some of this.)
Anyway, there are a number of different resolutions to this, and they aren’t obvious. E.g. you might say, of course we have to allow limits. Uh, no, that’s one resolution. Maybe not a favorite.
Now for systems that are heavily oriented toward arithmetic, note that there are IEEE float representations for ±inf, NaN (many of them), signed zero, as well as traps, rounding modes, …
and if you want to model them in a computer algebra “system”, it requires thought.
You are aware of it, and you don’t obsess over it, but you know you need to think about arithmetic even when it doesn’t overflow, underflow, etc. You can still make grievous mistakes by subtracting two nearly equal quantities, which elevates round-off and truncation noise to prominence.
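For instance, a small Float64 illustration of that last point:

```julia
# Subtracting nearly equal quantities: no overflow or underflow anywhere,
# yet round-off dominates the result.
x = 1.0 + 1e-15   # 1e-15 is below full precision relative to 1.0
d = x - 1.0       # ≈ 1.1102e-15, roughly 11% away from 1e-15
d == 1e-15        # false
```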

I would not rely on the usual “mathematics” sources to come up with a unified theory of all of these, including objects like infinities, undefineds, indeterminates, NaNs. Nor would I expect this to come out as a free consequence of computer language types, compiling, multiple-dispatch-object-oriented-functional-parallel-cloud-AI-whatever. Ultimately, I do expect that a computer system could mimic or improve on what human mathematicians try to do, but it is not a “free” consequence of something not specifically addressing these issues. For people who are concerned with “reliable computation”, which may sort of be adjacent to the symbolic computation in pursuit of better numerics, there is the activity in interval arithmetic which may be helpful. A place to start may be this …
https://standards.ieee.org/standard/1788_1-2017.html

RJF

3 Likes

I am certainly underthinking this. But the problem seems to be an absence of types. Or perhaps “domains”. x - x == 0 is a perfect rule for Int but [subtly] broken for Float64. The kind of x for which the rule holds, that kind should have a name. Maybe the name is “x - x == 0 holds”, but I suspect there’s a better name.

And then q is not that kind of thing. It is a different thing, and as such, the rule x - x == 0 does not apply to it.

The types here are geared toward safety / soundness. That’s in contrast to the types in Julia, which arguably exist primarily to direct dispatch. I’m sure it’s not easy to figure out those types, but it does seem crucial to think in those terms.