[ANN] Symbolics.jl: A Modern Computer Algebra System for a Modern Language

I don’t mean that physicists never encounter groups, but I think it’s clear that a large majority of the CAS userbase does not directly encounter such objects nor do they need to. Let’s just take a look at the statistics of it. From the reverse dependencies list there’s one package making use of Oscar: SIAN.jl for structural identifiability in ODEs. Since SymbolicUtils.jl already makes use of AbstractAlgebra.jl (yes, we already use some of these tools under the hood), cross-referencing the reverse dependencies list shows the overwhelming vast majority of AbstractAlgebra.jl uses are through SymbolicUtils.jl and Symbolics.jl. The remaining 10% or so reverse dependencies are other CAS tools (Nemo.jl, GrobnerBasis.jl, SymbolicUtils.jl, Symbolics.jl, etc.). So if you just look at the statistics, it seems there’s a major unmet need here.

If you dig around, you see that these tools have been around for years, but if you open up say the RigidBodyDynamics.jl robotics library’s documentation:

https://juliarobotics.org/RigidBodyDynamics.jl/stable/generated/6.%20Symbolics%20using%20SymPy/6.%20Symbolics%20using%20SymPy/

you will see that they are using SymPy. If you look at probabilistic programming libraries like Soss.jl, you will see they are making use of SymPy. So even in Julia there seem to be more people making use of SymPy than all of these other symbolic algebra libraries combined times 10, many times not including it as a dependency to the library to avoid the Python deps, only making the problem worse.

We can :wink: at each other saying “those uncultured fools should know how to make use of Galois fields in order to do symbolic integration!”, but at the end of the day, they don’t care: they just want to do symbolic integration. Trivializing this aspect does not improve the ecosystem nor does it hit the unmet need, which is why more people use SymPy or Mathematica today than anything else. Given how we are using AbstractAlgebra.jl and hopefully soon Oscar.jl, some of Symbolics.jl is just translating “real CAS’s” to the people. The more we don’t have to write, the better. We’re not in competition with the other groups in Julia, rather we’re hoping to expand their reach.

I hope that is all it is. However, as we know with CAS work, sometimes it ends up not so simple.

Many people are worried that GMP is under GPL, along with SuiteSparse, which is why people are trying to actively remove it from Julia Base. There are quite a few issues and PRs, and forum posts and Slack posts and Zulip posts, etc. In fact, it’s discussed so much it’s almost a meme, so yes, a very non-trivial number of people are worried about it, and for good reasons. This is such a big issue that SuiteSparse was made into a separate package in Julia v1.0 and scheduled to be removed to a package, and there is still an ongoing debate about what to do to finish this.

15 Likes

I used Mathematica from ca. 1989-1992; in this period, my employer also tested Macsyma. In 1993, my new employer decided to go for Maple based on price (cost increased linearly for academic licenses with Mathematica, but by square root for Maple). In 1994, I also started to use the LaTeX tool Scientific WorkPlace (SWP), which used Maple as a backend. Later, when Maple started with notebooks, SWP switched to MuPAD, which eventually was bought by MathWorks. Since 2015, my employer has shifted focus somewhat to Python.

Mathematica:

  • Very systematic naming conventions, very good plotting capabilities (for the time)
  • Very good access to books and documentation
  • Focus on physics
  • I wrote a small suite of ODE solvers for a simulation course I taught – worked fine
  • Numerics was so-so in the beginning, giving horribly wrong roots for, say, a 10th order polynomial
  • To me, the multitude of possible programming paradigms was confusing

Maple:

  • Focus on abstract algebra in early books; difficult to read for someone with minor background in math
  • Relatively little documentation made it less accessible than Mathematica
  • Simpler wrt. programming paradigms; relatively simple to develop programs

Some experience with CAS in education

  • Some math teachers mention that some undergraduate engineering students had weak background in math, which CAS could help them overcome
  • Potential problem: with lack of understanding, CAS systems can produce seemingly super results, which may actually be horribly wrong because students don’t understand what is going on.
  • This is the main reason why some colleagues shifted to using Python (or other script languages): they feel that this forces the students to have a better understanding of what is going on

Summary: I love the possibilities with CAS. Success hinges to a large degree on a systematic user interface, good documentation and tutorials.

4 Likes

Jeez, do we have more urgent things that GPL/BSD culture wars? :frowning:
This is such a major distraction, while Wolfram and Mathworks keep most users…

Somehow, GPLed Linux does not worry Microsoft with its WSL, happily shipping it. Who are these people who keep worrying about GPL, and why?


PS. Your discourse settings (for “new users”) prevent me from replying for more than 3 times on the topic. So all I can do is to edit this.

It’s not just a culture war. GPL imposes serious consequences for the precompiled binaries of commercial products like Pumas. Handling that correctly is a legal issue that we can’t just trivialize and roll our eyes at. It’s not a top priority but it is something we do have to worry about because of the consequence it has on our non-open source uses. This whole discussion of course should be a separate thread so we should leave it this.

8 Likes

My two cents:
“Scientist’s CAS” mainly focuses algebraic computation and manipulation of variables/arrays/matrixes in the real number or complex number field.
In other words, we need a framework where normal computational code can be analyzed and transformed symbolically without extra work (the fusion of symbolic code and numerical code). It is more like a symbolic plugin to aid numerical scientific computation.
By contrast, “mathematician’s CAS” pays more attention to mathematical structures and catogories, such as number theory and group theory.

9 Likes

Agreed. I also think @jlapeyre puts it very well in his JuliaCon 2018 talk on Symata.jl:

(I hope John is someone who’s work we can leverage as well!)

5 Likes

On the topic of symbolic integration, I find the following page exciting: Organizing Math as a Rule-based Decision Tree | Rule-based Integration

2 Likes

Yes, we’ve been wanting to use RuBI for some time now. It’s a big undertaking, but seems very promising.

3 Likes

Reading the rules and converting them to Julia can be automated. So the main work would be to implement a converter that converts them from Mathematica to Julia. And (perhaps) to implement a fast algorithm to apply them.

Should be a quite limited scope of work, taking into account that working code in other languages like Java exists and also a large test set exists in a number of languages…

1 Like

TL;DR; I really hope that Symbolics.jl does not come with barriers preventing from seamlessly extending it with other modules, which might feel too mathematical for a novice/naive CAS user.

I spent many years writing papers where a lot of boring, labourous work was about bridging the gaps between scientist’s CASs (SCASs) and mathematician’s CASs (MCASs), as arguably symmetry-based dimension reduction in opimisation was impossible to do on an SCAS, where as MCASs had all that heavy machinery of group theory ready. And now I mosty work (as a mathematician) somewhere where often I need to feed data from commutative algebra MCAS to an SCAS doing e.g. symbolic inregration, or doing numerics on that data. Fortunately, SageMath (on which I work for the last 10 years, sometimes almost fulltime) can help here, but not always.

The point of view that SCAS only should concern itself with real number or complex number fields is naive, as under the hood it does algebraic extensions of rationals, which often manifest itself in the way @rfateman already brought up here. There are also issues of precision, which tend to be swept under the carpet by Matlab funs (“oh, we develop tools for engineers, they only know CPU double precision” view) which are profound.

3 Likes

Since Symbolics.jl is using some of AbstractAlgebra.jl (and thus pieces of Nemo) under the hood, it should interop pretty easily with it since some of the calls are actually interop. This might be something we should add to the goals, since as both Symbolics and Nemo grow there will be more and more we will want to pull from it.

We definitely don’t want that. This is why everything is written with abstract algorithms and generic typing, with a generic metadata system. I think the major difficulty will converting to use algorithms with specializations when possible, but that’s just about finding cases.

As you know from DifferentialEquations.jl, I don’t except the MATLAB view.

5 Likes

Thank you for this excellent post. I strongly believe that ignoring past of any discipline is very bad idea. I never heard about Axiom before, but I see it is worth studding just for it history. Maybe I even learn something of it, for just pleasure of discovery and overcoming obstacles? Time will tell.

Can I use this situation and ask you, what person like me, who is just user of CAS, should study, when he/she want go deeper? I mostly use Mathematica, but to be honest, I don’t like it. It is to chaotic to my taste.

Also, I personaly very found of you mentioning that \textrm{sqrt}(9) = \{ -3, 3 \} is a set not a number. I was struggle with squares roots in complex domain a lot until I realize that Frege points out this problem in the case of real numbers. From this time is some kind of my hobby to tell people that \textrm{sqrt}( x ) is not a number (0 aside), it is a set (I can make exceptions when teaching freshman of different age).

Kamil Ziemian
kziemianfvt@gmail.com

I have invited the core developers of yacas to join this discussion.

4 Likes

I like this summary, while at the same time I was just mesmerised by @rfateman post (I like dig myself into mathematics and CS theory). If I understand @ChrisRackauckas words properly, Symbolic.jl is a by-product of removing bottle-neck to performance in ModelingToolkit.jl, which is good reason to make such project. We are using CAS in most cases for solving some problem, not for advancing field of CAS. From what I understand, this mean that Symbolic.jl was created to solving problems that we can do for ages, not to address open problems in CAS.

Having standard setting CAS created in Julia will be marvelous thing, but is can Symbolic.jl evolve to such thing now? It may be already to time consuming to rework in that direction? Maybe Symbolic.jl should stay robust, but humble project which do old stuff, but do it very good and fast? And another research CAP project in Julia should be open?

I just start to emerged myself in it (wish me luck).

1 Like

Lots of activity here!

  1. I just discovered a thread regarding Julia implementation or translation or merging with a well-regarded rigorous CAS, Axiom. Back in 2014. I don’t know if there are any of the same people involved in this discussion now who were active back then. Frankly, there was too much to read, and I don’t know how it ended up. If Axiom were to be ‘re-implemented’ in Julia instead of its own implementation language (ultimately translated to Lisp, I think), then there would be a bunch of benefits. I suppose there are other benefits to “rolling your own”, as well as pitfalls. There was a brief mention of Maxima – somehow dinged for licensing ??? It is GPL licensed (actually not my favorite, but not worth discussing here, i imagine.) It would be entirely plausible to link to Maxima – there are numerous front ends already.

  2. About lists vs arrays and cache.
    It depends on what you are computing, Traversing a linked list in order is not going to be as fast as indexing into an array, in most circumstances. Is that what you need to do? In a CAS, imagine sequentially adding x, x^2, x^3, x^4 … to a polynomial P. If the polynomial were implemented as a linked list, it would be an easy O(1) operation. If the polynomial were in an array, it would have to be copied over each time, and each addition would be O(size§).
    So the question is, what operations are you supporting with your data structure, and how often do those operations occur in actual computation?
    There are plenty of papers on this.
    Also, there have been studies that show that cache behavior of linked lists is not so bad. A compacting garbage collector may relocate the “next” targets of pointers so they are typically just a few words away … and so they may very well be in the same cache line. As for the data in the list, if they are themselves pointers to (say) bignums, or symbol-table entries or … then those other items may or may not be cache-resident, and it doesn’t matter much if they (that is, the pointers to those items) are in an array or linked list. You have to fetch them where they are. Many computations are almost entirely cache resident. That’s the point, isn’t it?

Basically what you might have learned in class about arrays and cache – may not really be the dominating influence on efficiency.

  1. The major difference in consideration of sparse vs dense polynomials is probably not how they are stored, but the algorithms. They are typically quite different. A “sparse array” style storage structure could be used for a dense polynomial. Multiplying dense polynomials may be done “fast” by Karatsuba style splitting, or by FFT. Multiplying sparse polynomials that way is a bad idea.

I don’t know if it is bad to put several items in one message, making reply harder, but people seem to be able to split things up. I’m just getting my views out there as simply as I can…
fateman@gmail.com

12 Likes

Can I ask if something was broken by this rewrites of internal workings and how much time such rework consume? I’m just interested how this works look a like.

1 Like

Put the words together, we’ll read them. Thanks very much.

2 Likes

There is.

Maxima.risch might be useful to pull from.

Given the experience with SciML/DiffEq, we will almost certainly evolve in a direction of multiple backends, though there will always be a preference for pure Julia because of the installation issues that having 20 competing binaries will have. They will always be add-ons, and not the main show. Maybe extremely useful extensions, but still not something to ship by default. I like Sage a lot and in some ways it’s a guide, but I think requiring Windows users to install a VM is a shame. It’s hard to avoid when binaries are involved though: just relying on one JIT compiler makes efficient code sharing much easier.

That’s an excellent and timely note, thanks.

If it gives some big improvements, that’s fine. We can help you make things propagate downstream. We do a ton of downstream testing in CI for this purpose. Take for example the big terms change:

It may look “major”, but it’s actually relatively minor. All that it really did was change x.op to operation(x), and x.args to arguments(x). And also typeof(x) <: Expression was no longer true and so it has to be istree(x). Most of the PR copies that through, so we did PRs to the downstream libraries as well over the next week to make them all update.

But because of that PR, any representation that has operation and arguments actually works in Symbolics.jl/ModelingToolkit.jl. So Add, Mul, Pow, and Term are what’s in SymbolicUtils.jl today, but adding new term types or changing them around is actually inconsequential to how they work. Notice that the only piece of code that even references the underlying forms is a deprecation warning:

Search for Add: that’s it. So all that really matters to Symbolics.jl is that ex = x^2 + y + xy * cos(x) builds something that represents said equation, and operations(ex) and arguments(ex) exists (these two functions are all you need to convert an equation into a Julia expression, and Terms need them to recursively build expressions).

@shashi is then exploiting this generic implementation to try out new polynomial implementations, using parts of AbstractAlgebra.jl to construct a Poly type with fast overloads where, if it seems your system is polynomial, it may just use AbstractAlgebra.jl (or given results in another thread, FLINT). The entire base could swap to MetaTheory.jl just by defining those two overloads and making @variables x generate an egg graph builder under the hood.

8 Likes

My first experiment with this package:

julia> using Symbolics

julia> @variables x y
(x, y)

julia> x^2-y^2 - (x-y)*(x+y)
x^2 - (y^2) - ((x + y)*(x - y))

julia> simplify(ans)
x^2 - (y^2) - ((x + y)*(x - y))

and I was hoping to get zero.

Thoughts?

4 Likes

I played around, and found:

julia> expr
x^2 - (y^2) - ((x + y)*(x - y))

julia> simplify(expr)
x^2 - (y^2) - ((x + y)*(x - y))

julia> simplify(expr; polynorm = true)
0

I checked simplify documentation, which is part of package SymbolicUtils.jl.

I’m a little puzzled, though. In SymbolicUtils.jl, it says:

simplify(x; rewriter=default_simplifier(),
            threaded=false,
            polynorm=true,
            thread_subtree_cutoff=100)

From this, I’d have thought that polynorm=true was the default, but apparently not so.

2 Likes