CAS Best Practices

Because there is a pressing need in parts of the julia ecosystem to do some lightweight symbolic computations. Many of these computations never see end-users / explicit human interaction. Maybe the end-user just said “hey, numerically solve this ODE initial value problem for me”, but sure, try to use symbolic jacobians if that helps to generate better (i.e. faster for a given desired precision) machine code.

The correct comparison is not sagemath in “sage mode”; the correct comparison is writing python code that starts with `from sage.all import *` and is run with `sage -python`, i.e. code that uses python semantics but accesses sage functionality as a library (and that unfortunately needs to be executed with the lightly modified CPython version shipped by sage).

Indeed, sage as a language is imho pretty useless – but sage as a python library is super cool.

And now we’re talking seriously:

First, julia beats python performance for code written in the respective languages. (Of course, e.g. the fflas-ffpack library bundled with sage will in turn handily beat generic julia linear-algebra code, until someone writes a FFLAS-FFPACK wrapper and imports it into their julia project.)

Secondly, and more importantly, julia as a language is pretty well-adapted to doing this in a more modular fashion: ubiquitous dynamic multiple dispatch is super powerful, and makes the endeavor more tractable from a pure software-engineering, project-management, and funding perspective. (In sagemath, different modules need to be tightly coupled and developed under a single umbrella organization, because they need to know about each other. Julia is more amenable to having separate modules developed independently and combined in a mix-and-match fashion, as in the sketch below.)
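
To make the mix-and-match point concrete, here is a minimal sketch under invented names (ExprCore, TrigExprs, simplify, and Sin are all hypothetical, not existing packages): one module defines only a generic function, and an independently developed module plugs its own type in by adding a method.

```julia
# Hypothetical sketch of mix-and-match via multiple dispatch; all names invented.
module ExprCore
    abstract type AbstractExpr end
    function simplify end          # generic function with no methods yet
    export AbstractExpr, simplify
end

# An independently developed module extends the generic function for its own
# type, without ExprCore having to know anything about it:
module TrigExprs
    using ..ExprCore
    struct Sin <: AbstractExpr
        arg::Any
    end
    ExprCore.simplify(s::Sin) = s.arg == 0 ? 0 : s   # sin(0) == 0
end

using .ExprCore
@assert simplify(TrigExprs.Sin(0)) == 0
```

Neither module imports the other’s concrete types at development time; the method table is assembled whenever both happen to be loaded together.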

And now we’re talking interesting things! So, representing expression trees as hash-trees is a pretty nifty way of saving some memory, recognizing repeated subexpressions, and occasionally getting exponential speedups (see the sketch below). What is your experience with this? What is your “get off my lawn, youngsters” rant about pitfalls? Which invariants should one build into the hash?
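
Here is that sketch, a minimal interning table for expression trees (Node, TABLE, and intern! are invented names; real implementations differ in many details): every subexpression is stored once, keyed by a structural hash, so equal subtrees become pointer-identical and sharing falls out automatically.

```julia
# Minimal hash-consing sketch; Node, TABLE, and intern! are invented names.
struct Node
    head::Symbol          # operation or leaf symbol, e.g. :+, :*, :x
    args::Vector{Node}    # children (empty for a leaf)
    hash::UInt            # structural hash, computed once at construction
end

const TABLE = Dict{UInt,Vector{Node}}()   # hash => already-interned nodes

function intern!(head::Symbol, args::Vector{Node})
    # fold the (cached) child hashes into the node hash
    h = foldl((acc, a) -> hash(a.hash, acc), args; init = hash(head))
    for n in get!(Vector{Node}, TABLE, h)
        # full structural check guards against hash collisions; comparing
        # children with === is sound because they are interned too
        if n.head == head && length(n.args) == length(args) &&
           all(c === d for (c, d) in zip(n.args, args))
            return n                      # reuse the shared node
        end
    end
    n = Node(head, args, h)
    push!(TABLE[h], n)
    return n
end

x  = intern!(:x, Node[])
e1 = intern!(:+, [x, x])
e2 = intern!(:+, [x, x])
@assert e1 === e2    # structurally equal expressions are pointer-identical
```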

Another very powerful approach is blackbox factorization / Schwartz-Zippel / polynomial identity testing. For example, say that you have a symbolic jacobian that involves some exact integer/rational entries and some rational expressions in variables; now you’re interested in its inverse. Which entries are zero?

Each entry of the inverse is clearly a multivariate rational function. However, if you tried to expand things into monomials, the result would be exponentially sized. No biggie: choose some random finite field as a model, choose random assignments to the variables, and evaluate the thing (BLAS over a finite field can be very fast). Schwartz-Zippel then bounds the failure probability: a nonzero polynomial of total degree d, evaluated at a point with coordinates drawn uniformly from a set S, vanishes with probability at most d/|S|, so repeating the procedure gets you to e.g. 2^-1000 or whatever confidence you desire. Oh, you want to factorize all entries and want common factors? Black-box factorization does the job. (A sketch of the zero-testing step follows below.)
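
Here is that sketch, self-contained over Base Julia plus LinearAlgebra (the entrywise-function representation, inv_mod_p, and the choice of the Mersenne prime 2^61 - 1 are all just illustrative): evaluate the jacobian at random points over GF(p), invert it there, and intersect the zero patterns across trials.

```julia
using LinearAlgebra                # only for the identity matrix I

const P = Int128(2)^61 - 1         # a large prime, so degree/P is tiny

modinv(a) = powermod(a, P - 2, P)  # inverse in GF(P) via Fermat's little theorem

# Gauss-Jordan inversion over GF(P); returns nothing if A is singular mod P.
function inv_mod_p(A::Matrix{Int128})
    n = size(A, 1)
    M = hcat(mod.(A, P), Matrix{Int128}(I, n, n))      # augmented [A | I]
    for c in 1:n
        piv = findfirst(!iszero, M[c:n, c])
        piv === nothing && return nothing
        piv += c - 1
        M[c, :], M[piv, :] = M[piv, :], M[c, :]        # move pivot row up
        M[c, :] .= mod.(M[c, :] .* modinv(M[c, c]), P)
        for r in 1:n
            r == c || (M[r, :] .= mod.(M[r, :] .- M[r, c] .* M[c, :], P))
        end
    end
    return M[:, n+1:2n]
end

# Toy "symbolic" jacobian [x 0; y x], represented entrywise as functions; its
# inverse is [1/x 0; -y/x^2 1/x], so entry (1,2) is identically zero.
f11(x, y) = x;    f12(x, y) = zero(x)
f21(x, y) = y;    f22(x, y) = x
J = [f11 f12; f21 f22]

maybe_zero = trues(2, 2)            # entries not yet caught being nonzero
for _ in 1:20
    x, y = rand(Int128(1):P - 1, 2)
    Ainv = inv_mod_p([mod(f(x, y), P) for f in J])
    Ainv === nothing && continue    # det vanished at this point; resample
    maybe_zero .&= (Ainv .== 0)     # keep only entries that evaluated to zero
end
@show maybe_zero                    # only entry (1,2) should remain flagged
```

Each extra trial multiplies the probability of a false “zero” by at most degree/P, which is how you buy the 2^-1000-style confidence.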

So what are your experiences with using Schwartz-Zippel-style evaluation as a basis for hash-trees (i.e. using the results of evaluations at random points over random fields as hashes)?
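
A minimal sketch of that idea (sem_hash, POINTS, and the prime 2^31 - 1 are arbitrary illustrative choices): fix a handful of random evaluation points over a prime field and use the tuple of evaluations as the hash, so algebraically equal expressions hash equal without ever being expanded.

```julia
# Sketch of an evaluation-based ("semantic") hash; all names invented.
const Q = 2^31 - 1                              # prime hash field GF(Q)
const POINTS = [rand(1:Q - 1, 3) for _ in 1:4]  # 4 fixed random points in GF(Q)^3

# For brevity, "expressions" here are plain functions of three variables;
# a real CAS would walk the expression tree instead.
sem_hash(f) = Tuple(mod(f(pt...), Q) for pt in POINTS)

g(x, y, z) = mod((x + 1)^2, Q)        # reducing mod Q along the way avoids overflow
h(x, y, z) = mod(x^2 + 2x + 1, Q)
@assert sem_hash(g) == sem_hash(h)    # equal polynomials => equal hashes
@assert sem_hash((x, y, z) -> x) != sem_hash((x, y, z) -> y)   # distinct, w.h.p.
```

Note the two-sided trade: distinct trees for the same function deliberately collide (that is the point), while genuinely different low-degree expressions collide only with probability about degree/Q per point.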

Now your expressions may involve things that don’t really work well over finite fields, like square roots, exponentials, logarithms, trigonometric functions, etc. You can simply model an unknown function with a random function (e.g. `evaluate("weird_function", x1, x2, ...) = hash("weird_function", x1, x2, ...)`). This has the problem that you lose identities: under such a model, log(exp(x)) != x and sqrt(x)^2 != x. That is OK (losing identities loses you simplification power; inventing identities makes your results incorrect), but maybe there are good known finite-field models for many standard operations? Maybe there are just wise tricks (e.g. “the old masters say: don’t bother with galois fields, stick with prime fields and make the prime larger / use more points”, or “the old masters say: use 16-bit primes with precomputed discrete-log tables”)?
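
Concretely, the hash-based modeling of an unknown function might look like this (`opaque` is an invented name; Q is the same illustrative prime as in the previous sketch): an uninterpreted function symbol becomes a deterministic pseudo-random function into the prime field.

```julia
# Sketch of modeling unknown functions by random functions; `opaque` is an
# invented name, and Q is the same prime as in the previous sketch.
const Q = 2^31 - 1

# Deterministic pseudo-random model GF(Q)^k -> GF(Q) for a function symbol:
opaque(name::Symbol, args...) = Int(mod(hash((name, args...)), Q))

x = 12345
# :log and :exp get independent random models, so log(exp(x)) == x is lost;
# the two sides agree only with probability about 1/Q:
@show opaque(:log, opaque(:exp, x)) == x   # false w.h.p.: identity lost, none invented
```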
