CAS Best Practices

Aha, thanks for the clarification. I did not realize that opt-in was related to the use of defgeneric, which is optional. I think of Common Lisp as a multi-paradigm language. You have many choices: structures, arrays, lists, hash tables, … and you also have functions, methods, macros, compiler macros, etc.

I think the point that is being made is that at compile-time the Julia compiler may resolve a generic function call down to a specific subroutine call, a call in which type-testing is omitted, because the types can be deduced at compile time. For example, a call to add(a,b), where add is “generic” can be compiled to (say) a floating-point add instruction, if the compiler can “prove” that a and b are double-float numbers. That’s nice, and I suppose there is a choice to be made in a language design as to whether the type of a, b, etc is dynamic (unknown until runtime) or instantiated and immutable at compile time.
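
To make that concrete, here is a minimal Julia sketch (nothing specific to any package discussed here) of a generic function that the compiler specializes when the argument types are known, versus a call that must be dispatched at run time:

```
# A "generic" add: no type annotations, works for any numbers.
add(a, b) = a + b

# With concrete argument types, Julia compiles a specialized method instance;
# for two Float64s the body boils down to a single floating-point add, which
# you can inspect in the REPL with:  @code_llvm add(1.0, 2.0)
add(1.0, 2.0)          # statically resolvable: both arguments are Float64

# If the element types are only known at run time, the call goes through
# dynamic dispatch instead of being resolved at compile time.
xs = Any[1.0, 2]
add(xs[1], xs[2])
```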

For a CAS, typically the type of almost every datum in some general representation is a structure of ‘type’ ‘algebraic expression’ – so generic operator dispatch might not be particularly important. A typical operation “dispatch” in Maxima (written in Lisp) might be oh, I see the list (sin (+ pi b)) so I will look at the simplification program associated with “sin” to work on this. It would be possible to define each algebraic structure as a different type depending on its “head” and thus the generic simplification program could have a type-dispatch that goes off to sin, cos, tan, plus, times, etc… But the type then would rarely be known until run-time, so what’s the point?
Oh, there are other types of algebraic representations … even in Lisp it might be a list or a vector, or perhaps an atom (symbol or built-in number type). Once you know an expression is a particular kind of number, or are otherwise “down in the weeds” I think that calling ordinary functions is not such a burden – and they can be in-lined by the compiler sometimes. Like floating add.
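
For readers unfamiliar with this style, here is a rough Julia sketch (the Expression type and the simplifier table are invented for illustration) of dispatching on the head of an expression at run time, roughly the way a Lisp CAS inspects (sin (+ pi b)):

```
# A minimal expression representation: a head symbol plus arguments.
struct Expression
    head::Symbol
    args::Vector{Any}
end

# Per-operator simplification routines, looked up in a table at run time;
# the head is data, not a type, so there is nothing to dispatch on at compile time.
simplify_sin(args)  = Expression(:sin, args)   # placeholder "simplifier"
simplify_plus(args) = Expression(:+, args)     # placeholder "simplifier"

const SIMPLIFIERS = Dict{Symbol,Function}(:sin => simplify_sin, :+ => simplify_plus)

function simplify(e::Expression)
    f = get(SIMPLIFIERS, e.head, nothing)
    f === nothing ? e : f(e.args)
end

# Analogue of simplifying (sin (+ pi b)):
simplify(Expression(:sin, Any[Expression(:+, Any[:pi, :b])]))
```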

RJF

I think you’re imputing a more lofty claim than the claim that the Julia CAS developers are making. I’m not involved in the Julia CAS work, and I’ve only been casually following these threads, but my impression is that the primary goal is, to paraphrase, “we want the equivalent of SymPy, but faster”. The Julia CAS developers probably do have aspirations beyond that, but I don’t think anyone is claiming “we will build the best CAS that has ever been built”.

6 Likes

I quote

SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.

Given the goal as you interpret it, it seems that time might be better spent on writing a program STJ, a SymPy-to-Julia translator, which would then include all the handiwork of the SymPy team but might run umpteen times faster. I don’t know what the coding habits are for SymPy, and whether you might as well try to compile all of Python. It need not run as fast as pure Julia, but who knows.

In my experience, two important tools should be applied:
first, measure performance by benchmarking; next, run a profiler to find the bottlenecks. For example, if it turns out that your most important SymPy program spends 95% of its time in a library routine from GNU MPFR, then improving the speed of Python’s message passing, function calls, etc. will not matter.
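
A minimal Julia sketch of that workflow (the workload function below is just a placeholder for whatever symbolic computation you actually care about):

```
using Profile

workload() = sum(big(k)^20 for k in 1:5_000)   # placeholder workload

workload()                  # warm up (compile) before timing
@time workload()            # coarse benchmark first
@profile workload()         # then sample where the time actually goes
Profile.print(maxdepth = 8) # if one library routine dominates, optimize there
```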

When someone says a program is 10,000x faster, my first thought is “doing what?”

What if the SymPy code were only, say, 2x slower than running an equivalent algorithm in Julia, in the same problem-solving context, on the same computer system, and so on? Would you regret having spent the time reprogramming it? Of course the same question could be posed to the SymPy people — how about just wrapping up some existing CAS? As done by the SageMath people, who appear to have wrapped some version of Maxima as an alternative to SymPy. I would not be surprised if Maxima is faster than SymPy on some tasks, though of course you could find a really slow Lisp and make some slow, uncompiled copy of Maxima that is (who knows) 10,000x slower.

RJF

This is an FAQ — you don’t get any performance boost by “transpiling” from another language, because the performance benefit of Julia derives from the front end (language semantics that the compiler can specialize on), not from the back end.

SymEngine is reportedly hundreds of times faster than SymPy for many tasks, and Symbolics.jl is reportedly matching SymEngine already on many benchmarks (but seems easier to extend and integrate with other Julia projects than SymEngine’s C++ codebase).

To quote one of the authors of SymPy and SymEngine, Python is indeed too slow, so it is hard for SymPy to compete in terms of speed. The idea that a new implementation in a compiled language can do much better hardly seems outlandish.

21 Likes

I spent 1,000+ hours over the past 10+ years using CAS (commercial & open source) to solve symbolic problems. I exchanged dozens of emails/phone calls w/ Mathematica’s tech support engineers.

My main concerns w/ current CASs are that they often:
A. give more complicated solutions than necessary (e.g. using special functions when not needed)
B. are unable to solve problems w/ known closed form solutions
For my symbolic problems I don’t care that much about speed.
(For numeric problems I care a lot…)

I’m excited about the potential of symbolics.jl because:

  1. A very capable group of people are designing a new CAS, in a great ecosystem.
    They have 2nd mover advantage and can build on the latest literature/tech and can try to avoid as many past mistakes as possible.
  2. I believe Julia is a great language for a CAS (type system, multiple dispatch, …)

The main reason (above) that I’m hopeful about symbolics.jl is 1 (2nd mover advantage) and not 2 (Julia language).
Even if symbolics.jl wasn’t written in Julia, I’d be excited about a new CAS by a strong group in a great ecosystem that aims to build on previous work.
@rfateman I really hope the symbolics.jl team can get constructive feedback on CAS design from you and people like you.

10 Likes

7 posts were split to a new topic: CAS benchmarks (Symbolics.jl and Maxima)

FWIW this Discourse forum supports Markdown, single backticks for inline monospace and triple backticks for

```
blocks
```

and dollar signs for \LaTeX. I realize some people prefer not to use it, just figured I’d mention it.

6 Likes

It might be worthwhile to remember that this is not about millisecond versus microsecond: some years ago I was deriving some analytical expressions resulting from products of symbolic matrices in Matlab (MuPAD). The process was started in the morning, and by the next morning it had choked and stopped with an error. I guess that is the sort of problem for which one would like to see a memory-conservative and fast symbolic code.

6 Likes

I want to just say that it seems like the original intention of this thread was to solicit concrete advice about designing and implementing a CAS from people with experience. It has turned into a somewhat tense back and forth about whether that is worth doing and whether the Julia language is a better or worse language for implementing a CAS than those used by existing projects. Putting my moderator hat on, that isn’t a great direction for the conversation: some people are going to be on one side, some on the other; debating seems unlikely to resolve the disagreement and likely to get heated. I don’t want to close the thread, but please consider taking it back to the original intention of concrete advice on how to design and implement a CAS, not trying to convince a group of people who want to try something that they are foolish to do so. It may or may not be worthwhile, but the only way to find out is for them to try it and see how it goes. Other people are entitled to think that’s a waste of time and energy, but it’s their time and energy to choose how to spend.

32 Likes

Just looking over this note again, I see you refer to Lample’s paper on neural networks and symbolic integration, a paper which is reviewed here (trying to give them the benefit of the doubt, but critical)

but I would be more skeptical. How much can you do when the only constants you are allowed are the integers between -5 and 5, you sometimes give wrong answers, and (to my mind, worse) you make up a benchmark based on your own synthetic test set and claim superior results?
It may be possible to find a legitimate use of machine learning (say, to choose parameters in a heuristic search of some sort in numerical or even symbolic computation), but I view this particular paper as a misuse of ML and a bogus attempt to justify it. I’d look elsewhere for a boost from ML + symbolic algebra, though it strikes me as implausible on fundamental grounds. ML can’t really do arithmetic, it seems.


3 Likes

Some food for thought :sunglasses: example code using Nemo and using Hecke computer algebra, for your consideration

####
# Example Code using Nemo, and using Hecke Computer Algebra and Number Theory Packages for Julia - 1705.06134.pdf @@ https://arxiv.org/pdf/1705.06134.pdf
#
# NOTE Don't whack yourself lol, Buy some insurance by First Creating a New Experimental TEST Environment
# (v1.4-EXP3) pkg> activate v1.4-EXP3
# add Nemo     # Nemo  v0.18.4
# add Hecke    # Hecke v0.8.6
# Find and Edit Folder: .julia/packages/Nemo/QMPxi/test/  File: Benchmark-test.jl
using Nemo
using Hecke

using Test
# Test Summary:       | Pass  Total 4
@testset "Benchmark.fateman..." begin
   R, x = PolynomialRing(ZZ, "x")
   S, y = PolynomialRing(R, "y")
   T, z = PolynomialRing(S, "z")
   U, t = PolynomialRing(T, "t")

   p = (x + y + z + t + 1)^20

   q = p*(p + 1)

   @test length(q) == 41
end

@testset "Benchmark.pearce..." begin
   R, x = PolynomialRing(ZZ, "x")
   S, y = PolynomialRing(R, "y")
   T, z = PolynomialRing(S, "z")
   U, t = PolynomialRing(T, "t")
   V, u = PolynomialRing(U, "u")

   f = (x + y + 2z^2 + 3t^3 + 5u^5 + 1)^10
   g = (u + t + 2z^2 + 3y^3 + 5x^5 + 1)^10

   q = f*g

   @test length(q) == 61
end

@testset "Benchmark.resultant..." begin
   R, x = FlintFiniteField(7, 11, "x")
   S, y = PolynomialRing(R, "y")
   T = ResidueRing(S, y^3 + 3x*y + 1)
   U, z = PolynomialRing(T, "z")

   f = (3y^2 + y + x)*z^2 + ((x + 2)*y^2 + x + 1)*z + 4x*y + 3
   g = (7y^2 - y + 2x + 7)*z^2 + (3y^2 + 4x + 1)*z + (2x + 1)*y + 1

   s = f^12
   t = (s + g)^12

   r = resultant(s, t)

   @test r == (x^10+4*x^8+6*x^7+3*x^6+4*x^5+x^4+6*x^3+5*x^2+x)*y^2+(5*x^10+x^8+4*x^7+3*x^5+5*x^4+3*x^3+x^2+x+6)*y+(2*x^10+6*x^9+5*x^8+5*x^7+x^6+6*x^5+5*x^4+4*x^3+x+3)
end

@testset "Benchmark.poly_nf_elem..." begin
   R, x = CyclotomicField(20, "x")
   S, y = PolynomialRing(R, "y")

   f = (3x^7 + x^4 - 3x + 1)*y^3 + (2x^6-x^5+4x^4-x^3+x^2-1)*y +(-3x^7+2x^6-x^5+3x^3-2x^2+x)

   @test f^300 == f^100*f^200
end
# Test Summary:       | Pass  Total 4

# === Original source of my motivation / inspiration
# arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.
# https://discourse.julialang.org/t/calculation-pi-pi-pi-in-hexadecimal-in-high-precision/51785/60?u=marc.cox
# dpsanders ** # Committer
# @Marc.Cox I believe it’s using Fredrik Johansson’s arb software. He has a nice paper describing how that works.

## File: CITATION.bib
## Citing Hecke
# If your research depends on computations done with Hecke, please consider giving us a formal citation:
# - Claus Fieker, William Hart, Tommy Hofmann and Fredrik Johansson, [Nemo/Hecke: Computer Algebra and Number Theory Packages
#   for the Julia Programming Language](http://doi.acm.org/10.1145/3087604.3087611). In: Proceedings of ISSAC '17, pages 157–164, New York, NY, USA, 2017. ACM.
#
## NOTE Documentation Yay !
# Documentation for Nemo.jl can be found here: https://nemocas.github.io/Nemo.jl/latest/index.html
# Documentation for Hecke.jl can be found here: https://www.thofma.com/Hecke.jl/latest/
2 Likes

Hi Petr @PetrKryslUCSD

Ugh :cold_face:, I really sympathize with you about “… by the next morning it had choked :upside_down_face: and stopped with an :nauseated_face: error.”

And you have always been uber helpful :innocent:, so I wanted to throw out there for your consideration “… arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.” using Nemo and/or IntervalArithmetic.jl described better by @dpsanders here @@ Calculation Pi^Pi^Pi in hexadecimal in High precision - #60 by dpsanders

My take on arbitrary-precision interval arithmetic (so far) is that it is deterministic (no “next day bombs” - UGH!) AND rigorous, (almost?) to the extent of a mathematical proof of correctness. :innocent:

So I’ve been using Nemo instead of SpecialFunctions for a while when using ApproxFun. My particular use of Nemo might be only obliquely related to your needs, so I whipped up some quick code below, using Nemo and using Hecke, that just might be better suited to your purpose (iteration): memory-conservative, fast symbolic code.

NOTE WARNING and TIP >> To perform the SAME task using floating point arithmetic would require an a priori error analysis. … In contrast Arb’s ball arithmetic allows the error analysis to be carried along with the computation. … << NOTE WARNING and TIP

NOTE TIP >> using Nemo allows for fast development, while at the same time maintaining Guaranteed results. << NOTE TIP

####
####

# Example Code using Nemo using Hecke Computer Algebra / CAS and Number Theory Packages for Julia - 1705.06134.pdf @@ https://arxiv.org/pdf/1705.06134.pdf
#
# NOTE Don't whack yourself lol, Buy some insurance by First Creating a New Experimental TEST Environment
# (v1.4-EXP3) pkg> activate v1.4-EXP3
# add Nemo     # Nemo  v0.18.4
# add Hecke    # Hecke v0.8.6
using Nemo
using Hecke

## Marc adding ...
# Nemo/Hecke: >> Computer Algebra (CAS) and Number Theory Packages for the Julia Programming Language <<
# Source: https://arxiv.org/pdf/1705.06134.pdf
#
# 5.2  Arb: arbitrary precision ball arithmetic Computing over R and C requires using some approximate model for these fields. The most common model is floating-point arithmetic.
# NOTE >> However, for many computer algebra / CAS algorithms,
# the error analysis necessary to guarantee correct results with floating-point arithmetic becomes impractical. Interval arithmetic solves this problem by effectively making error analysis automatic << NOTE
#
# Many problems can be solved using lazy evaluation: the user can try a computation with some tentative precision P and restart with precision 2*P if that fails. The precision can be set optimally when a good estimate for the minimal required P is available; that is, the intervals can be used as if they were plain floating-point numbers, and NOTE >> the Automatic error bounds simply provide a Certificate. << NOTE
#
# 6.2  The use of interval arithmetic @@ https://arxiv.org/pdf/1705.06134.pdf
# Finally we check whether diam(B_α^(i)) ≤ 2^(-p).
# NOTE If not, we Increase the Working Precision P and repeat.
# NOTE >> When finished, the midpoints of the balls B_α^(i) are approximations σ̂_i(α)
# with the desired property. << NOTE
# The following Hecke code illustrates this.
#
QQx, x = PolynomialRing(QQ, "x") # NIX FlintQQ
# function NumberField(f::fmpq_poly, s::AbstractString; cached::Bool = true, check::Bool = true)
flintfun_f = x^3 + 3*x + 1
K, gen_po_a = Nemo.NumberField(flintfun_f, "pgen_a",cached=true,check=true)
alpha = gen_po_a^2 + gen_po_a + 1
# WARNING Original paper seems to have a Code BUG ish ? IOW appeared initially IMPOSSIBLE w global int_target_precision_p = 128
#  per julia> log(1/typemax(Int64))/log(2) == -63.0
#      julia> 2^64 == 0
# BUG w Original -int_target_precision_p = 128
global int_target_precision_p = 16
global int_initial_precision_p_prime = 4
println("")
println("Increase the Working Precision P and repeat UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z])")
while true
 CCy, y = PolynomialRing(AcbField(int_initial_precision_p_prime), "y")
 g = CCy(x^3 + 3*x + 1)
 rts = roots(g) # Attempts to isolate the complex roots of the complex polynomial xx by iteratively refining balls in which they lie. .. Q: ? Related to https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra , https://en.wikipedia.org/wiki/Weierstrass_factorization_theorem ?
 z = map(x^2 + x + 1, rts)
 # 3-element Array{acb,1}: # .. # [-1.8908090 +/- 3.32e-8] + i*[-2.3196168 +/- 2.12e-8]
 #
 # https://fredrikj.net/arb/mag.html#c.mag_cmp_2exp_si
   #  int mag_cmp_2exp_si(const mag_t x, slong y)
   #   Returns negative, zero, or positive, depending on whether x is smaller, equal, or larger than 2y
 if all( [ Hecke.radiuslttwopower(u, -1*int_target_precision_p) for u in z ] )
   # Possible BUG w Original code where -int_target_precision_p = 128 likely due to 2^128 Julia overflow/underflow from 64Bit numbers per julia> 2^128 == 0
   # TODO Try again with ALL numbers as arbitrary high precision Arbs ?
   # TODO IOW ALL Arbs, and NO 32Bit or 64Bit numbers
   break
 else
  global int_initial_precision_p_prime = 2*int_initial_precision_p_prime
  println("Increase/DOUBLE the Working Precision P int_initial_precision_p_prime:=",int_initial_precision_p_prime," and repeat UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z])")
 end
end

# NOTE WARNING and TIP >> To perform the SAME task using floating point arithmetic would require an a priori error analysis. .. In contrast Arb’s ball arithmetic allows the error analysis to be carried along with the computation. .. << NOTE WARNING and TIP
# NOTE TIP >> using Nemo allows for fast development, while at the same time maintaining Guaranteed results. << NOTE TIP 

println("##125 !! DONE EOJ Increased the Working Precision P_prime and repeated UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z]) is >=", 2.0^(-1*int_target_precision_p))

# === Original source of my motivation / inspiration
# arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.
# https://discourse.julialang.org/t/calculation-pi-pi-pi-in-hexadecimal-in-high-precision/51785/60?u=marc.cox
# dpsanders ** # Committer
# @Marc.Cox I believe it’s using Fredrik Johansson’s arb software. He has a nice paper describing how that works.

## File: CITATION.bib
## Citing Hecke
# If your research depends on computations done with Hecke, please consider giving us a formal citation:
# - Claus Fieker, William Hart, Tommy Hofmann and Fredrik Johansson, [Nemo/Hecke: Computer Algebra and Number Theory Packages
#   for the Julia Programming Language](http://doi.acm.org/10.1145/3087604.3087611). In: Proceedings of ISSAC '17, pages 157–164, New York, NY, USA, 2017. ACM.
#
## NOTE Documentation Yay !
# Documentation for Nemo.jl can be found here: https://nemocas.github.io/Nemo.jl/latest/index.html
# Documentation for Hecke.jl can be found here: https://www.thofma.com/Hecke.jl/latest/
1 Like

Because there is a pressing need in parts of the Julia ecosystem to do some lightweight symbolic computations. Many of these computations never see end users / explicit human interaction. Maybe the end user just said “hey, numerically solve this ODE initial value problem for me”, but sure, try to use symbolic Jacobians if that helps to generate better (i.e. faster for a given desired precision) machine code.

The correct comparison is not sagemath in “sage mode”, the correct comparison is to writing python code that starts with from sage.all import * and is run with sage -python, i.e. uses python semantics but accesses sage functionality as a library (and unfortunately needs to be executed with the lightly modified CPython version shipped by sage).

Indeed, sage as a language is imho pretty useless – but sage as a python library is super cool.

And now we’re talking seriously:

First, Julia beats Python performance for code written in the respective languages. (Of course, e.g. fflas-ffpack bundled with Sage will in turn handily beat generic Julia linear-algebra code, until someone writes and imports an FFLAS-FFPACK wrapper into their Julia project.)

Secondly, and more importantly, Julia as a language is pretty well adapted to doing this in a more modular fashion: ubiquitous dynamic multiple dispatch is super powerful, and makes the endeavor more tractable from a pure software-engineering, project-management, and funding perspective (in SageMath, different modules need to be tightly coupled and developed under a single umbrella organization, because they need to know about each other; Julia is more amenable to having separate modules developed independently and combined in a mix-and-match fashion).
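
A toy illustration of the mix-and-match point (module and type names are made up): one module defines a generic operation, another adds a new expression type plus a method for it, and neither has to be developed under the same umbrella:

```
# Hypothetical "package" A: a generic operation plus one node type.
module CoreSymbolic
    abstract type AbstractExpr end
    struct Sym <: AbstractExpr; name::Symbol; end
    simplify(x::AbstractExpr) = x        # generic fallback
    export AbstractExpr, Sym, simplify
end

# Hypothetical "package" B: a new node type and a simplify method for it,
# added without CoreSymbolic knowing anything about this module.
module IntervalExt
    using ..CoreSymbolic
    struct Point <: AbstractExpr; lo::Float64; hi::Float64; end
    CoreSymbolic.simplify(x::Point) = x.lo == x.hi ? Sym(Symbol(x.lo)) : x
end

using .CoreSymbolic, .IntervalExt
simplify(Sym(:x))                   # generic fallback
simplify(IntervalExt.Point(1, 1))   # method contributed by the other module
```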

And now we’re talking interesting things! So, representing expression trees as hash trees is a pretty nifty way of saving some memory, recognizing repeated subexpressions, and occasionally getting exponential speedups. What is your experience with this? What is your “get off my lawn, youngsters” rant about the pitfalls? Which invariants should one build into the hash?
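
For concreteness, a minimal hash-consing sketch (names invented; the key is just the head plus the already-interned children, so only structural identity is recognized, not mathematical equivalence; that is where the ideas below come in):

```
# Minimal hash-consing: every node is interned in a table keyed by
# (head, children), so structurally identical subtrees are stored once
# and repeated subexpressions are recognized by pointer identity.
struct Node
    head::Symbol
    args::Vector{Node}
end

const POOL = Dict{Tuple{Symbol,Vector{Node}},Node}()

function node(head::Symbol, args::Node...)
    key = (head, Node[args...])
    get!(POOL, key) do
        Node(key...)
    end
end

x = node(:x)
a = node(:+, node(:sin, x), node(:sin, x))
b = node(:+, node(:sin, x), node(:sin, x))
a === b    # true: the whole tree, and the repeated sin(x) subtree, exist once
```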

Another very powerful approach is black-box factorization / Schwartz-Zippel / polynomial identity testing. For example, say that you have a symbolic Jacobian that involves some exact integer/rational entries and some rational expressions in variables; now you’re interested in its inverse. Which entries are zero?

Each entry of the inverse is clearly a multivariate rational function. However, if you tried to expand things into monomials, that would be exponentially sized. No biggie: choose some random finite field as a model, choose random assignments to the variables, and evaluate the thing (BLAS over a finite field can be very fast). Schwartz-Zippel guarantees you correctness up to e.g. 2^-1000 or whatever confidence you desire, by repeating the procedure. Oh, you want to factorize all entries and find common factors? Black-box factorization does the job.
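
A minimal sketch of that probabilistic zero test (illustrative only; a real implementation would work in a proper finite-field/polynomial library and track degree bounds):

```
# Schwartz-Zippel style zero test: evaluate the expression at random points
# modulo a prime p.  A nonzero value is a certificate that the expression is
# not identically zero; after k all-zero trials the error probability is at
# most (d/p)^k, where d is the total degree.
const p = 1_000_003          # prime; small enough that products fit in Int64

function probably_zero(expr, nvars; trials = 20)
    for _ in 1:trials
        pt = rand(0:p-1, nvars)
        mod(expr(pt...), p) != 0 && return false   # definitely nonzero
    end
    return true                                     # zero with high probability
end

# (x + y)^2 - x^2 - 2xy - y^2 is identically zero; x*y + 1 is not.
g(x, y) = mod(x + y, p)^2 - mod(x^2 + 2x*y + y^2, p)
probably_zero(g, 2)                   # -> true
probably_zero((x, y) -> x*y + 1, 2)   # -> false (almost surely)
```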

So what are your experiences with using Schwartz-Zippel-style evaluation as a basis for hash trees (i.e. using the results of evaluations at random points over random fields as hashes)?

Now your expressions may involve things that don’t really work well over finite fields, like square roots, exponentials, logarithms, trigonometrics, etc. You can simply model an unknown function with a random function (e.g. evaluate("weird_function", x1, x2, ...) = hash("weird_function", x1, x2, ...)). This has the problem that you lose identities: log(exp(x)) != x or sqrt(x)^2 != x. This is OK (losing identities loses you simplification power; inventing identities makes your results incorrect), but maybe there are good known finite field models for many standard operations? Maybe there are just wise tricks (e.g. “the old masters say: Don’t bother with galois fields, stick with prime fields and make the prime larger/use more points”, or “the old masters say: Use 16-bit primes with precomputed discrete-log tables”)?
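
One crude way to realize the “model an unknown function with a random function” idea (a sketch only; here the “random function” is Julia’s built-in hash of the name and the reduced arguments, folded into Z/p):

```
# Model an opaque operation (log, exp, some special function, ...) by a fixed
# but pseudo-random map into Z/p: hash the name together with the mod-p
# arguments.  Equal inputs always give equal outputs, but identities such as
# log(exp(x)) == x are lost, as noted above.
const p = 1_000_003

opaque(name::Symbol, args::Integer...) = Int(mod(hash((name, mod.(args, p))), p))

opaque(:log, 17) == opaque(:log, 17)          # true: deterministic
opaque(:log, opaque(:exp, 17)) == mod(17, p)  # almost surely false: identity lost
```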

9 Likes

Thank you for the info. Much appreciated.

1 Like
  1. Hash trees seem to have to do with bitcoins; not relevant here. If the application is finding subexpression trees that are equivalent: if the subtrees are equivalent, just point to the one copy.

  2. Hashing in symbolic math has been explored in papers by William A. Martin and by Gaston Gonnet, 50 or 60 years ago.

  3. Theoretical results like heuristic zero-testing have (as far as I am aware) no influence at all on system building. It is not that these results are unknown to system builders; sometimes they are the same people. It is understood that they are not relevant and don’t solve problems that need solving in systems. There are possibly programs written to explore (for example) black-box arithmetic on polynomials (look up Kaltofen), but that’s not particularly useful.
    Doing arithmetic in finite fields for particular algorithms (most notably GCD and factoring) is well understood, and the results are deterministic.

  4. The idea that you would be generating massive arithmetic expressions in some random collection of forms, factored or not, and then need to test “is this zero” is pretty unlikely. If you are doing lots of operations and want speed, you would use a representation which is either canonical or normal (those are technical terms); in either, any expression that is equivalent to zero has only one representation, namely 0. So it is not a concern.

  5. Martin and Gonnet both show methods for hash-coding of exponentials, etc. This cannot always work because you cannot model all the needed relationships in finite fields.

  6. It is not an identity that sqrt(x^2) = x. sqrt(9) is ±3.

1 Like

Speaking as a SymPy maintainer I feel like I can offer lots of useful things to this conversation but I have tried to skim this (extremely long) thread and I’m not sure I’ve followed everything. I don’t know whether it’s useful for me to add my thoughts in a thread that is already so lengthy (maybe a new thread would be better). I have some more constructive comments to make but this doesn’t seem like the right place for them because any comment here will just get buried.

In any case, let me explain that SymPy can be much faster while still being implemented in Python. It is slow in some cases because of basic design decisions, and the same problems can afflict a CAS implemented in Julia. SymPy is slow in other cases because the intention was always that those parts would at some point be implemented in another language, in support libraries.

Let’s give an example to be clear what we mean about “slow”. The example here is intended to be illustrative so don’t read too much into the details.

Suppose we have a matrix of integers and we want to do matrix multiplication. In SymPy we can do:

In [1]: M = Matrix(range(1, 100+1)).reshape(10, 10)

In [2]: %time ok = M**2
CPU times: user 75.1 ms, sys: 8.59 ms, total: 83.6 ms
Wall time: 103 ms

There is now an internal, more efficient matrix class in sympy that is faster for this example (note that I’m using sympy master):

In [3]: from sympy.polys.domainmatrix import DomainMatrix

In [4]: dM = DomainMatrix.from_Matrix(M)

In [5]: %time ok = dM**2
CPU times: user 391 µs, sys: 0 ns, total: 391 µs
Wall time: 398 µs

Some work is involved in making that faster class get used in a way that is transparent to users, but it has the potential to significantly improve speed. The flint library (python-flint) can be used to do these same computations even faster, e.g.:

In [7]: from flint import fmpz_mat

In [8]: fM = fmpz_mat([[int(e) for e in row] for row in M.tolist()])

In [9]: %time fM**2
CPU times: user 36 µs, sys: 33 µs, total: 69 µs
Wall time: 26 µs

I’m currently working on making that happen implicitly in sympy for performance. The way this works is that we delegate the computationally intensive tasks to a lower-level library and then the Python code orchestrates that lower-level library.

Repeating the same computation in Julia gives results that are slower than even the slowest sympy case:

N = 10
Ar = reshape(1:N^2, N, N) * BigInt(1)
@time Ar^2

Running that gives:

$ julia demo.jl 
  0.559016 seconds (1.04 M allocations: 53.008 MiB, 1.48% gc time)

I’m only using this example for illustrative purposes. I’m sure that some of you can show me how to make that faster or explain that I’m not timing it properly. Any feedback on that is welcome because I want to learn more, but please also understand that many of the details are not that important to the point that I am making. I intend for SymPy to reach the speed of flint, e.g.:

In [1]: N = 1000

In [2]: from flint import fmpz_mat

In [3]: m = fmpz_mat([[i*N + j for j in range(N)] for i in range(N)])

In [4]: %time ok = m**2
CPU times: user 2.27 s, sys: 54.4 ms, total: 2.32 s
Wall time: 2.37 s

How can sympy reach that speed? We can just use flint!

I tried this in Julia:

N = 1000
Ar = reshape(1:N^2, N, N) * BigInt(1)
@time Ar^2

and I got:

$ julia demo.jl 
801.397699 seconds (5.01 G allocations: 89.637 GiB, 19.98% gc time)

The equivalent for sympy’s internal (pure-)Python class is:

In [9]: from sympy.polys.domainmatrix import DomainMatrix

In [10]: dM = DomainMatrix([[ZZ(i*N + j) for j in range(N)] for i in range(N)], (N, N), ZZ)

In [11]: %time ok = dM**2
CPU times: user 3min 34s, sys: 1.92 s, total: 3min 36s
Wall time: 3min 43s

I’m sure that a lot of work can be done in Julia to improve that performance. The ultimate question from a speed perspective is: can Julia be faster than flint? If not, then we end up in a situation where both sympy and Symbolics.jl use the same underlying library. It is possible (although a huge amount of work) to build a much better CAS than sympy, but aiming for feature parity with sympy with better speed seems like a non-goal to me: it would be much easier to make sympy faster.

12 Likes

This is the classic benchmarking error. The first time any function/operator is called in Julia it is compiled; what you are seeing is compilation time. If you execute the code twice in sequence in a REPL you will see:

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.384767 seconds (1.04 M allocations: 53.002 MiB, 2.76% gc time)
10×10 Array{BigInt,2}:
  ...

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.000191 seconds (6.00 k allocations: 112.594 KiB)
10×10 Array{BigInt,2}:
  ...
1 Like

To show that this isn’t just Julia caching the results, it might be good to show that the timings don’t change if N=2 the first time.

3 Likes

Thanks for taking the time to catch up and chime in - informed outsider perspectives are very much appreciated.

A big part of Julia’s value proposition is that if you’re doing some cutting-edge work for which a fast algorithm hasn’t already been written in C, you can write it in Julia and it’ll be nearly as fast (or occasionally faster). I think that’s part of the reason why Flint’s main developers have been working on AbstractAlgebra.jl, Hecke.jl, and Oscar.jl. We can hope to benefit from their new work, but as you say, there’s no need to reinvent the wheel. Flint can be called through Nemo.jl, and it might be good to pull it into the Symbolics ecosystem when & where it’s appropriate. In the meantime,

julia> using Nemo, BenchmarkTools

julia> N = 1000; A = matrix(ZZ, reshape(1:N^2, N, N)*BigInt(1));

julia> @btime $A^2;
  464.997 ms (1 allocation: 64 bytes)
6 Likes

Is there a way to avoid that compilation time? Otherwise, I can see the point that it is relevant for some use cases but not others, but I don’t think that it is an “error” to record that compilation time.

Note that SymPy performs the same computation more quickly the second time round because of caching:

In [1]: M = Matrix(range(1, 100+1)).reshape(10, 10)

In [2]: %time ok = M ** 2
CPU times: user 67.2 ms, sys: 5.66 ms, total: 72.8 ms
Wall time: 71.5 ms

In [3]: %time ok = M ** 2
CPU times: user 4.14 ms, sys: 52 µs, total: 4.19 ms
Wall time: 4.23 ms

I didn’t include that in my timings because I don’t consider it relevant: practical calculations do not repeat precisely the same operation. For the same reason I am always suspicious of benchmark results that come from repeating a calculation in a loop. Many would consider it best practice to use a timing function like @btime in Julia or timeit in Python that times something by repeating the same operation in a loop. My experience is that those are useful as a guide for micro-optimisation but not for evaluating what is actually faster at the macro level.

2 Likes