CAS Best Practices

It might be worthwhile to remember that this is not about millisecond versus microsecond: Some years ago I was deriving some analytical expressions resulting from products of symbolic matrices in Matlab (mupad). The process was started in the morning, and finally by the morning the next day it choked and stopped with an error. I guess that would be the sort of problem for which one would like to see a memory-conservative and fast symbolic code.

6 Likes

I want to just say that it seems like the original intention of this thread was to solicit concrete advice about designing and implementing a CAS from people with experience. It has turned into a somewhat tense back and forth about whether that is worth doing and whether the Julia language is better or worse language for implementing a CAS than those used by existing projects. Putting my moderator hat on, that isn’t a great direction for the conversation: some people are going to be on one side, some on the other; debating seems unlikely to resolve the disagreement and likely to get heated. I don’t want to close the thread, but please consider taking it back to the original intention of concrete advice on how to design and implement a CAS, not trying to convince a group of people who want to try something that they are foolish to do so. It may or may not be worthwhile, but the only way to find out is for them to try it and see how it goes. Other people are entitled to think that’s a waste of time and energy, but it’s their time and energy to choose how to spend.

32 Likes

Just looking over this note again, I see you refer to Lample’s paper on neural networks and symbolic integration, a paper which is reviewed here (trying to give them the benefit of the doubt, but critical)

but I would be more skeptical. How much can you do when the only constants you are allowed are the integers between -5 and 5, you sometimes give wrong answers, and (to my mind, worse) make up some benchmark based on your synthetic test set and claim superior results.
It may be possible to find a legitimate use of machine learning (say, to choose parameters in heuristic search of some sort in numerical or even symbolic computation), but I view this particular paper as a misuse of ML and a bogus attempt to justify it. I’d look elsewhere for a boost for ML + symbolic algebra, though it strikes me as implausible from fundamental concerns. Like ML can’t really do arithmetic, it seems.

.

3 Likes

Some food for thought, :sunglasses: Example Code using Nemo, and using Hecke Computer Algebra for your consideration

####
# Example Code using Nemo, and using Hecke Computer Algebra and Number Theory Packages for Julia - 1705.06134.pdf @@ https://arxiv.org/pdf/1705.06134.pdf
#
# NOTE Don't whack yourself lol, Buy some insurance by First Creating a New Experimental TEST Environment
# (v1.4-EXP3) pkg> activate v1.4-EXP3
# add Nemo     # Nemo  v0.18.4
# add Hecke    # Hecke v0.8.6
# Find and Edit Folder: .julia/packages/Nemo/QMPxi/test/  File: Benchmark-test.jl
using Nemo
using Hecke

using Test
# Test Summary:       | Pass  Total 4
@testset "Benchmark.fateman..." begin
   R, x = PolynomialRing(ZZ, "x")
   S, y = PolynomialRing(R, "y")
   T, z = PolynomialRing(S, "z")
   U, t = PolynomialRing(T, "t")

   p = (x + y + z + t + 1)^20

   q = p*(p + 1)

   @test length(q) == 41
end

@testset "Benchmark.pearce..." begin
   R, x = PolynomialRing(ZZ, "x")
   S, y = PolynomialRing(R, "y")
   T, z = PolynomialRing(S, "z")
   U, t = PolynomialRing(T, "t")
   V, u = PolynomialRing(U, "u")

   f = (x + y + 2z^2 + 3t^3 + 5u^5 + 1)^10
   g = (u + t + 2z^2 + 3y^3 + 5x^5 + 1)^10

   q = f*g

   @test length(q) == 61
end

@testset "Benchmark.resultant..." begin
   R, x = FlintFiniteField(7, 11, "x")
   S, y = PolynomialRing(R, "y")
   T = ResidueRing(S, y^3 + 3x*y + 1)
   U, z = PolynomialRing(T, "z")

   f = (3y^2 + y + x)*z^2 + ((x + 2)*y^2 + x + 1)*z + 4x*y + 3
   g = (7y^2 - y + 2x + 7)*z^2 + (3y^2 + 4x + 1)*z + (2x + 1)*y + 1

   s = f^12
   t = (s + g)^12

   r = resultant(s, t)

   @test r == (x^10+4*x^8+6*x^7+3*x^6+4*x^5+x^4+6*x^3+5*x^2+x)*y^2+(5*x^10+x^8+4*x^7+3*x^5+5*x^4+3*x^3+x^2+x+6)*y+(2*x^10+6*x^9+5*x^8+5*x^7+x^6+6*x^5+5*x^4+4*x^3+x+3)
end

@testset "Benchmark.poly_nf_elem..." begin
   R, x = CyclotomicField(20, "x")
   S, y = PolynomialRing(R, "y")

   f = (3x^7 + x^4 - 3x + 1)*y^3 + (2x^6-x^5+4x^4-x^3+x^2-1)*y +(-3x^7+2x^6-x^5+3x^3-2x^2+x)

   @test f^300 == f^100*f^200
end
# Test Summary:       | Pass  Total 4
# Test Summary:       | Pass  Total 4

# === Original source of my motivation / inspiraton
# arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.
# https://discourse.julialang.org/t/calculation-pi-pi-pi-in-hexadecimal-in-high-precision/51785/60?u=marc.cox
# dpsanders ** # Committer
# @Marc.Cox I believe it’s using Fredrik Johansson’s arb software. He has a nice paper describing how that works.

## File: CITATION.bib
## Citing Hecke
# If your research depends on computations done with Hecke, please consider giving us a formal citation:
# - Claus Fieker, William Hart, Tommy Hofmann and Fredrik Johansson, [Nemo/Hecke: Computer Algebra and Number Theory Packages
#   for the Julia Programming Language](http://doi.acm.org/10.1145/3087604.3087611). In: Proceedings of ISSAC '17, pages 157–164, New York, NY, USA, 2017. ACM.
#
## NOTE Documentation Yay !
# Documentation for Nemo.jl can be found here: https://nemocas.github.io/Nemo.jl/latest/index.html
# Documentation for Hecke.jl can be found here: https://www.thofma.com/Hecke.jl/latest/
2 Likes

Hi Petr @PetrKryslUCSD

Ugh :cold_face:, I really sympathize w you about " … by the morning the next day it choked :upside_down_face: and stopped with an :nauseated_face: error."

And you have always been uber helpful :innocent:, so I wanted to throw out there for your consideration “… arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.” using Nemo and/or IntervalArithmetic.jl described better by @dpsanders here @@ Calculation Pi^Pi^Pi in hexadecimal in High precision - #60 by dpsanders

As for my take on arbitrary-precision interval arithmetic (so far)
is that it is deterministic (no “next day bombs” - UGH!),
AND rigorous (?almost?) to the extent of a mathematical proof of correctness. :innocent:

So I’ve been using Nemo instead of SpecialFunctions for a while when using ApproxFun,
but my particular use of Nemo code might be only obliquely related to your needs,
so I whipped up some quick code using Nemo and using Hecke for you below
that just might be better suited to your purposes (of iteration), memory-conservative,
and fast symbolic code ?

NOTE WARNING and TIP >> To perform the SAME task using floating point arithmetic would require an a priori error analysis. … In contrast Arb’s ball arithmetic allows the error analysis to be carried along with the computation. … << NOTE WARNING and TIP

NOTE TIP >> using Nemo allows for fast development, while at the same time maintaining Guaranteed results. << NOTE TIP

####
####

# Example Code using Nemo using Hecke Computer Algebra / CAS and Number Theory Packages for Julia - 1705.06134.pdf @@ https://arxiv.org/pdf/1705.06134.pdf
#
# NOTE Don't whack yourself lol, Buy some insurance by First Creating a New Experimental TEST Environment
# (v1.4-EXP3) pkg> activate v1.4-EXP3
# add Nemo     # Nemo  v0.18.4
# add Hecke    # Hecke v0.8.6
using Nemo
using Hecke

## Marc adding ...
# Nemo/Hecke: >> Computer Algebra (CAS) and Number Theory Packages for the Julia Programming Language <<
# Source: https://arxiv.org/pdf/1705.06134.pdf
#
# 5.2  Arb: arbitrary precision ball arithmetic Computing over R and C requires using some approximate model for these fields. The most common model is floating-point arithmetic.
# NOTE >> However, for many computer algebra / CAS algorithms,
# the error analysis necessary to guarantee correct results with floating-point arithmetic becomes impractical. Interval arithmetic solves this problem by effectively making error analysis automatic << NOTE
#
# Many problems can be solved using lazy evaluation: the user can try a computation with some tentative precision P and restart with precision 2*P if that fails. The precision can be set optimally when a good estimate for the minimal required P available; that is, the intervals can be used as if they were plain floating-point numbers,and NOTE >> the Automatic error bounds simply provide a Certificate. << NOTE
#
# 6.2  The use of interval arithmetic @@ https://arxiv.org/pdf/1705.06134.pdf
# Finally we check whether diam(B(i)α) ≤2−p.
# NOTE If not, we Increase the Working Precision P and repeat.
# NOTE >> When finished, the midpoints of the ballsB(i)α are approximationsˆσi(α)
# with the desired property. << NOTE
# The following Hecke code illustrates this.
#
QQx, x = PolynomialRing(QQ, "x") # NIX FlintQQ
# function NumberField(f::fmpq_poly, s::AbstractString; cached::Bool = true, check::Bool = true)
flintfun_f = x^3 + 3*x + 1
K, gen_po_a = Nemo.NumberField(flintfun_f, "pgen_a",cached=true,check=true)
alpha = gen_po_a^2 + gen_po_a + 1
# WARNING Original paper seems to have a Code BUG ish ? IOW appeared initially IMPOSSIBLE w global int_target_precision_p = 128
#  per julia> log(1/typemax(Int64))/log(2) == -63.0
#      julia> 2^64 == 0
# BUG w Original -int_target_precision_p = 128
global int_target_precision_p = 16
global int_initial_precision_p_prime = 4
println("")
println("Increase the Working Precision P and repeat UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z])")
while true
 CCy, y = PolynomialRing(AcbField(int_initial_precision_p_prime), "y")
 g = CCy(x^3 + 3*x + 1)
 rts = roots(g) # Attempts to isolate the complex roots of the complex polynomial xx by iteratively refining balls in which they lie. .. Q: ? Related to https://en.wikipedia.org/wiki/Fundamental_theorem_of_algebra , https://en.wikipedia.org/wiki/Weierstrass_factorization_theorem ?
 z = map(x^2 + x + 1, rts)
 # 3-element Array{acb,1}: # .. # [-1.8908090 +/- 3.32e-8] + i*[-2.3196168 +/- 2.12e-8]
 #
 # https://fredrikj.net/arb/mag.html#c.mag_cmp_2exp_si
   #  int mag_cmp_2exp_si(const mag_t x, slong y)
   #   Returns negative, zero, or positive, depending on whether x is smaller, equal, or larger than 2y
 if all( [ Hecke.radiuslttwopower(u, -1*int_target_precision_p) for u in z ] )
   # Possible BUG w Original code where -int_target_precision_p = 128 likely due to 2^128 Julia overflow/underflow from 64Bit numbers per julia> 2^128 == 0
   # TODO Try again with ALL numbers as arbitrary high precision Arbs ?
   # TODO IOW ALL Arbs, and NO 32Bit or 64Bit numbers
   break
 else
  global int_initial_precision_p_prime = 2*int_initial_precision_p_prime
  println("Increase/DOUBLE the Working Precision P int_initial_precision_p_prime:=",int_initial_precision_p_prime," and repeat UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z])")
 end
end

# NOTE WARNING and TIP >> To perform the SAME task using floating point arithmetic would require an a priori error analysis. .. In contrast Arb’s ball arithmetic allows the error analysis to be carried along with the computation. .. << NOTE WARNING and TIP
# NOTE TIP >> using Nemo allows for fast development, while at the same time maintaining Guaranteed results. << NOTE TIP 

println("##125 !! DONE EOJ Increased the Working Precision P_prime and repeated UNTIL all([ Hecke.radiuslttwopower(u, -int_target_precision_p) for u in z]) is >=", 2.0^(-1*int_target_precision_p))

# === Original source of my motivation / inspiraton
# arbitrary-precision interval arithmetic using the midpoint-radius representation, also known as ball arithmetic.
# https://discourse.julialang.org/t/calculation-pi-pi-pi-in-hexadecimal-in-high-precision/51785/60?u=marc.cox
# dpsanders ** # Committer
# @Marc.Cox I believe it’s using Fredrik Johansson’s arb software. He has a nice paper describing how that works.

## File: CITATION.bib
## Citing Hecke
# If your research depends on computations done with Hecke, please consider giving us a formal citation:
# - Claus Fieker, William Hart, Tommy Hofmann and Fredrik Johansson, [Nemo/Hecke: Computer Algebra and Number Theory Packages
#   for the Julia Programming Language](http://doi.acm.org/10.1145/3087604.3087611). In: Proceedings of ISSAC '17, pages 157–164, New York, NY, USA, 2017. ACM.
#
## NOTE Documentation Yay !
# Documentation for Nemo.jl can be found here: https://nemocas.github.io/Nemo.jl/latest/index.html
# Documentation for Hecke.jl can be found here: https://www.thofma.com/Hecke.jl/latest/
1 Like

Because there is a pressing in parts of the julia ecosystem to do some light-weight symbolic computations. Many of these computations never see end-users / explicit human interaction. Maybe the end-user just said, hey numerically solve this ODE initial value problem for me, but sure, try to use symbolic jacobians if it helps to generate better (i.e. faster for a given desired precision) machine code.

The correct comparison is not sagemath in “sage mode”, the correct comparison is to writing python code that starts with from sage.all import * and is run with sage -python, i.e. uses python semantics but accesses sage functionality as a library (and unfortunately needs to be executed with the lightly modified CPython version shipped by sage).

Indeed, sage as a language is imho pretty useless – but sage as a python library is super cool.

And now we’re talking seriously:

First, julia beats python performance for code written in the respective languages. (of course e.g. fflas-ffpack bundled with sage will in turn handily beat generic julia linear-algebra code, until someone writes and imports a FFLAS-FFPACK wrapper into their julia project)

Secondly and more importantly, julia as a language is pretty well-adapted to doing this in a more modular fashion: Ubiquitous dynamic multi dispatch is super powerful, and makes the endeavor more tractable from a pure software-engineering and project management and funding perspective (in sagemath, different modules need to be tightly coupled and developed under a single umbrella organization, because they need to know about each other. Julia is more amenable to having separate modules developed independently, and combined in a mix-and-match fashion).

And now we’re talking interesting things! So, representing expression trees as hash-trees is a pretty nifty way of saving some memory, recognizing repeated subexpressions, and occasionally getting exponential speedups. What is your experience with this? What is your “get off my lawn, youngsters”-rant about pitfalls. Which invariants should one build into the hash?

Another very powerful approach is blackbox factorization / Schwartz-Zippel / polynomial-identity-testing. For example, say that you have a symbolic jacobian that involves some exact integer/rational entries and some rational expressions in variables; now you’re interested in it’s inverse. Which entries are zero?

Each entry of the inverse is clearly a multivariate rational function. However, if you tried to expand things into monomials, that would be exponentially sized. No biggy, choose some random finite field as a model, choose random assignments to the variables and evaluate the thing (BLAS over finite field, can be very fast). Schwartz-Zippel guarantees you correctness up to e.g. 2^-1000 or whatever confidence you desire, by repeating the procedure. Oh, you want to factorize all entries and want common factors? Black-box factorization does the job.

So what are your experiences with using Schwartz-Zippel-style as a basis for hash-trees (i.e. use results of evaluations at random points over random fields as hashes)?

Now your expressions may involve things that don’t really work well over finite fields, like square roots, exponentials, logarithms, trigonometrics, etc. You can simply model an unknown function with a random function (e.g. evaluate("weird_function", x1, x2, ...) = hash("weird_function", x1, x2, ...)). This has the problem that you lose identities: log(exp(x)) != x or sqrt(x)^2 != x. This is OK (losing identities loses you simplification power; inventing identities makes your results incorrect), but maybe there are good known finite field models for many standard operations? Maybe there are just wise tricks (e.g. “the old masters say: Don’t bother with galois fields, stick with prime fields and make the prime larger/use more points”, or “the old masters say: Use 16-bit primes with precomputed discrete-log tables”)?

9 Likes

Thank you for the info. Much appreciated.

1 Like
  1. hash-trees seem to have to do with bitcoins. Not relevant here. If there are applications to finding subexpression trees that are equivalent,-- if the subtrees are equivalent, just point to the one copy.

  2. hashing in symbolic math has been explored in papers by William A. Martin and by Gaston Gonnet. 50 or 60 years ago.

  3. theoretical results like heuristic zero-testing have (as far as I am aware) no influence at all at system building. It is not that these results are unknown to system builders. Sometimes they are the same people. It is understood that they are not relevant and don’t solve problems that need solving in systems. There are possibly programs written to explore (for example) black-box arithmetic on polynomials (look up Kaltofen) but that’s not particularly useful.
    Doing arithmetic in finite fields for particular algorithms (most notably GCD and factoring) is well understood, and the results are deterministic.

  4. The idea that you would be generating massive arithmetic expressions in some random collection of forms, factored or not, and then need to test “is this zero” is pretty unlikely. If you are doing lots of operations and want speed, you would use a representation which is either canonical or normal (those are technical terms) and in either, any expression that is equivalent to zero would have only one representation, namely 0. So it is not of a concern.

  5. Martin and Gonnet both show methods for hash-coding of exponentials, etc. This cannot always work because you cannot model all the needed relationships in finite fields.

  6. It is not an identity that sqrt(x^2) =x. sqrt(9) is ±3.
    :

1 Like

Speaking as a SymPy maintainer I feel like I can offer lots of useful things to this conversation but I have tried to skim this (extremely long) thread and I’m not sure I’ve followed everything. I don’t know whether it’s useful for me to add my thoughts in a thread that is already so lengthy (maybe a new thread would be better). I have some more constructive comments to make but this doesn’t seem like the right place for them because any comment here will just get buried.

In any case let me explain that SymPy can be much faster while still being implemented in Python. It is slow for some cases because of basic design decisions. The same problems can afflict a CAS implemented in Julia. SymPy is slow in other cases because the intention was always that those would be at some point be implemented in another language in support libraries.

Let’s give an example to be clear what we mean about “slow”. The example here is intended to be illustrative so don’t read too much into the details.

Suppose we have a matrix of integers and we want to do matrix multiplication. In SymPy we can do:

In [1]: M = Matrix(range(1, 100+1)).reshape(10, 10)

In [2]: %time ok = M**2
CPU times: user 75.1 ms, sys: 8.59 ms, total: 83.6 ms
Wall time: 103 ms

There is now an internal more efficient matrix class in sympy that is faster for this example (note I’m using sympy master):

In [3]: from sympy.polys.domainmatrix import DomainMatrix

In [4]: dM = DomainMatrix.from_Matrix(M)

In [5]: %time ok = dM**2
CPU times: user 391 µs, sys: 0 ns, total: 391 µs
Wall time: 398 µs

Some work is involved in making that faster class be used in a way that is transparent to users but it has the potential to significantly improve speed. The flint library (python_flint) can be used to do these same computations even faster e.g.:

In [7]: from flint import fmpz_mat

In [8]: fM = fmpz_mat([[int(e) for e in row] for row in M.tolist()])

In [9]: %time fM**2
CPU times: user 36 µs, sys: 33 µs, total: 69 µs
Wall time: 26 µs

I’m currently working on making that happen implicitly in sympy for performance. The way this works is that we delegate the computationally intensive tasks to a lower-level library and then the Python code orchestrates that lower-level library.

Repeating the same in Julia leads to results that are slower than even the slowest sympy case:

N = 10
Ar = reshape(1:N^2, N, N) * BigInt(1)
@time Ar^2

Running that gives:

$ julia demo.jl 
  0.559016 seconds (1.04 M allocations: 53.008 MiB, 1.48% gc time)

I’m only using this example for illustrative purposes. I’m sure that some of you can show me how to make that faster or explain that I’m not timing it properly. Any feedback on that is welcome because I want to learn more but please also understand that many of the details are not that important in the point that I am making. I intend for SymPy to reach the speed of flint like:

In [1]: N = 1000

In [2]: from flint import fmpz_mat

In [3]: m = fmpz_mat([[i*N + j for j in range(N)] for i in range(N)])

In [4]: %time ok = m**2
CPU times: user 2.27 s, sys: 54.4 ms, total: 2.32 s
Wall time: 2.37 s

How can sympy reach that speed? We can just use flint!

I tried this in Julia:

N = 1000
Ar = reshape(1:N^2, N, N) * BigInt(1)
@time Ar^2

and I got:

$ julia demo.jl 
801.397699 seconds (5.01 G allocations: 89.637 GiB, 19.98% gc time)

The equivalent for sympy’s internal (pure-)Python class is:

In [9]: from sympy.polys.domainmatrix import DomainMatrix

In [10]: dM = DomainMatrix([[ZZ(i*N + j) for j in range(N)] for i in range(N)], (N, N), ZZ)

In [11]: %time ok = dM**2
CPU times: user 3min 34s, sys: 1.92 s, total: 3min 36s
Wall time: 3min 43s

I’m sure that a lot of work can be done in Julia to improve that performance. The ultimate question from a speed perspective is: can Julia be faster than flint? If not then we end up in a situation where both sympy and Symbolics.jl end up using the same underlying library. It is possible (although a huge amount of work) to build a much better CAS than sympy but aiming for feature parity with sympy but with better speed seems like a non-goal to me: it would be much easier to make sympy faster.

12 Likes

This is the classic benchmarking error. The first time any function/operator is called in Julia it is compiled, what you are seeing is compilation time. If you execute the code two times in sequence in a REPL you will see:

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.384767 seconds (1.04 M allocations: 53.002 MiB, 2.76% gc time)
10×10 Array{BigInt,2}:
  ...

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.000191 seconds (6.00 k allocations: 112.594 KiB)
10×10 Array{BigInt,2}:
  ...
1 Like

To show that this isn’t just Julia caching the results, it might be good to show that the timings don’t change if N=2 the first time.

3 Likes

Thanks for taking the time to catch up and chime in - informed outsider perspectives are very much appreciated.

A big part of Julia’s value proposition is that if you’re doing some cutting-edge work for which a fast algorithm hasn’t already been written in C, you can write it in Julia and it’ll be nearly as fast (or occasionally faster). I think that’s part of the reason why Flint’s main developers have been working on AbstractAlgebra.jl, Hecke.jl, and Oscar.jl. We can hope to benefit from their new work, but as you say, there’s no need to reinvent the wheel. Flint can be called through Nemo.jl, and it might be good to pull it into the Symbolics ecosystem when & where it’s appropriate. In the meantime,

julia> using Nemo, BenchmarkTools

julia> N = 1000; A = matrix(ZZ, reshape(1:N^2, N, N)*BigInt(1));

julia> N = 1000; A = matrix(ZZ, reshape(1:N^2, N, N)*BigInt(1));

julia> @btime $A^2;
  464.997 ms (1 allocation: 64 bytes)
6 Likes

Is there a way to avoid that compilation time? Otherwise I can see the point that it is relevant for some use cases but not others but I don’t think that it is an “error” to record that compilation time.

Note that SymPy performs the same computation quicker second time round because of cacheing:

In [1]: M = Matrix(range(1, 100+1)).reshape(10, 10)

In [2]: %time ok = M ** 2
CPU times: user 67.2 ms, sys: 5.66 ms, total: 72.8 ms
Wall time: 71.5 ms

In [3]: %time ok = M ** 2
CPU times: user 4.14 ms, sys: 52 µs, total: 4.19 ms
Wall time: 4.23 ms

I didn’t include that in my timings because I don’t consider it relevant. Practical calculations do not repeat precisely the same operation. For the same reason I am always suspicious of benchmark results that come from repeating a calculation in a loop. Many would consider it best practice to use a timing function like btime in julia or timeit in Python that times something by repeating the same operation in a loop. My experience is that those are useful as a guide for micro-optimisation but not for evaluating what is actually faster at the macro level.

2 Likes

Yes and no. You can pass --compile=min to the compiler, in this case it interprets Julia:

[henrique AreaDeTrabalho]$ julia --compile=min
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.5.3 (2020-11-09)
 _/ |\__'_|_|_|\__'_|  |  Official https://julialang.org/ release
|__/                   |

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.095967 seconds (765.04 k allocations: 39.622 MiB, 6.56% gc time)
10×10 Array{BigInt,2}:
  ...

julia> N = 10; Ar = reshape(1:N^2, N, N) * BigInt(1); @time Ar^2
  0.005052 seconds (11.26 k allocations: 310.578 KiB)
10×10 Array{BigInt,2}:
  ...

But in general, for heavy computation you want it to be compiled. The difference is that the compilation happens each time a function is called with the same tuple of parameter types. So, the same function being called multiple times with different values (but of the same type, like Int), will pay for the compilation a single time. For code that makes a lot of effort the compilation time becomes negligible, so the only downside is if you gonna call a lot of functions a single time, and they call different function internally, e.g., scripts, and for that --compile=min is often useful.

1 Like

The key here is that this isn’t Julia caching calculation. It’s caching compilation. This means you’ll see the same thing if you do M^2 on any other Matrix with the same datatype. Also, the compilation time is O(1), so it looks worst in micro-benchmarks.

2 Likes

I got better times, but it can just be my machine is more powerful, my setup is:

julia> versioninfo()
Julia Version 1.5.3
Commit 788b2c77c1 (2020-11-09 13:37 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-7700HQ CPU @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-9.0.1 (ORCJIT, skylake)

and I got:

julia> function f(N); Ar = reshape(1:N^2, N, N) * BigInt(1); return Ar^2; end
f (generic function with 1 method)

julia> @time f(1000)
384.090324 seconds (5.01 G allocations: 89.638 GiB, 24.26% gc time)
1000×1000 Array{BigInt,2}
  ...

julia> @time f(1000)
350.711817 seconds (5.01 G allocations: 89.638 GiB, 27.66% gc time)
1000×1000 Array{BigInt,2}:
  ...

EDIT: not better than Flint, I mean better than the Julia times reported above.

Thanks all. Okay well I learned something from this exchange and now I can show comparable timings:

In [1]: In [1]: N = 1000
   ...: 
   ...: In [2]: from flint import fmpz_mat
   ...: 
   ...: In [3]: m = fmpz_mat([[i*N + j for j in range(N)] for i in range(N)])

In [2]: %timeit ok = m**2
1.41 s ± 52.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

That’s compared to

**julia>** using Nemo, BenchmarkTools

Welcome to Nemo version 0.20.1

Nemo comes with absolutely no warranty whatsoever

**julia>** N = 1000; A = matrix(ZZ, reshape(1:N^2, N, N)*BigInt(1));

**julia>** @btime B = A^2;

1.393 s (1268 allocations: 94.52 MiB)

What that shows is that similar performance is possible in both cases. (Although installing and compiling Nemo took a long time and I don’t entirely accept either time result since running the same operation once gives a longer time in both cases)

My real point though was that it’s possible to get the same speed improvements from within Python. I didn’t want to argue about speed but rather to point out some detail of how sympy can be slow but that it is possible to make it faster.

There are more relevant discussions to be had about speed in a CAS. Personally I think that automatic evaluation is the major mistake made in sympy (and many other CAS). That is what currently slows down ordinary operations in sympy. Automatic evaluation is extremely hard to change when backwards compatibility comes into play because any change is not compatible.

6 Likes

Currently in Symbolics.jl (0.10.0) any numeric type (the default symbol) automatically transforms x - x to 0. If I understand right, @oscarbenjamin and @carette are both saying it should stay as x - x until explicitly evaluated with simplify()/evaluate()/etc?

1 Like

I was warning that x-x is tricky. It’s actually fairly safe to transform it to 0*x. For one, that’s type-safe, in the sense that x is still around, so you know the result’s type properly. (Polymorphic 0 is a major pain.)

If your numeric types include infinities, then this is required. If your numeric types only contain finite values, and will never contain anything other than classical numbers (vectors, matrices and series all have non-trivial claims to be kinds of numbers too) like the reals, complex, etc, that might be safe. But it depends very strongly on your meaning of ‘number’.

5 Likes

It’s zero(x) or zero(T) in Julia

6 Likes