Clean Numerical Simulation (Taylor Series Method)

m4r35n357 · August 4, 2018, 2:49pm

Hi all, very new to Julia, and not in academia BTW. I have been investigating the opinions and findings in this paper, and I thought I would contribute some Julia code which performs the aforementioned simulation to potentially ludicrous levels of accuracy, in fewer than 60 lines:

#!/usr/bin/env julia

using Printf

function main()
    setprecision(trunc(Int, parse(Int, ARGS[1]) * log(10.0) / log(2.0)))
    order = parse(Int, ARGS[2])
    h = parse(BigFloat, ARGS[3])
    steps = parse(Int, ARGS[4])
    x = parse(BigFloat, ARGS[5])
    y = parse(BigFloat, ARGS[6])
    z = parse(BigFloat, ARGS[7])
    σ = parse(BigFloat, ARGS[8])
    ρ = parse(BigFloat, ARGS[9])
    β = parse(BigFloat, ARGS[10]) / parse(BigFloat, ARGS[11])
    D0 = parse(BigFloat, "0.0")
    cx = [D0 for _ = 1 : order + 1]
    cy = [D0 for _ = 1 : order + 1]
    cz = [D0 for _ = 1 : order + 1]
    wx = [D0 for _ = 1 : order]
    wy = [D0 for _ = 1 : order]
    wz = [D0 for _ = 1 : order]
    for step = 1 : steps
        @printf "%0.9e %0.9e %0.9e %0.5e\n" x y z (step - 1) * h
        cx[1] = x
        cy[1] = y
        cz[1] = z
        for k = 1 : order
            wx[k] = cx[k]
            wy[k] = cy[k]
            wz[k] = cz[k]
            cx[k+1] = σ * (wy[k] - wx[k]) / k
            wxz = D0
            for j = 1 : k
                wxz += wx[j] * wz[k-j+1]
            end
            cy[k+1] = (ρ * wx[k] - wxz - wy[k]) / k
            wxy = D0
            for j = 1 : k
                wxy += wx[j] * wy[k-j+1]
            end
            cz[k+1] = (wxy - β * wz[k]) / k
        end
        x = cx[order+1]
        y = cy[order+1]
        z = cz[order+1]
        for i = order : -1 : 1
            x = x * h + cx[i]
            y = y * h + cy[i]
            z = z * h + cz[i]
        end
    end
end

main()

Can anyone suggest any speedups? Would this be any use as a benchmark? BTW here is an invocation:

time -p ./tsm-lorenz.jl 160 100 .01 30001 -15.8 -17.48 35.64 10 28 8 3 >/tmp/data-Julia

[code edited to eliminate four useless lines!]

dpsanders · August 6, 2018, 6:10pm

This is very nice, compact code.
In https://github.com/JuliaDiff/TaylorSeries.jl and especially GitHub - PerezHz/TaylorIntegration.jl: ODE integration using Taylor's method, and more, in Julia, this has been generalised to use the Taylor method to solve “arbitrary” ODEs.

m4r35n357 · August 6, 2018, 8:02pm

Thank you, very kind! It is amazing to me how little code this method requires, I suppose because the integrator and the model are so tightly, er, integrated

I have been looking into the packages you mentioned to an extent, but had problems getting everything installed under the 0.7s . . . anyway I was already getting involved in the inner workings of the method so I just kept going with my own stuff. I now have a reasonably useful set of functions at my disposal that I implemented myself.

I mentioned benchmarking, since my code has no dependencies apart from Printf, and does not use accelerated maths hardware. I have observed that this code runs in around 150% of the time for an equivalent c implementation, so if I can use any trickery to close the gap, that would be interesting!

Incidentally, under the same code conditions, Python 3 (using Decimal) is 3 to 4 times slower than Julia.

dpsanders · August 6, 2018, 8:06pm

What do you mean by an equivalent C formulation? Using MPFR directly?

I believe that Julia’s current implementation of BigFloat requires creating new objects in each calculation, which you can avoid to some extent with optimised MPFR calls. This could explain some of the performance gap.

m4r35n357 · August 6, 2018, 8:51pm

Yes. I ported the code to c (which I am not familiar with either) as I also wanted to see if the c code (which uses code generation) linked from this paper was efficient. In the event mine and theirs are indistinguishable speed-wise, so both are probably nearly optimum for MPFR.

And I can’t believe I didn’t combine those last three loops in the Julia code - now 55 lines!

Anyway, if you are curious, here is the c version, which I compiled with:

gcc -Wall -o tsm-lorenz tsm-lorenz.c -lm -lmpfr -s

(I haven’t experimented with optimization yet)

#include<math.h>
#include<stdio.h>
#include<stdlib.h>
#include<mpfr.h>

mpfr_t *jet (int size) {
    mpfr_t *j = (mpfr_t *)malloc(sizeof(mpfr_t) * size);
    for (int i = 0; i < size; i++) {
        mpfr_init(j[i]);
    }
    return j;
}

int main(int argc, char **argv) {
    const int BASE = 10;
    long order, nsteps;
    mpfr_t t, x, y, z, s, r, b, h, t1, t2, t3, *w1, *w2, *w3, *cx, *cy, *cz;
    /* initialize from command arguments */
    mpfr_set_default_prec(strtod(argv[1], NULL) * log(10.0) / log(2.0));
    order = strtol(argv[2], NULL, BASE);
    mpfr_init_set_str(h, argv[3], BASE, GMP_RNDN);
    nsteps = strtol(argv[4], NULL, BASE);
    mpfr_init(t1);
    mpfr_init(t2);
    mpfr_init(t3);
    mpfr_init_set_str(x, argv[5], BASE, GMP_RNDN);
    mpfr_init_set_str(y, argv[6], BASE, GMP_RNDN);
    mpfr_init_set_str(z, argv[7], BASE, GMP_RNDN);
    mpfr_init_set_str(t, "0.0", BASE, GMP_RNDN);
    mpfr_init_set_str(s, argv[8], BASE, GMP_RNDN);
    mpfr_init_set_str(r, argv[9], BASE, GMP_RNDN);
    mpfr_init_set_str(b, argv[10], BASE, GMP_RNDN);
    mpfr_init_set_str(t1, argv[11], BASE, GMP_RNDN);
    mpfr_div(b, b, t1, GMP_RNDN);
    /* initialize the derivative and temporary jets */
    cx = jet(order + 1);
    cy = jet(order + 1);
    cz = jet(order + 1);
    w1 = jet(order);
    w2 = jet(order);
    w3 = jet(order);
    /* main loop */
    for (long step = 1; step < nsteps + 1; step++) {
        /* print a line of output */
        mpfr_out_str(stdout, BASE, 10, x, GMP_RNDN);
        printf(" ");
        mpfr_out_str(stdout, BASE, 10, y, GMP_RNDN);
        printf(" ");
        mpfr_out_str(stdout, BASE, 10, z, GMP_RNDN);
        printf(" ");
        mpfr_out_str(stdout, BASE, 6, t, GMP_RNDN);
        fprintf(stdout, "\n");
        /* compute the taylor coefficients */
        mpfr_set(cx[0], x, GMP_RNDN);
        mpfr_set(cy[0], y, GMP_RNDN);
        mpfr_set(cz[0], z, GMP_RNDN);
        for (int k = 0; k < order; k++) {
            mpfr_set(w1[k], cx[k], GMP_RNDN);
            mpfr_set(w2[k], cy[k], GMP_RNDN);
            mpfr_set(w3[k], cz[k], GMP_RNDN);
            /* x dot */
            mpfr_sub(t3, w2[k], w1[k], GMP_RNDN);
            mpfr_mul(t3, t3, s, GMP_RNDN);
            mpfr_div_si(cx[k + 1], t3, k + 1, GMP_RNDN);
            /* y dot */
            mpfr_mul(t3, r, w1[k], GMP_RNDN);
            mpfr_set_zero(t2, 1);
            for (int j = 0; j < (k + 1); j++) {
                mpfr_mul(t1, w1[k-j], w3[j], GMP_RNDN);
                mpfr_add(t2, t1, t2, GMP_RNDN);
            }
            mpfr_sub(t3, t3, t2, GMP_RNDN);
            mpfr_sub(t3, t3, w2[k], GMP_RNDN);
            mpfr_div_si(cy[k + 1], t3, k + 1, GMP_RNDN);
            /* z dot */
            mpfr_set_zero(t2, 1);
            for (int j = 0; j < (k + 1); j++) {
                mpfr_mul(t1, w1[k-j], w2[j], GMP_RNDN);
                mpfr_add(t2, t1, t2, GMP_RNDN);
            }
            mpfr_mul(t3, b, w3[k], GMP_RNDN);
            mpfr_sub(t3, t2, t3, GMP_RNDN);
            mpfr_div_si(cz[k + 1], t3, k + 1, GMP_RNDN);
        }
        /* sum the series using Horner's method and advance one step */
        mpfr_set(x, cx[order], GMP_RNDN);
        mpfr_set(y, cy[order], GMP_RNDN);
        mpfr_set(z, cz[order], GMP_RNDN);
        for (int j = order - 1; j >= 0; j--) {
            mpfr_mul(x, x, h, GMP_RNDN);
            mpfr_add(x, x, cx[j], GMP_RNDN);
            mpfr_mul(y, y, h, GMP_RNDN);
            mpfr_add(y, y, cy[j], GMP_RNDN);
            mpfr_mul(z, z, h, GMP_RNDN);
            mpfr_add(z, z, cz[j], GMP_RNDN);
        }
        mpfr_mul_si(t, h, step, GMP_RNDN);
    }
    exit(0);
}

dpsanders · August 6, 2018, 10:33pm

And that is why we Julia instead of C…

m4r35n357 · August 7, 2018, 8:51am

Harsh! Well I use the fastest one

The sequence of calls in the c version is the quickest, by experiment. I wonder if Julia uses mpfr_fma() under the covers, that is a very slow call and could account for much of the speed difference.

But after your comment I couldn’t help responding with some c propoganda

$ ls -lAh tsm-lorenz
-rwxrwxr-x 1 ian ian 11K Aug  6 22:30 tsm-lorenz

$ ldd tsm-lorenz
	linux-vdso.so.1 =>  (0x00007ffda833a000)
	libmpfr.so.4 => /usr/lib/x86_64-linux-gnu/libmpfr.so.4 (0x00007f9d5bb46000)
	libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f9d5b77c000)
	libgmp.so.10 => /usr/lib/x86_64-linux-gnu/libgmp.so.10 (0x00007f9d5b4fc000)
	/lib64/ld-linux-x86-64.so.2 (0x00007f9d5bdab000)

and from top:

  PID  PPID USER     TTY       PR  NI    RES %MEM S  %CPU P   TIME COMMAND                                                                                                                                                                                                     

20295  7497 ian      pts/9     20   0   0.8m  0.0 R  99.7 3   0:05                                  `- ./tsm-lorenz 160 100 .01 30001 -15.8 -17.48 35.64 10 28 8 3                                                                                                             

20297  7497 ian      pts/9     20   0 211.2m  2.7 R 100.0 2   0:04                                  `- ./julia ./tsm-lorenz.jl 160 100 .01 30001 -15.8 -17.48 35.64 10 28 8 3

saschatimme · August 7, 2018, 9:04am

I want to restate that this is probably the reason for the major performance discrepancy. Each MPFR number is allocated on the heap (which takes time) and in your C code you reuse these numbers, i.e., your numbers are mutable, whereas in Julia numbers are by default immutable and thus each operation generates a new MPFR number (which has to be allocated again). A fair comparison would be to also use the imperative MPFR API in your Julia code. So far this has nothing to do with C vs. Julia but rather how to use the MPFR library most efficiently.

m4r35n357 · August 7, 2018, 9:24am

This thread is not about c vs Julia per se, I would really like to user Julia for running this stuff!

So, is it realistic to cache used MPFR instances within Julia and re-use them? That is certainly the strategy recommended by the MPFR docs.

BTW searched the internet for “imperative MPFR API” and nothing came up, what is it?

m4r35n357 · August 12, 2018, 1:41pm

OK found the source. I suspect those calls would make the Julia code look worse than the c I posted

The immutability condition would seem to rule out any improvement I guess.

Is this an issue that devs would care enough about to address?

Topic		Replies	Views
Julia speed Performance question	28	2237	March 14, 2024
Numerical difference between Julia and (C, Fortran, Python) for a chaotic damped driven pendulum Modelling & Simulations numerics , precision	17	2651	June 8, 2021
Julia can be dramatically slower than Matlab when solving 2D PDE New to Julia performance , matlab , pde	42	1217	November 9, 2024
Julia vs C++ speed General Usage performance , c	21	4583	September 2, 2021
Benchmarking a simple PDE algorithm in Julia, Python, Matlab, C++, and Fortran Numerics diffeq	41	8883	November 19, 2019

Clean Numerical Simulation (Taylor Series Method)

Related topics