Julia integral calculation - community module or own module?

I need to calculate some integrals in my program. Before using Julia i did it in C by using Complex Newton-Cotes formulas(Boole rule, Simpson rule and others):
Here is the simplified example(only with Boole Rule):

#include <stdio.h>
#include <stdlib.h>
#include<math.h>
double f(double x)
{
    return pow(x*x+1.0,-1.5);
}

int main()
{
    int i;

    double a,b,h,step;
    puts("Input number of steps and lower and upper band of integral: ");
    scanf("%lf %lf %lf",&step,&a,&b);
    h=(b-a)/step;
   
    result=0.0;
    puts("");
    */
    for(i=0;i<krok;i++)
    {
        result=result+7*f(a+i*h)+32*f(a+i*h+h/4.0)+12*f(a+i*h+h/2.0)+32*f(a+i*h+h*0.75)+7*f(a+(i+1)*h);
    }
    printf("Result(Boole Rule): %lf",result*h/90.0);

    return 0;
}

And I can rewrite this in more elegant version for Julia.
However I want to use GPU to speed up my programs.
Should I use my algorithm or use this for example: GitHub - JuliaApproximation/FastGaussQuadrature.jl: Julia package for Gaussian quadrature
Which I found in this topic:
Numerical integration over given integral. How to do it in Julia? - #4 by CarloLucibello
The majority of my calculations are connected with Biot-Savart Law.

1 Like

Do you have linear/surface/volume current distributions? I have very recently done some Biot-Savart integrations for arcs/bars with different quadrature libraries in Julia and I may be able to provide assistance.

Unless you are limited to equally spaced points for some reason, I would normally recommend using methods like Gaussian quadrature or Clenshaw–Curtis quadrature that use unequally spaced points and obtain much faster convergence for smooth integrands. Packages for this in Julia include

Before you think about GPUs or parallelization, make sure to think about the underlying algorithms! Gaussian quadrature etcetera can be exponentially faster than Simpson’s rule, for example.

Also, if you are evaluating Biot–Savart integrals over and over for the same current loop but different points in space, there are potentially many additional ways to accelerate things, e.g. fast multipole algorithms (also here), interpolation schemes, semi-analytical methods to remove the singularity for points near the current source, etcetera.

10 Likes

I deal with various current distributions. That is more technical(but in 3-dimension) like overhead lines, underground cables, utility poles and transmission towers on actual energetic infrastructure objects. So my calculations need to tell apart on Cartesian(X,Y,Z) coordinates. I even write application to calculate this, but that was console application and used only CPU in these calculations, but it works fine however i am sure that this can work much better.

Thanks for help. If it goes about my problem I deal with various, nonsymmetrical problems and I wrote console and fine working program based on this article with using of this Boole Rule(in use in electro-energetic infrastructure) and even make a engineer thesis of whole question(calculation, mathematics and physics issues, application, environment influence of EM Field, engineering use). However I am sure that this can work better.

I see from the paper that you compute the fields from wire configurations which require 1-D integrations. In this case I suggest to use the QuadQK package (https://github.com/JuliaMath/QuadGK.jl) which impements state-of-the art adaptive integration routines, with the benefit that it is very well suited to integrands with singularities (which can be passed to the routine). To increase performance I would not resort to GPUs but rather parallelize the computation within standard Julia by either parallelizing on the conductor pieces (for all points) or on the evaluation points (for all conductors). Hope this helps.

1 Like

I thinked to separate blocks for evaluation points and threads of the blocks for all segments of conductors.

I tested this package and time needed to compute integrals. I don’t know why, but it always(except first run) last about 0.9 s. However when I reduce rtol parameter to very small values it compile longer. My C algorithm last about tens milliseconds. They both give good results(too good so I can’t compare their precision). Maybe you know some integral that is hard too calculate on which I could test these algorithms.
The second question is about output of quadgk function, because I get two values result and error. From my point of view is good for testing, but if I want to use it in production I need only result value, so it’s waste of time, because algorithm have to do some unnecessary operations to get the second value - error.

It is hard to say anything about this without a self-contained example.

It is a very cheap operation (using the previous refinement of the algorithm, if I remember correctly).

That said, it is prudent to use it in production too, eg error if the error is above some threshold that your calculations would tolerate.

The error estimate is produced automatically as a byproduct of the algorithm, with no additional function evaluations.

5 Likes

Ok, I checked some example and now I know that is not that bad as I thinked.
I use vscode and when I compile this:
@time integral, err = quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
I get:

@time integral, err = quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
  0.839247 seconds (1.42 M allocations: 86.103 MiB, 3.32% gc time)
(0.746824132812427, 7.887024366937112e-13)

but when I compile this:

@time for i=1:100
  integral, err = quadgk(x -> exp(-x^2), 0, i, rtol=1e-8)
  println("Result: $integral")
end

I get(I’ll shorten my output :slightly_smiling_face:):

@time for i=1:100
         integral, err = quadgk(x -> exp(-x^2), 0, i, rtol=1e-8)
         println("Result: $integral")
       end
Result: 0.746824132812427
Result: 0.8820813907624216

...
Result: 0.886226925452758
Result: 0.8862269254527579
  0.878343 seconds (1.46 M allocations: 87.892 MiB, 2.96% gc time)

, so it’s:
a) typical Julia compilation start
b) vscode compilation start
That isn’t a problem for me. Sorry for problem. I will test it more. If it goes about error evaluation can I just avoid assignement of error statement to variable(to not product more unnecessary variables).

If you want accurate estimates without the compilation time, you can use BenchmarkTools.

julia> using QuadGK, BenchmarkTools

julia> @time integral, err = quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
  1.194962 seconds (3.32 M allocations: 181.200 MiB, 9.70% gc time)
(0.746824132812427, 7.887024366937112e-13)

julia> @btime quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
  748.218 ns (19 allocations: 592 bytes)
(0.746824132812427, 7.887024366937112e-13)

julia> @benchmark quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
BenchmarkTools.Trial: 
  memory estimate:  592 bytes
  allocs estimate:  19
  --------------
  minimum time:     726.023 ns (0.00% GC)
  median time:      963.829 ns (0.00% GC)
  mean time:        1.014 Îźs (11.09% GC)
  maximum time:     393.248 Îźs (99.73% GC)
  --------------
  samples:          10000
  evals/sample:     129

@btime shows the minimum time, while @benchmark shows a summary. As you can see, @benchmark ran the function 1.29 million times.

3 Likes

Don’t benchmark in global scope unless you are using @btime with interpolation of globals. Also note that integrating from 0 to i means that the integration computation (for the same accuracy) gets progressively more expensive as i gets larger.

Whether you assign a value to a variable or not makes no difference in performance (especially in a function, where the compiler can just eliminate the variable if it wants to anyway).

1 Like

I have little problem. In Cuda i want to calculate some vector in type of :

x = (blockIdx().x-1) * blockDim().x + threadIdx().x
y = (blockIdx().y-1) * blockDim().y + threadIdx().y
z = (blockIdx().z-1) * blockDim().z + threadIdx().z
offset=x+y*blockDim().x *gridDim.x+z*blockDim().x *gridDim.x*blockDim().y *gridDim.y

Table[offset]=someValues*quadgk(u->f(u), 0.0f0, upperBound[offset])

The problem is that quadgk return a tuple and if I’d want to do it in separate operations like

integral[offset],error=quadgk(u->f(u), 0.0f0, upperBound[offset])
Table[offset]=someValues*integral[offset]

I would need to synchronize threads(or maybe I’m missing something) which isn’t the best idea. Is there an easier way to do it?

You can always do quadgk(...)[1] to get the first element of the tuple (the integral) without assigning it to a variable.

Note also that quadgk(u -> f(u), ...) is equivalent to quadgk(f, ...)

Finally note that, If you are calculating F(x) = quadgk(f, 0, x) for a whole bunch of different values of x ∈ [0,a], then there are much more efficient things that you can do than separate quadgk integrals for each x, which wastes a lot of calculations that could be shared between different x values, especially if f(x) is a smooth function. For example, using ApproxFun.jl, you can do F = cumsum(Fun(f, 0..a)), and then evaluate F(x) very quickly — this works by first constructing a polynomial approximation of f(x) on [0,a] and then forming the polynomial F(x) that is the indefinite integral.

5 Likes

I have one more problem with calculating inetgral. On CPU I primarily assigned integral=0.0f0 and next used try/catch method to calculate integral, because in some nodes integral can’t be calculated.
The critical part of cuda function looks like that.

... (
        Sx=Segment[XelementIndex+1]-Segment[XelementIndex]; Sy=Segment[YelementIndex+1]-Segment[YelementIndex];
        L=sqrt(Sx*Sx+Sy*Sy);
        mi=(Segment[ZelementIndex+1]-Segment[ZelementIndex])/Sy;
        δz=z[blockIdx().z]+mi*Segment[YelementIndex]-Segment[ZelementIndex]; 
        ι=Sx/L; β=Sy/L;
        a=1+mi*mi; p=(ι*δx+β*δy+mi*δz)/a; q=δx*δx+δy*δy+δz*δz-p*p*a;
        integral=0.0f0;
        try
            integral = quadgk(u -> ((u-p)^2+q)^(-3/2), 0, L, rtol=1f-8)[1]
         catch
         end;
       
        @atomic Bx[offset]+=integral*(β*δz-mi*δy); 
        @atomic By[offset]+=integral*(ι*δz-mi*δx); 
        @atomic Bz[offset]+=integral*(ι*δy-β*δx))

When I delete try/catch integral part, the function works.

With this try/catch quadGK part I got error:

KernelError: recursion is currently not supported

Try inspecting the generated code with any of the @device_code_... macros.

Stacktrace:
 [1] sort! at sort.jl:521 (repeats 2 times)
 [2] fpsort! at sort.jl:1106
 [3] eignewt at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\gausskronrod.jl:43
 [4] kronrod at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\gausskronrod.jl:137
 [5] do_quadgk at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\adapt.jl:7
 [6] biotSavartCalculation at c:\Users\Wiktor\CudaTests\testfunc.jl:73

Is the way to handle with this?

You should notice that your integrand can be evaluated exactly without any quadgk:

\int_0^L du \frac{1}{\left((u-p)^2+q\right)^{3/2}} = \frac{L-p}{q \sqrt{L^2-2 L p+p^2+q}}+\frac{p}{q \sqrt{p^2+q}}
2 Likes

I know it, it’s really old topic, but thanks for intentions. I’ll close it if it’s possible.