I need to calculate some integrals in my program. Before using Julia i did it in C by using Complex Newton-Cotes formulas(Boole rule, Simpson rule and others):
Here is the simplified example(only with Boole Rule):

#include <stdio.h>
#include <stdlib.h>
#include<math.h>
double f(double x)
{
return pow(x*x+1.0,-1.5);
}
int main()
{
int i;
double a,b,h,step;
puts("Input number of steps and lower and upper band of integral: ");
scanf("%lf %lf %lf",&step,&a,&b);
h=(b-a)/step;
result=0.0;
puts("");
*/
for(i=0;i<krok;i++)
{
result=result+7*f(a+i*h)+32*f(a+i*h+h/4.0)+12*f(a+i*h+h/2.0)+32*f(a+i*h+h*0.75)+7*f(a+(i+1)*h);
}
printf("Result(Boole Rule): %lf",result*h/90.0);
return 0;
}

Do you have linear/surface/volume current distributions? I have very recently done some Biot-Savart integrations for arcs/bars with different quadrature libraries in Julia and I may be able to provide assistance.

Unless you are limited to equally spaced points for some reason, I would normally recommend using methods like Gaussian quadrature or Clenshaw–Curtis quadrature that use unequally spaced points and obtain much faster convergence for smooth integrands. Packages for this in Julia include

Before you think about GPUs or parallelization, make sure to think about the underlying algorithms! Gaussian quadrature etcetera can be exponentially faster than Simpson’s rule, for example.

Also, if you are evaluating Biot–Savart integrals over and over for the same current loop but different points in space, there are potentially many additional ways to accelerate things, e.g. fast multipole algorithms (also here), interpolation schemes, semi-analytical methods to remove the singularity for points near the current source, etcetera.

I deal with various current distributions. That is more technical(but in 3-dimension) like overhead lines, underground cables, utility poles and transmission towers on actual energetic infrastructure objects. So my calculations need to tell apart on Cartesian(X,Y,Z) coordinates. I even write application to calculate this, but that was console application and used only CPU in these calculations, but it works fine however i am sure that this can work much better.

Thanks for help. If it goes about my problem I deal with various, nonsymmetrical problems and I wrote console and fine working program based on this article with using of this Boole Rule(in use in electro-energetic infrastructure) and even make a engineer thesis of whole question(calculation, mathematics and physics issues, application, environment influence of EM Field, engineering use). However I am sure that this can work better.

I see from the paper that you compute the fields from wire configurations which require 1-D integrations. In this case I suggest to use the QuadQK package (https://github.com/JuliaMath/QuadGK.jl) which impements state-of-the art adaptive integration routines, with the benefit that it is very well suited to integrands with singularities (which can be passed to the routine). To increase performance I would not resort to GPUs but rather parallelize the computation within standard Julia by either parallelizing on the conductor pieces (for all points) or on the evaluation points (for all conductors). Hope this helps.

I tested this package and time needed to compute integrals. I don’t know why, but it always(except first run) last about 0.9 s. However when I reduce rtol parameter to very small values it compile longer. My C algorithm last about tens milliseconds. They both give good results(too good so I can’t compare their precision). Maybe you know some integral that is hard too calculate on which I could test these algorithms.
The second question is about output of quadgk function, because I get two values result and error. From my point of view is good for testing, but if I want to use it in production I need only result value, so it’s waste of time, because algorithm have to do some unnecessary operations to get the second value - error.

Ok, I checked some example and now I know that is not that bad as I thinked.
I use vscode and when I compile this: @time integral, err = quadgk(x -> exp(-x^2), 0, 1, rtol=1e-8)
I get:

@time for i=1:100
integral, err = quadgk(x -> exp(-x^2), 0, i, rtol=1e-8)
println("Result: $integral")
end

I get(I’ll shorten my output ):

@time for i=1:100
integral, err = quadgk(x -> exp(-x^2), 0, i, rtol=1e-8)
println("Result: $integral")
end
Result: 0.746824132812427
Result: 0.8820813907624216
...
Result: 0.886226925452758
Result: 0.8862269254527579
0.878343 seconds (1.46 M allocations: 87.892 MiB, 2.96% gc time)

, so it’s:
a) typical Julia compilation start
b) vscode compilation start
That isn’t a problem for me. Sorry for problem. I will test it more. If it goes about error evaluation can I just avoid assignement of error statement to variable(to not product more unnecessary variables).

Don’t benchmark in global scope unless you are using @btime with interpolation of globals. Also note that integrating from 0 to i means that the integration computation (for the same accuracy) gets progressively more expensive as i gets larger.

Whether you assign a value to a variable or not makes no difference in performance (especially in a function, where the compiler can just eliminate the variable if it wants to anyway).

You can always do quadgk(...)[1] to get the first element of the tuple (the integral) without assigning it to a variable.

Note also that quadgk(u -> f(u), ...) is equivalent to quadgk(f, ...)

Finally note that, If you are calculating F(x) = quadgk(f, 0, x) for a whole bunch of different values of x ∈ [0,a], then there are much more efficient things that you can do than separate quadgk integrals for each x, which wastes a lot of calculations that could be shared between different x values, especially if f(x) is a smooth function. For example, using ApproxFun.jl, you can do F = cumsum(Fun(f, 0..a)), and then evaluate F(x) very quickly — this works by first constructing a polynomial approximation of f(x) on [0,a] and then forming the polynomial F(x) that is the indefinite integral.

I have one more problem with calculating inetgral. On CPU I primarily assigned integral=0.0f0 and next used try/catch method to calculate integral, because in some nodes integral can’t be calculated.
The critical part of cuda function looks like that.

When I delete try/catch integral part, the function works.

With this try/catch quadGK part I got error:

KernelError: recursion is currently not supported
Try inspecting the generated code with any of the @device_code_... macros.
Stacktrace:
[1] sort! at sort.jl:521 (repeats 2 times)
[2] fpsort! at sort.jl:1106
[3] eignewt at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\gausskronrod.jl:43
[4] kronrod at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\gausskronrod.jl:137
[5] do_quadgk at C:\Users\Wiktor\.julia\packages\QuadGK\v6Zrs\src\adapt.jl:7
[6] biotSavartCalculation at c:\Users\Wiktor\CudaTests\testfunc.jl:73