Speeding up trig-heavy code

Hello!

I recently released v0.3.2 of SolarPosition.jl. I am now very happy with its performance; at least, I think I have covered the basics, such as type stability, described in the Performance Tips section of the Julia manual.

I benchmarked SolarPosition.jl against its Python equivalent, solposx, and found that we are usually 10x to 200x faster (single-threaded), depending on input size and algorithm choice. PSA, which offers a great tradeoff between accuracy and computation time, is 200x faster than the Python version. On top of that, we can use multithreading, while the Python version can't (at least not trivially).
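As a rough sketch of what "we can do multithreading" looks like for a batch of inputs, the loop below splits independent per-input evaluations across threads with `Threads.@threads`. The kernel `toy_declination` is a hypothetical stand-in (Cooper's approximation for solar declination), not SolarPosition.jl's API; start Julia with `julia --threads=auto` for the loop to actually use several threads.

```julia
using Base.Threads

# Hypothetical toy kernel (NOT SolarPosition.jl's API): solar declination in degrees
# for day-of-year n, using Cooper's approximation δ ≈ 23.45° · sin(360° · (284 + n) / 365).
toy_declination(n) = 23.45 * sind(360 * (284 + n) / 365)

# Evaluate the kernel for every input; each iteration is independent,
# so @threads can hand disjoint index ranges to different threads.
function declinations_threaded(days::AbstractVector{<:Real})
    out = Vector{Float64}(undef, length(days))
    @threads for i in eachindex(days)
        out[i] = toy_declination(days[i])
    end
    return out
end

days = collect(1:365)
δ = declinations_threaded(days)
```

Because each output slot is written by exactly one iteration, no locking is needed; the same pattern applies to any per-timestamp function.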

Here is a profiling view I created for PSA:

As you can see, most of the time is spent computing trigonometric functions such as sin, cos and atan.
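A profile like this can be reproduced with the `Profile` standard library. The sketch below profiles a hypothetical trig-heavy toy loop (`trig_workload` is an illustration, not PSA) and prints a flat report sorted by sample count, which is one way to see how much time lands in `sin`, `cos` and `atan`. The original view was probably produced with a graphical profiler; which one is not stated.

```julia
using Profile

# Hypothetical trig-heavy toy workload standing in for a solar-position algorithm.
function trig_workload(n)
    acc = 0.0
    for i in 1:n
        x = i * 1e-3
        acc += atan(sin(x), cos(x) + 2.0)   # two-argument atan(y, x)
    end
    return acc
end

trig_workload(10)                 # run once so compilation is not profiled
Profile.clear()                   # discard any earlier samples
@profile trig_workload(5_000_000)
Profile.print(format = :flat, sortedby = :count)
```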

This is mostly for my own curiosity, but is there even more to gain here? I experimented with @fastmath, but found that it breaks correctness in exchange for maybe a 5-10% performance gain. I also experimented with StaticArrays, but found that it yielded no improvement.
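One way to quantify what `@fastmath` changes is to evaluate the same expression with and without it and track the largest deviation. The expression in `g` is a hypothetical example, not code from SolarPosition.jl. Random inputs only probe typical arguments; since `@fastmath` also lets the compiler assume there are no NaN or Inf values, edge cases deserve their own tests.

```julia
# Hypothetical trig-heavy expression; the same body with and without @fastmath.
g(x, y) = sin(x) * cos(y) + x * y - sin(x + y)
g_fast(x, y) = @fastmath sin(x) * cos(y) + x * y - sin(x + y)

# Largest absolute difference between the two versions over n random inputs in [0, 2π).
function max_fastmath_deviation(n)
    maxdiff = 0.0
    for _ in 1:n
        x, y = 2π * rand(), 2π * rand()
        maxdiff = max(maxdiff, abs(g(x, y) - g_fast(x, y)))
    end
    return maxdiff
end

max_fastmath_deviation(1_000_000)
```

If the deviation is acceptable only in some places, applying `@fastmath` to individual arithmetic-heavy expressions rather than to whole functions keeps the risky transformations contained.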

If you want to try something, feel free to make a PR. The github bot will benchmark your changes against the main branch using AirspeedVelocity.jl so you immediately get feedback.

Is it common to compute just a single solar position? At this point the single-threaded speedups may be close to exhausted (although I wouldn't underestimate the cleverness of people here), but parallelizing the computation on a GPU would probably yield large speedups right away.
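A common first step toward GPU execution is to express the batch computation as a broadcast over arrays, because the same broadcast can in principle run on a GPU array type. The sketch uses a hypothetical toy elevation model (`toy_elevation`, not SolarPosition.jl's API) based on sin(α) = sin(φ)·sin(δ) + cos(φ)·cos(δ)·cos(h). It runs on the CPU as written; that passing, for example, CUDA.jl `CuArray`s would compile it into a GPU kernel is an assumption not tested here.

```julia
# Hypothetical toy model (NOT SolarPosition.jl's API): solar elevation angle in degrees
# from latitude φ, declination δ and hour angle h, all in degrees.
toy_elevation(φ, δ, h) = asind(sind(φ) * sind(δ) + cosd(φ) * cosd(δ) * cosd(h))

# Writing the batch as a broadcast keeps it independent of the concrete array type.
batch_elevation(φ, δ, h) = toy_elevation.(φ, δ, h)

φ = fill(52.0, 24)                 # one latitude, hourly samples over a day
δ = fill(23.45, 24)                # roughly the June solstice
h = 15.0 .* ((0:23) .- 12)         # hour angle: 15° per hour away from solar noon
α = batch_elevation(φ, δ, h)
```

At solar noon (h = 0) the formula reduces to α = 90° - |φ - δ|, which gives a quick sanity check on the output.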

You could try the trigonometric functions from packages like SLEEF.jl.

If your code computes the sine and cosine of the same angle, combined functions such as sincos and sincosd can help improve performance.
I think KernelAbstractions.jl needs better documentation. I am also looking for a faster solution for my own code, but using a GPU is tough because the documentation is not detailed enough.
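As a sketch of where `sincos` applies: whenever a formula needs both the sine and the cosine of one angle, a single `sincos` (or `sincosd`, for degrees) call can replace separate `sin` and `cos` calls. The `rotate` and `rotate_deg` helpers are hypothetical illustrations, not part of SolarPosition.jl, and whether the combined call is faster depends on the surrounding code.

```julia
# Rotate the 2-D point (x, y) counterclockwise by θ radians.
# Both sin(θ) and cos(θ) are needed, so they come from one sincos call.
function rotate(x, y, θ)
    s, c = sincos(θ)
    return (c * x - s * y, s * x + c * y)
end

# Same rotation with the angle in degrees, using sincosd.
function rotate_deg(x, y, θdeg)
    s, c = sincosd(θdeg)
    return (c * x - s * y, s * x + c * y)
end

p = rotate(1.0, 0.0, π / 2)
q = rotate_deg(1.0, 0.0, 90.0)
```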

I use it for the MTK component wrapper, but other than that probably not.

True, already using that.

But now I think I was wrong :joy:.

julia> using BenchmarkTools

julia> x = rand(1000);

julia> f(x) = (sin(x), cos(x))
f (generic function with 1 method)

julia> m(x) = sincos(x)
m (generic function with 1 method)

julia> @btime f.($x);
  5.805 μs (3 allocations: 15.70 KiB)

julia> @btime m.($x);
  6.161 μs (3 allocations: 15.70 KiB)

Results vary with the size of x.
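The broadcast benchmark above also allocates and fills a 1000-element array of tuples, so memory traffic may be part of what it measures. A possible alternative is a non-allocating reduction that consumes both values. The functions `sum_separate` and `sum_sincos` are hypothetical helpers written for this comparison; the block itself only checks that both give the same result.

```julia
# Non-allocating reductions that use both sin(x) and cos(x) for each element.
function sum_separate(x)
    acc = 0.0
    for xi in x
        s, c = sin(xi), cos(xi)   # two separate calls
        acc += s * c
    end
    return acc
end

function sum_sincos(x)
    acc = 0.0
    for xi in x
        s, c = sincos(xi)         # one combined call
        acc += s * c
    end
    return acc
end

x = rand(1000)
sum_separate(x) ≈ sum_sincos(x)
```

These can then be timed with `@btime sum_separate($x)` and `@btime sum_sincos($x)` from BenchmarkTools.jl, repeating for several lengths of x.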