There are a couple of things we should do:
- Write faster scalar functions. We use translated versions of openlibm (which were based on FreeBSD’s libm, which in turn descends from fdlibm, written by Sun in the 90s). These give respectable performance, but there is some room for improvement (e.g. by exploiting `fma` instructions on newer architectures); see the first sketch after this list.
- Provide hooks to use vectorised kernels, such as SVML, SLEEF, Yeppp or Apple’s Accelerate library, for operations like `broadcast` and `reduce`; see the second sketch after this list. LLVM provides such hooks for its own intrinsics: we don’t currently use these because they are hard-coded to call the system math library, not our own scalar functions. Apparently this is fixable, but the best option would be to have a general framework to do this for arbitrary functions, not just those blessed by LLVM.
- A framework to write our own vectorised kernels. SIMD.jl provides some low-level functionality, but ideally we would have a higher-level way to write SPMD code like ISPC, which would do things like rewrite branches into bitmasks; see the third sketch after this list.
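On the first point, the main gain from `fma` comes in polynomial kernels, where each Horner step is a multiply followed by an add. A minimal sketch, using nothing beyond Base: `muladd` lets LLVM fuse each step into a hardware fma where one exists, while `fma` itself is the right choice when the single-rounding guarantee is needed for accuracy. The `horner` helper and the coefficients are purely illustrative.

```julia
# Horner evaluation of a polynomial with coefficients in ascending order.
# `muladd` permits (but does not require) fusing the multiply and add into a
# single fma instruction on architectures that provide one; `fma` would force
# the correctly-rounded result even without hardware support, at some cost.
function horner(x, coeffs)
    acc = coeffs[end]
    for i in length(coeffs)-1:-1:1
        acc = muladd(x, acc, coeffs[i])
    end
    return acc
end

# Truncated exp series used as placeholder coefficients, not a tuned minimax fit:
horner(0.5, (1.0, 1.0, 0.5, 1/6, 1/24))
```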
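On the second point, a vectorised kernel can already be reached by hand through `ccall` with tuples of `VecElement`, which Julia passes in SIMD registers. The sketch below only shows the shape such a hook could take: it assumes a libsleef build on the library search path that exports the `Sleef_expd4_u10` dispatcher, and the `sleef_exp4` name and the 4-wide width are arbitrary choices here, not a proposed API.

```julia
# A 4-wide double vector in the representation that ccall lowers to an LLVM
# vector type, so the argument and result travel in SIMD registers.
const Double4 = NTuple{4, Core.VecElement{Float64}}

# Call SLEEF's 4-wide exp kernel (1.0 ulp variant); assumes libsleef is
# installed and exposes this dispatcher symbol.
sleef_exp4(x::Double4) =
    ccall((:Sleef_expd4_u10, "libsleef"), Double4, (Double4,), x)

x = map(Core.VecElement, (0.1, 0.2, 0.3, 0.4))
sleef_exp4(x)
```

A real hook would sit behind `broadcast` and `reduce`, so that something like `exp.(v)` could reach such a kernel without the user writing the `ccall` themselves.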
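On the third point, SIMD.jl already makes the branch-free form expressible by hand: a lane-wise comparison yields a mask, both sides of the branch are evaluated, and `vifelse` blends them. An ISPC-style SPMD layer would perform this rewrite automatically from ordinary branchy scalar code. A minimal sketch using SIMD.jl (the `vabs` name is just illustrative):

```julia
using SIMD

# The scalar branch `x < 0 ? -x : x` cannot be run lane-wise as written; in
# SPMD style both sides are computed and blended under a bitmask.
function vabs(x::Vec{N,Float64}) where {N}
    mask = x < Vec{N,Float64}(0.0)   # lane-wise comparison -> vector of Bools
    return vifelse(mask, -x, x)      # pick -x where the mask is set, x elsewhere
end

vabs(Vec{4,Float64}((-1.0, 2.0, -3.0, 4.0)))   # lanes become 1.0, 2.0, 3.0, 4.0
```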
Unfortunately the discussion of this issue has become somewhat fragmented, but the main issues are: