Help speeding up a test program from a book

You are right about that. Yuyichao knows his stuff!!

Presumably you’ve seen this, but in case not:

https://juliaberry.github.io/

I don’t know what “some focus” means, exactly, but please understand that it is amazing that Julia on ARM is simply an apt-get away these days (thanks to Yichao, Viral, Avik, Simon, and many others). When we started working on ARM, the only option was a local build which took several days if you were lucky and there were no errors halfway through requiring a reconfigure and rebuild.

2 Likes

Wow. I apologize - no I didn’t see that. That’s great!

The stuff I’ve been working on focuses more on sysfs (which is slower), but it is common (and somewhat standard) across linux devices.

Hi,
I got significant speedups from the following changes:

function advance(bodies::AbstractVector{Planet}, dt::Float64)
    nbodies = length(bodies)
    for i = 1:nbodies
        b = bodies[i]
        for j = (i + 1):nbodies
            b2 = bodies[j]
            dx = b.x - b2.x
            d2 = sum(dx .* dx)
            dist = sqrt(d2)
            mag = dt / (d2 * d2)
            mag *= dist

            b.v  -= dx * (b2.mass * mag)
            b2.v += dx * (b.mass  * mag)
        end
    end
    for i = 1:nbodies
        b = bodies[i]
        b.x += dt * b.v
    end
end


function main(iterations::Int)
    n = iterations

    offset_momentum(bodies);
    #@printf("%.9f\n", energy(bodies))
    for i = 1:n
        advance(bodies, 0.01)
    end
    #@printf("%.9f\n", energy(bodies));
    return nothing
end
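
For reference, the code above assumes a mutable, StaticArrays-based Planet along these lines (my sketch of the assumed layout, not the definition from the book):

using StaticArrays

mutable struct Planet
    x::SVector{3, Float64}   # position
    v::SVector{3, Float64}   # velocity
    mass::Float64
end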

I did three things: Firstly, I added parentheses to the update of the velocities. This made no difference in the generated code, because LLVM is smart enough to do it anyway; however, you want to compute the product mass*mag only once, not three times, and with the parentheses the code is obviously correct rather than merely happening to be correct.

Secondly, I removed the @printf calls in main. I do not know why, but the inliner was extremely unhappy when they were left in.

Thirdly, you will see that I changed the computation of mag. If you look at the generated native code, the original version emitted two high-latency instructions (square root and division) that depend on each other; in the rewritten form the division and the square root can be computed in parallel.
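
To make that concrete, the original presumably computed mag along these lines, while the rewritten form breaks the dependency chain:

# presumably the original: the division has to wait for the square root
dist = sqrt(d2)
mag  = dt / (d2 * dist)

# rewritten: sqrt(d2) and dt / (d2 * d2) are independent,
# so the two long-latency operations can overlap
mag  = dt / (d2 * d2)
mag *= sqrt(d2)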

In practice, you might consider doing the division and square root in single precision. I know that this defies the point of this exercise, but Float64 division and square root are really, really slow, so consider whether you can get away with cheaper ops. I am not so sure about Hamiltonian integrators (preserving energy, momentum, angular momentum; not my field), but the errors accumulated by the time discretization are much, much larger than the floating-point errors anyway.
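
As a rough sketch of what I mean (the names are mine, and whether the accuracy is acceptable depends on the problem), the expensive operations can be done in Float32 while the state stays in Float64:

d2f = Float32(d2)                                      # do the slow ops in single precision
mag = Float64(Float32(dt) / (d2f * d2f) * sqrt(d2f))   # promote back before updating the velocities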

1 Like

Mixed-precision algorithms are a very hot topic in applied math right now.

https://nickhigham.wordpress.com/2015/10/20/the-rise-of-mixed-precision-arithmetic/

I think we should explore this more in Julia.

It’s a common misconception that symplectic integrators “exactly” preserve energy, angular momentum, etc. They don’t at all. Instead, they ensure that they integrate on some symplectic manifold, meaning that they have a form of almost-exact periodicity. But since it’s not necessarily the right symplectic manifold (it could be off by the order of the error), you will see (small) oscillations in the energy and other properties. Some examples can be found in the SciML/SciMLTutorials.jl tutorials on GitHub.

5 Likes

What you really want to do is emit the juicy RSQRTSS and RCPSS instructions (on x86) and their vectorized counterparts. In terms of speedup, single precision is chump change compared to these beautiful bastards, which compute an approximate reciprocal square root and an approximate reciprocal of floats for almost the same price as a multiplication (see http://www.agner.org/optimize/instruction_tables.pdf).

Which brings me to the question: is there a sensible way of accessing these instructions in Julia? Like @really_fast_math? Or as @inline RSQRTSS(x::Float32) = something, which hopefully compiles to the single fast instruction on architectures where it exists, and otherwise to 1/sqrt(x)? (We would need to figure out whether LLVM understands this operation and possibly do some version of asm, but Julia/LLVM need to understand it well enough to optimize out boilerplate, select a free register, etc., so a raw native x86 emit does not cut it.)

Re symplectic integrators: I vaguely remember that the long-term energy non-conservation due to time discretization is exponentially small in the time step, because all sorts of things cancel. Floating-point errors are different, and hence might blow up in our face. If you say that this is not an issue, then I will believe you; I do not remember the detailed proofs for approximate energy conservation, but I do remember writing in the margins of my textbook that its proof was not directly applicable to floating-point errors. This does not preclude better theorems being available in the literature; as I said, symplectic integrators are not my field.
(Approximate evaluations of an exact function are not the same as exact evaluations of an approximate function, especially if the function and its approximation have extra structure like smoothness, analyticity, equivariance under some group, symplecticity, exactness (as a differential form), or closedness (exact and no de Rham).)

This question becomes very relevant if we perform the square root with the ~15 bits of precision given by RSQRTSS.
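
For what it’s worth, here is a pure-Julia stand-in for that kind of low-precision reciprocal square root (the classic bit-trick plus one Newton step; it is not the hardware instruction, just roughly the same precision regime):

# Approximate 1/sqrt(x) for Float32: bit-manipulation initial guess,
# then one Newton-Raphson refinement. The raw guess is good to a few
# percent; the refinement brings the relative error well under 1%.
function approx_rsqrt(x::Float32)
    i = reinterpret(UInt32, x)
    i = 0x5f3759df - (i >> 1)               # magic-constant initial guess
    y = reinterpret(Float32, i)
    return y * (1.5f0 - 0.5f0 * x * y * y)  # one Newton step
end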

There is @fastmath which can replace a bunch of operations with faster things like FMA and less precise but faster math functions. I’d check to see if that does anything here?
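
Something like this is what I have in mind (inv_dist is just a name I made up), and @code_native shows what actually gets emitted on your machine:

# @fastmath relaxes IEEE semantics for the annotated expression,
# allowing LLVM to substitute cheaper approximations where it sees fit.
inv_dist(d2) = @fastmath 1 / sqrt(d2)

# in the REPL: @code_native inv_dist(2.0)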

Well, there are two different things going on here. There’s divergence from the symplectic manifold, which is analytically zero but technically non-zero due to floating-point issues. But symplecticity and energy conservation are not the same thing. That’s shown very clearly in the Kepler example, where a 6th-order symplectic integrator is shown oscillating the energy (by a small amount, proportional to the error) while displaying a form of periodicity (but with the slight linear drift mentioned before).

http://nbviewer.jupyter.org/github/JuliaDiffEq/DiffEqTutorials.jl/blob/master/PhysicalModels/KeplerProblem.ipynb

Floating point errors are not even close to the errors one gets from almost any DE discretization unless you’re working in the extreme.

2 Likes

I thought it might be helpful to state this as simply as possible based on some reading I just did, for the rest of us non-experts (I’m sure @ChrisRackauckas will correct me if I’m mistaken):

Symplectic integrators conserve the symplectic two-form dq \wedge dp by construction (subject only to machine error), but typically introduce an error term to the physical Hamiltonian: H \to H + O(\Delta t).
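
In the language of backward error analysis (my wording, for what it’s worth): up to exponentially small remainders, a symplectic method of order p follows the exact flow of a modified Hamiltonian

\tilde H = H + \Delta t^{p} H_{p} + \Delta t^{p+1} H_{p+1} + \cdots

so what is (almost) exactly conserved is \tilde H; the physical H oscillates around it at the order of the discretization error.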

3 Likes

Unfortunately, we don’t get the low-precision version with @fastmath (on my machine @fastmath even produced a slow-down).

In fact I could not get Julia to emit this at all; LLVM wants to emit a reciprocal square root plus iterative refinement instead of 1/sqrt. Whether this is an improvement over the built-in square-root instruction and one divide is very architecture-dependent. Only if our application is happy with the abysmal precision of the machine reciprocal square root do we get the really big improvements; as far as I know, no C compiler optimizes anything down to a naked rsqrtss, under any compiler options.

I don’t know whether there is any way of getting LLVM to emit a naked rsqrtss.
See https://www.bountysource.com/issues/13241137-optimize-reciprocal-square-root-with-fast-math-x86

Floating point errors are not even close to the errors one gets from almost any DE discretization unless you’re working in the extreme.

Exactly. However, I had guessed that ~15 bits of precision for the fast reciprocal sqrt, versus analytically zero drift from the symplectic manifold, is pretty extreme, especially if you don’t really care about the (typically ill-posed) initial value problem but instead want to look at other properties (and hence don’t really care about discretization errors, which almost preserve the conserved quantities, attractors, ergodic measures on attractors, Lyapunov exponents, relevant bifurcations, relevant symmetries, or whatever you actually care about).

On the other hand, at least in this example, approximate energy conservation over this timescale still works if we do the square root in Float16. And switching the square root and division to Float32 gives another nice performance boost (still slower than what rsqrtss should give us, but meh).

The point where sloppy divisions and square roots should really shine is systems without extra structure: if your discretization error is of order 10 bits, then everything above 16 bits of precision in your flops is wasted.

Since you obviously know more about integrators than me and are here already, a slightly off-topic question: why do typical linearly implicit solvers (like Rosenbrock) not reuse the LU factorization of the Jacobian from previous time steps?

Such an approach should work for problems / phase-space regions where the second derivative is tame. We still need to compute the first derivative by auto-diff in order to check whether we need a new LU and to correct the errors from using the LU of the wrong Jacobian, but we can save the time spent computing the LU.

Alternative formulation: are there analogs of Rosenbrock methods for iterative/approximate linear solvers? We have really good starting points for the iterations from the previous time steps; it would be a shame to let these go to waste by using a from-scratch (Gaussian elimination) solver.
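
To make the question concrete, this is the kind of reuse I have in mind (a toy backward-Euler step; all names are mine):

using LinearAlgebra

# Reuse one LU factorization of W = I - dt*J across Newton iterations
# (and, ideally, across several time steps while the Jacobian is tame).
function be_step_reusing(f, u, dt, Wfact; tol = 1e-10, maxiter = 10)
    unew = copy(u)
    for _ in 1:maxiter
        r = unew - u - dt * f(unew)   # residual of u_{n+1} = u_n + dt*f(u_{n+1})
        norm(r) < tol && break
        unew -= Wfact \ r             # stale factorization instead of a fresh LU
    end
    return unew
end

# usage sketch: factor once, refresh only when convergence degrades
# Wfact = lu(I - dt * J(u))
# u = be_step_reusing(f, u, dt, Wfact)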

The derivation of Rosenbrock methods assumes the Jacobian is exactly correct in its order derivation, so if it’s wrong, you get order loss. ROSW, or Rosenbrock-W, methods don’t need the Jacobian to be exact, but they still need a good Jacobian, and it’s tied to the error: if the Jacobians are too far off, you will see more error.

Implicit methods only use the Jacobian inside the implicit solver, so there it’s more like a line search, which is why they can reuse Jacobians. The middle ground is (E)SDIRK methods, which are implicit stage by stage while having all stages use the same I - gamma*J. These methods are thus similar to Rosenbrock methods (in fact, Rosenbrock methods can be thought of as special SDIRK methods where a single Newton step is done at each stage), but they are robust to Jacobian reuse and thus are good when the second derivative is tame. I plan on mentioning this in a review of these methods soon.
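
A minimal sketch of that structure (my own toy code, not the DifferentialEquations.jl internals): each stage runs its simplified Newton iterations against the same factorized I - gamma*dt*J, so one factorization is amortized over all stages and, with reuse, over several steps.

using LinearAlgebra

# One simplified-Newton solve for an SDIRK stage increment z, where every
# iteration (and every stage of the step) reuses the same factorization Wfact.
function sdirk_stage(f, Wfact, u, dt, gamma; iters = 5)
    z = zero(u)                        # stage increment guess
    for _ in 1:iters
        r = z - gamma * dt * f(u + z)  # stage residual
        z -= Wfact \ r                 # same Wfact for all stages and iterations
    end
    return z
end

# Wfact = lu(I - gamma * dt * J)       # factor once per step, or less often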

You must have changed the Planet struct, because there is no v member - only vx, vy, vz.

I think he/she is using my StaticArrays version

Though I appreciate all of the work folks have been doing to try to speed this program up, I think the point was missed. I was trying to do basically a clone of the original C/C++ version of the program. On the Julia home page under “A Summary of Features” it states: “Good performance, approaching that of statically-compiled languages like C.” So I was looking for something like compiler options to see if the program could be optimized, not a complete rewrite of the code. The author states in his book that the compiled C++ version of this little program took ~6.5 seconds, and my Julia version took ~20 seconds. That’s about 3 times slower.

For me, the C code ran in 0.75 seconds and Julia in 1 second.

1 Like

That’s damn good! Great!

Did you find the -O3 option?
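
For reference, it’s passed on the command line (the script name here is made up), and you can confirm it took effect from inside the session:

# launch with:  julia -O3 nbody.jl
# then, inside Julia, check the optimization level in effect:
Base.JLOptions().opt_level   # 3 when started with -O3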

That shaved about 4 seconds off the original code I posted. That’s a big improvement.

Also, did you recompile your system image? That helps, especially with -O3.

https://docs.julialang.org/en/stable/devdocs/sysimg/

We are discussing single-precision arithmetic, and indeed variable-precision arithmetic.
In the original source on the benchmarksgame.alioth website, for example, the mass of Jupiter is given to 17 decimal places. That constant seems to be pretty well used in lots of N-body benchmarks if you Google for it.
Is the mass of Jupiter really known to 17 places? I tried Wikipedia, but that just points to a paper.
I guess that number is accurate, and I would be interested to hear from an astronomer how it is obtained.

Also relevant: I read a paper a couple of years ago where the energy cost of computations in various formats was worked out, e.g. picojoules per floating-point operation. It is not really that esoteric if we think about exascale computing, where every energy efficiency will have to be used unless we want to couple our supercomputers to the nearest power station (I read the other day that the working limit is being taken as 20 MW for an exascale machine, BTW).
Sadly I have lost the reference to that paper, which made a lot of sense regarding choosing an appropriate type for computations and not just flinging double floats at all problems.

Well, the integrators the benchmarks are using are definitely not getting anywhere near 17 places of precision, so it doesn’t matter. Especially for long-time integrations of N-body problems, the error due to missing a large asteroid could easily matter at the lower digits.