I’ve written a large numeric simulation that benefits substantially from starting Julia with --math-mode=fast (79% speed-up). I know this flag can be dangerous, but up until now, everything has worked well.
I’d now like to add Bayesian Optimization to tune a few parameters. Unfortunately, the Gaussian Process MLE breaks when I use explicit parameter bounds with fast-math enabled. Performance is not important for the Gaussian Process – I just need it to work.
Is there a way to disable fast-math for only the call to the Bayesian Optimization function?
I’ve tried switching to explicit @fastmath in the obvious parts of my code, but the global --math-mode=fast flag still gives me an additional 43% speed-up on top of that.
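For context, this is roughly what I mean by adding explicit @fastmath: a sketch of a hypothetical hot loop (the function and variable names are made up, not my actual code).

```julia
# Hypothetical hot inner loop; `step!`, `u`, and `dt` are placeholders.
function step!(u::Vector{Float64}, dt::Float64)
    @fastmath for i in eachindex(u)
        # @fastmath only rewrites the expressions written inside this block;
        # it does not propagate into functions called from here.
        u[i] = u[i] + dt * u[i] * (1.0 - u[i])
    end
    return u
end
```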
I don’t know of a way to do this easily. Looking at codegen (intrinsics.cpp), it seems that codegen just uses the global flags that are set when julia starts, and the @fastmath macro is a strictly local transformation.
Perhaps the best you can do here is to start two Julia instances: one to run the outer loop with Bayesian optimization (fast-math off) and another (or several) worker processes to run the simulation with fast-math on. This might be a good setup in any case: with multiple cores/machines you’ll then be able to run several simulations in parallel to feed back into the Bayesian optimization.
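Something like the following sketch with Distributed, assuming the simulation is wrapped in a function run_simulation defined in simulation.jl (both names are placeholders): the master keeps the default math mode, and only the workers are started with the flag via exeflags.

```julia
using Distributed

# The master process was started without --math-mode=fast, so the GP /
# Bayesian optimization code running here keeps IEEE semantics.
# Only the workers get the fast-math flag.
addprocs(4; exeflags = "--math-mode=fast")

# `simulation.jl` defining `run_simulation(params)` is a placeholder.
@everywhere include("simulation.jl")

# Evaluate the expensive objective on a fast-math worker.
objective(params) = remotecall_fetch(run_simulation, workers()[1], params)
```

With several workers you could also use pmap to evaluate a batch of candidate parameter sets in parallel.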
Explicit @fastmath seems like by far the best option to me, if you can achieve about the same performance. I’m curious why you can’t… is it because --math-mode=fast applies to code within Base and other packages that you can’t reach with @fastmath?
I’ll need to profile more carefully to see where the differences are. I haven’t tried @fastmath with every function yet, so it’s possible I’ve missed some, or like you said, there’s some code in another package that is affected.
I will be using remote workers anyway, so I may just do that: run the master process without fast-math and have all the workers use fast-math.
The problem with @fastmath is that it doesn’t compose: it only rewrites the expressions directly under the macro, not the functions they call. So it could be a pretty big burden to add it in enough places, especially when using external libraries.
Almost all optimizations that fastmath allows can be done manually by some combination of:
explicit algebraic simplification of expressions,
@simd annotations to allow floating-point re-association across loop iterations,
use of muladd to compute a*b + c in a single fused operation.
It’s definitely some work, but if you can identify which functions are sped up by --math-mode=fast, then you can speed them up manually. It’s probably just a handful of functions making most of the difference. My guess is that there are a few loops that need @simd annotations in order to vectorize.
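As a sketch of what that manual rewrite can look like (the kernel below is hypothetical, not taken from the actual code): @simd gives the compiler permission to re-associate the additions so the loop can vectorize, and muladd lets it emit a fused multiply-add, both of which fast-math would otherwise do for you.

```julia
# Hypothetical inner kernel (a weighted sum); names are placeholders.
function weighted_sum(x::Vector{Float64}, w::Vector{Float64})
    s = 0.0
    @inbounds @simd for i in eachindex(x, w)
        # @simd permits re-associating the additions across iterations;
        # muladd computes x[i]*w[i] + s as a single fused multiply-add.
        s = muladd(x[i], w[i], s)
    end
    return s
end
```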