`@fastmath` is not applied to macros

Today I discovered that @fastmath is not applied to inlined function calls (maybe this should be in the docs?). I understand that there is a good reason behind this.

However I would have expected this to work on expressions returned by macros, but this doesn’t seem to be the case:

macro m_fma(a, b, c)
    return esc(quote $a * $b + $c end)
end

f_fma(a, b, c) = a * b + c

@fastmath fma_macro(a, b, c) = @m_fma(a, b, c)
@fastmath fma_func(a, b, c) = f_fma(a, b, c)
@fastmath fma_direct(a, b, c) = a * b + c

Then when inspecting the code of each function:

julia> @code_llvm debuginfo=:none fma_macro(1.1, 2.2, 3.3)
define double @julia_fma_macro_246(double %0, double %1, double %2) #0 {
  %3 = fmul double %0, %1
  %4 = fadd double %3, %2
  ret double %4
}

julia> @code_llvm debuginfo=:none fma_func(1.1, 2.2, 3.3)
define double @julia_fma_func_248(double %0, double %1, double %2) #0 {
  %3 = fmul double %0, %1
  %4 = fadd double %3, %2
  ret double %4
}

julia> @code_llvm debuginfo=:none fma_direct(1.1, 2.2, 3.3)
define double @julia_fma_direct_250(double %0, double %1, double %2) #0 {
  %3 = fmul fast double %1, %0
  %4 = fadd fast double %3, %2
  ret double %4
}

@fastmath was applied only on fma_direct, but not on fma_macro. I believe that it is because @fastmath doesn’t expand macros:

julia> (@macroexpand @m_fma(1.1, 2.2, 3.3)) == (@macroexpand @fastmath @m_fma(1.1, 2.2, 3.3))
true

This gives rise to irregular behavior, where a function written explicitly can be more performant than its macro counterpart.

I imagine this behavior is somewhat intended, but is it intuitive?
Since macros inject code into their call site, shouldn’t @fastmath apply to that injected code too?

The main problem I have with @fastmath currently is that this behavior is not explicitly stated in the docs. Not propagating @fastmath is quite an important thing to consider when coding for performance.
I originally encountered this issue while tracking down a performance difference against near-identical C++ code, and it took me quite a while to find the source.


I wouldn’t necessarily expect that, no - it would require @fastmath to recursively expand every macrocall it encounters.

@fastmath is precisely intended for small, local control; not for large blocks of code. This is usually seen as a strength, because it allows opting into --ffast-math like behavior locally, without having to turn on a global flag (that may remove protections in other parts of the code base, expecting to run without “fast” math).
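For context on what that local rewrite actually does: @fastmath is purely syntactic, replacing the operators it can see in the literal expression it receives with their Base.FastMath counterparts (a quick sketch):

```julia
# @fastmath rewrites only the operators visible in the literal expression;
# a macrocall it never expands stays opaque to it.
ex = @macroexpand @fastmath a * b + c
# ex is now :(Base.FastMath.add_fast(Base.FastMath.mul_fast(a, b), c))
```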

This behavior is a consequence of macros being applied outside-in, not inside-out:

# only expand one macro
julia> @macroexpand1 @fastmath @m_fma 1.1 1.2 1.3
:(#= REPL[3]:1 =# @m_fma 1.1 1.2 1.3)

So making @fastmath (or other macros) support that would require either changing the order in which macros are expanded (a breaking change) or making @fastmath recursively expand the macrocalls it encounters.
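For completeness, the latter is achievable today as a user-level workaround - a sketch only, with @fastmath_deep being a made-up name rather than an official API:

```julia
# Hypothetical helper: expand all macrocalls in `ex` first, then hand the
# resulting (macro-free) expression to the regular @fastmath.
macro fastmath_deep(ex)
    expanded = macroexpand(__module__, ex)  # recursive expansion
    return esc(:(@fastmath $expanded))
end

macro m_fma(a, b, c)
    return esc(:($a * $b + $c))
end

# The macro is expanded before @fastmath sees it, so the fast flags apply.
fma_deep(a, b, c) = @fastmath_deep @m_fma(a, b, c)
```

The obvious caveat is that @fastmath_deep forcibly expands macros it may know nothing about, which is exactly the behavior Base chose not to ship.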

I understand that it is problematic for @fastmath to require expanding the expression it is given; however, this makes @fastmath useless when calling functions/macros that are not @fastmath themselves (e.g. from another package).

I believe this is somewhat in contradiction with Julia’s composability: performance-critical code would then be restricted to functions annotated with @fastmath, which package authors might not want to do for every function/macro they write, and which is most likely not wanted by every user of their package.

I would agree that @fastmath is better suited for small functions. Yet I feel there should be a way to write big, math-intensive functions (e.g. CFD kernels) without requiring @fastmath on every function.
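For example, one could scope @fastmath to the hot arithmetic inside the kernel rather than annotating every helper function (kernel! here is a made-up sketch, not code from the thread):

```julia
# Apply @fastmath only to the inner-loop arithmetic, leaving the rest of
# the kernel (indexing, control flow, I/O) under strict IEEE semantics.
function kernel!(out, x)
    @inbounds for i in eachindex(x, out)
        @fastmath out[i] = x[i] * x[i] + x[i]
    end
    return out
end
```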

You are free to use --ffast-math as a flag to Julia as well, with all the downsides & consequences that implies. The general issue is that you as a library user (without looking at the implementation) can’t really know whether it’s ok to use @fastmath inside of a library function or not.

For example, in the example you linked you can annotate the entire function with @fastmath and get the behavior you’re looking for:

julia> const a = 369.3299304675746

julia> const b = 6.755399441055744e15

julia> @fastmath function g(x::Float64)
           z = muladd(x, a, b)
           z -= b
       end
g (generic function with 1 method)

julia> @code_llvm g(1.0)
;  @ REPL[9]:1 within `g`
define double @julia_g_219(double %"x::Float64") #0 {
;  @ REPL[9]:2 within `g`
; ┌ @ float.jl:414 within `muladd`
   %0 = fmul contract double %"x::Float64", 0x40771547652B82FE
; └
;  @ REPL[9]:3 within `g`
  ret double %0
}

It’s not sufficient to just annotate muladd, because in order to simplify that, the subtraction needs to be considered for --ffast-math as well. I believe that was also discussed further down in the other thread - so it’s not quite true that inlined calls are not optimized with @fastmath; it’s just that not all optimizations can be applied if only part of the resulting function is marked as allowed to ignore IEEE semantics. That’s a result of having local control, and Julia is composing here just fine.

Of course, if the function call in question is not inlined, it’d really be an error to make it use @fastmath logic internally - and if the function is so expensive that it can’t be inlined, no amount of @fastmath or --ffast-math is going to make that function call use SIMD (which is often the intention of allowing @fastmath semantics) or be vectorized with the surrounding code.
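To illustrate that concretely (a sketch reusing the constants above; h is a made-up name): wrapping only the muladd leaves the trailing subtraction under strict IEEE semantics, so the cancellation that the fully annotated g achieves need not happen:

```julia
const a = 369.3299304675746
const b = 6.755399441055744e15

# Only the muladd is under @fastmath; the `- b` remains strict IEEE,
# so LLVM is not free to cancel it against the `+ b` inside the muladd.
h(x) = @fastmath(muladd(x, a, b)) - b
```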

To be clear, it’s not that @fastmath doesn’t work on larger blocks or function calls in general - it’s just that the larger the block encompassed is, the more unpredictable the results become, just as with --ffast-math as a global flag. Though for me that’s all the more reason to be extremely wary of @fastmath use in the wild (especially on large codeblocks) - often it’s not guarded appropriately, checking for Inf/NaN preemptively etc.

Actually, those consequences were considered too severe:

You do not want it to apply to functions like sinpi’s implementation.


Right, I forgot about that PR!