Ah, I see that there are three more flags offered by LLVM (all included by fast): contract, afn, and reassoc.
Yes, that would work. I would use a @generated function so that the fast token can be generated programmatically. With 7 flags there are 2^7 = 128 versions of these functions to be generated.
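For concreteness, here is a minimal sketch of that idea (names and details are my own, not taken from any package): a hand-written llvmcall with a fixed flag subset, and a @generated function that splices an arbitrary flag tuple into the IR string at compile time.
using Base: llvmcall
# Fixed flag subset, written out by hand (sketch only):
add_contract_reassoc(x::Float64, y::Float64) =
    llvmcall("""
        %z = fadd contract reassoc double %0, %1
        ret double %z
        """, Float64, Tuple{Float64,Float64}, x, y)
# Generated version: the flag tuple (e.g. (:contract, :reassoc)) is joined
# into the IR string while the method is being generated, so each flag
# combination gets its own compiled method.
@generated function fadd_flags(::Val{flags}, x::Float64, y::Float64) where {flags}
    ir = """
        %z = fadd $(join(flags, ' ')) double %0, %1
        ret double %z
        """
    return :(llvmcall($ir, Float64, Tuple{Float64,Float64}, x, y))
end
# usage: fadd_flags(Val((:contract, :reassoc)), 1.0, 2.0)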
Is llvmcall fast enough, or will it slow down compiling things too much?
-erik
Yeah, that’s what happens in your package now.
I think it is pretty OK. It recently got made faster: speed up llvmcall unique name generation by JeffBezanson · Pull Request #35144 · JuliaLang/julia · GitHub.
Oh, does it? How embarrassing. Thank you.
Someone should take this and split it out into a FastMath.jl, and advertise it…
Okay, I’ll file a report.
I get it with LLVM 9 (+ Julia 1.5), but not with LLVM 8 (+ Julia 1.4).
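The definitions of vdiv! and vdiv_fast! aren’t shown in this excerpt; a plausible pair (my assumption, not necessarily the code actually benchmarked) is a scalar-division loop with and without @fastmath:
# Hypothetical definitions, for context only:
function vdiv!(x, y, a)
    @inbounds @simd for i in eachindex(x, y)
        x[i] = y[i] / a
    end
    return x
end
function vdiv_fast!(x, y, a)
    @inbounds @simd for i in eachindex(x, y)
        @fastmath x[i] = y[i] / a
    end
    return x
end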
julia> @benchmark vdiv!($x1, $y, $a)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 20.792 ns (0.00% GC)
median time: 20.940 ns (0.00% GC)
mean time: 22.101 ns (0.00% GC)
maximum time: 50.008 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
julia> @benchmark vdiv_fast!($x2, $y, $a)
BenchmarkTools.Trial:
memory estimate: 0 bytes
allocs estimate: 0
--------------
minimum time: 21.394 ns (0.00% GC)
median time: 21.447 ns (0.00% GC)
mean time: 22.808 ns (0.00% GC)
maximum time: 49.948 ns (0.00% GC)
--------------
samples: 10000
evals/sample: 997
I think this shows that if you have already thought about the order of operations – whether because your algorithm depends on that order for accuracy or for performance – it’s less likely that things go wrong without @fastmath.
EDIT:
Just tested LLVM 10 (+ Julia 1.5), and I still see the issue.
Also, SIMD.jl now works with @fastmath, e.g.
julia> f1(a, b, c) = a * b - c * 2.0
f1 (generic function with 1 method)
julia> f2(a, b, c) = @fastmath a * b - c * 2.0
f2 (generic function with 1 method)
julia> V = Vec{4, Float64}
Vec{4,Float64}
julia> code_native(f1, Tuple{V, V, V}, debuginfo=:none)
.section __TEXT,__text,regular,pure_instructions
vmovupd (%rsi), %ymm0
vmulpd (%rdx), %ymm0, %ymm0
movq %rdi, %rax
vmovupd (%rcx), %ymm1
vaddpd %ymm1, %ymm1, %ymm1
vsubpd %ymm1, %ymm0, %ymm0
vmovapd %ymm0, (%rdi)
vzeroupper
retq
nop
julia> code_native(f2, Tuple{V, V, V}, debuginfo=:none)
.section __TEXT,__text,regular,pure_instructions
movq %rdi, %rax
vmovupd (%rdx), %ymm0
vmovupd (%rcx), %ymm1
vaddpd %ymm1, %ymm1, %ymm1
# vvvv fused in fast version vvvvvv
vfmsub231pd (%rsi), %ymm0, %ymm1 ## ymm1 = (ymm0 * mem) - ymm1
vmovapd %ymm1, (%rdi)
vzeroupper
retq
nopl (%rax)
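For completeness, a small usage sketch (my own example, assuming SIMD.jl is loaded as in the snippets above) showing what the two functions compute on concrete Vec values:
using SIMD  # provides Vec
a = Vec{4,Float64}((1.0, 2.0, 3.0, 4.0))
b = Vec{4,Float64}((5.0, 6.0, 7.0, 8.0))
c = Vec{4,Float64}((0.5, 0.5, 0.5, 0.5))
f1(a, b, c)  # elementwise a*b - 2c -> (4.0, 11.0, 20.0, 31.0)
f2(a, b, c)  # same values here (all exact); @fastmath just lets LLVM fuse the multiply-subtract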