@fastmath does not work with @tullio and exp

As the title says, when I apply exp to every element of a vector, I get this error:

start
rt=rand(100);
@inbounds @fastmath @tullio et[n] := exp(rt[n])
UndefVarError: exp_fast not defined

Stacktrace:
  [1] 𝒜𝒸𝓉!
    @ C:\Users\.julia\packages\Tullio\NGyNM\src\macro.jl:1093 [inlined]
  [2] tile_halves(fun!::var"#𝒜𝒸𝓉!#53", ::Type{Vector{Float64}}, As::Tuple{Vector{Float64}, Vector{Float64}}, Is::Tuple{UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio C:\Users\.julia\packages\Tullio\NGyNM\src\threads.jl:139
  [3] tile_halves(fun!::var"#𝒜𝒸𝓉!#53", ::Type{Vector{Float64}}, As::Tuple{Vector{Float64}, Vector{Float64}}, Is::Tuple{UnitRange{Int64}}, Js::Tuple{}, keep::Nothing, final::Bool)
    @ Tullio C:\Users\.julia\packages\Tullio\NGyNM\src\threads.jl:142
  [4] tile_halves
    @ C:\Users\.julia\packages\Tullio\NGyNM\src\threads.jl:136 [inlined]
  [5] threader
    @ C:\Users\.julia\packages\Tullio\NGyNM\src\threads.jl:65 [inlined]
  [6] ℳ𝒶𝓀ℯ
    @ C:\Users\.julia\packages\Tullio\NGyNM\src\macro.jl:807 [inlined]
  [7] (::Tullio.Eval{var"#ℳ𝒶𝓀ℯ#54"{var"#𝒜𝒸𝓉!#53"}, Nothing})(args::Vector{Float64})
    @ Tullio C:\Users\.julia\packages\Tullio\NGyNM\src\eval.jl:20
  [8] top-level scope
    @ C:\Users\.julia\packages\Tullio\NGyNM\src\macro.jl:976
  [9] eval
    @ .\boot.jl:368 [inlined]
 [10] include_string(mapexpr::typeof(REPL.softscope), mod::Module, code::String, filename::String)
    @ Base .\loading.jl:1428
end

However, when I remove @fastmath, it works:

@inbounds @tullio et[n] := exp(rt[n])
100-element Vector{Float64}:
 1.7342624711399819
 ⋮
 2.2736604188652425

I checked that exp_fast is defined in the source code, and @fastmath with exp works normally on plain scalar variables. What causes this? Is it the combination of @fastmath and @tullio?

Does it work if you put the @fastmath inside the @tullio?

Works for me in Julia v1.8.2:

julia> using Tullio

julia> rt=rand(100);

julia> @inbounds @fastmath @tullio et[n] := exp(rt[n])
100-element Vector{Float64}:
[...]

(jl_PLSCG0) pkg> st
Status `/tmp/jl_PLSCG0/Project.toml`
  [bc48ee85] Tullio v0.3.5

I couldn't reproduce your error in any version of Julia v1.5-v1.8. I didn't bother going further back in time.

Macros work by rewriting expressions. What @fastmath does is replace calls to certain functions (including exp) with their fast counterparts, turning f into Base.FastMath.f_fast, so every call to exp becomes a call to Base.FastMath.exp_fast. Also, unlike nested function calls, nested macros are expanded outermost first (left to right), so in your expression @fastmath operates before @tullio, which is why Oscar suggested swapping the order.
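The rewriting described above can be inspected without loading any package. This is a minimal sketch using only Base; `x` is a placeholder symbol that is never evaluated, and @inbounds stands in for @tullio purely to illustrate the expansion order.

```julia
# Base-only sketch; `x` is a placeholder symbol and is never evaluated.

# @fastmath rewrites the call expression before anything runs.
ex = @macroexpand @fastmath exp(x)
println(ex)

# With stacked macros the outer one is expanded first. @macroexpand1 performs
# only that first step, so the inner macro call is still present afterwards.
ex1 = @macroexpand1 @fastmath @inbounds exp(x)
println(ex1)

# The fast variant is an ordinary function that lives in Base.FastMath.
println(Base.FastMath.exp_fast(1.0) ≈ exp(1.0))
```

Both printed expressions should contain a call to Base.FastMath.exp_fast, which is the name any inner macro then has to cope with.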

Thanks. But now tullio doesn't recognize its input format:

inbounds tullio @fastmath et[n] := exp(rt[n])
LoadError: "can't understand input, expected A[ ] := B[ ] (or with =, or +=, *=, ^=) got #= In[ ]:1 =# fastmath et[n] := exp(rt[n])"
in expression starting at In[ ]:1

(I cannot type the at sign here because of a restriction on my account.)

Put the code in tick marks like
```
this
```
Otherwise @abc will send a notification to user abc.

Thx. I am using exactly Julia 1.8.2, and it was installed not long ago.

I did this. But Discourse automatically counts how many @ I use, and that count cannot exceed 2.

Have you benchmarked vs using Tullio, LoopVectorization?

Your explanation sounds reasonable, but this problem only occurs with exp. If I switch to sin, cos, or other elementary functions, they work fine.

@inbounds @fastmath @tullio et[n] := cos(rt[n])
100-element Vector{Float64}:
 0.7741811531923921
 ⋮
 0.7898610646265318
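As a quick check of the observation above, the @fastmath rewriting can be compared across functions in plain Julia. This is only a sketch using Base: `x` is a placeholder symbol and nothing is evaluated.

```julia
# Base-only sketch: expand @fastmath around calls to several functions.
# `x` is a placeholder symbol; the expressions are only expanded, not run.
for f in (:exp, :sin, :cos)
    ex = macroexpand(@__MODULE__, :(@fastmath $f(x)))
    println(f, " => ", ex)
end
```

Each call should come out as its Base.FastMath `_fast` counterpart, so the rewriting itself treats exp no differently from sin or cos. That suggests the failure arises somewhere after this step.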

That seems like a bug in Discourse… can someone disable that limitation or get it fixed? We want feedback from users (especially new ones), and we want to help them.

I didn't post this to meta. If it isn't fixed soon, I'm OK with this being moved there, or is there a better place to report it?


Here is the benchmark for Tullio without fastmath. For LoopVectorization, I don't know how to write the code for this problem.

using BenchmarkTools
function testt()
    rt=rand(100);
    @inbounds @tullio et[n] := exp(rt[n])
end
testt (generic function with 1 method)
@benchmark testt()
BenchmarkTools.Trial: 10000 samples with 217 evaluations.
 Range (min … max):  338.249 ns …   6.165 μs  ┊ GC (min … max): 0.00% … 74.56%
 Time  (median):     374.654 ns               ┊ GC (median):    0.00%
 Time  (mean ± σ):   422.982 ns ± 373.409 ns  ┊ GC (mean ± σ):  7.40% ±  7.82%

  ▄▇▇█▅▄▂▁                             ▁▁                       ▂
  █████████▇▇▇▇▇▇▇█▇█▆▅▆▄▅▆▅▅▄▅▄▃▃▄▄▆▇████▇▆▆▄▄▃▄▃▅▅▄▁▁▃▃▃▁▁▅▅▄ █
  338 ns        Histogram: log(frequency) by time       1.04 μs <

 Memory estimate: 1.75 KiB, allocs estimate: 2.

I don't see an error from stacking these macros, but this probably isn't a good idea.

I believe @inbounds won't do anything here, since @tullio defines functions and @inbounds doesn't cross that boundary; in any case @tullio should check all ranges before starting, and applies @inbounds itself internally.

@fastmath expands to something which @tullio can digest, but note that (perhaps unwisely) this is already the default setting. It can be disabled with a keyword option.

If LoopVectorization is loaded before the @tullio macro is used, then it will always be used (unless explicitly disabled). For this example it helps a lot, at least on my computer.

julia> @macroexpand1 @fastmath @tullio et[n] := exp(rt[n])
:(#= REPL[97]:1 =# @tullio et[n] := Base.FastMath.exp_fast(rt[n]))

julia> using Tullio

julia> let rt=rand(100);
         @btime @tullio et[n] := exp($rt[n])
         @btime @tullio et[n] := exp($rt[n])  fastmath=false  # default is true
       end;
  min 602.870 ns, mean 623.025 ns (1 allocation, 896 bytes)
  min 583.099 ns, mean 602.295 ns (1 allocation, 896 bytes)

julia> using LoopVectorization

julia> let rt=rand(100);
         @btime @tullio et[n] := exp($rt[n])  avx=false # disable LoopVectorization
         @btime @tullio et[n] := exp($rt[n])  # avx=true is the default when LV is loaded
       end;
  min 602.876 ns, mean 685.290 ns (1 allocation, 896 bytes)
  min 229.029 ns, mean 306.513 ns (1 allocation, 896 bytes)

@mcabbott

Thx. Up to this step, I get the same results as you.
And as you said, if I only load Tullio, everything is correct. The error only happens when Tullio and LoopVectorization are loaded together (even if LV is loaded after Tullio).
Can I solve this while both Tullio and LoopVectorization are loaded? I have tested several functions, and it only appears with exp.

Ok, now I can reproduce it. It looks like @turbo is confused by a name with two module qualifiers, Base.FastMath.exp_fast.

julia> using Tullio, LoopVectorization

julia> let rt=rand(100)
         a = @inbounds @fastmath @tullio et[n] := exp(rt[n])  avx=false
         @show sum(a)
         b = @inbounds @fastmath @tullio et[n] := exp(rt[n])
         @show sum(b)
       end;
sum(a) = 165.47486910574332
ERROR: UndefVarError: `exp_fast` not defined
Stacktrace:
 [1] #239#𝒜𝒸𝓉!
   @ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1093 [inlined]

Reproducer without @tullio:

julia> let rt = collect(1:0.1:10.0)
         s = 0.0
         for i in eachindex(rt)
           s += exp(rt[i])
         end
         s
       end
231435.5678161179

julia> let rt = collect(1:0.1:10.0)
         s = 0.0
         @turbo for i in eachindex(rt)
           s += exp(rt[i])
         end
         s
       end
231435.56781611795

julia> let rt = collect(1:0.1:10.0)
         s = 0.0
         @turbo for i in eachindex(rt)
           s += Base.FastMath.exp_fast(rt[i])
         end
         s
       end
ERROR: UndefVarError: `exp_fast` not defined
Stacktrace:
 [1] top-level scope
   @ ./REPL[44]:3

julia> let rt = collect(1:0.1:10.0), e_f = Base.FastMath.exp_fast
         s = 0.0
         @turbo for i in eachindex(rt)
           s += e_f(rt[i])
         end
         s
       end
231435.56781611795

While this is a bug, I stress that @fastmath @tullio is not a good idea. You can turn the mode on or off with a keyword, but letting the @fastmath macro change things is liable to cause problems (e.g. here it defeats the calculation of gradients).

I suspect that @fastmath @turbo is also uniformly a bad idea, since that macro also replaces many functions it knows about with faster ones.


So there isn't a way to solve this completely while @turbo and exp_fast coexist? I guess it could be fixed with some changes to LoopVectorization. For now I replace exp(x) with expm1(x) + 1.0 as a stopgap.