I checked that exp_fast is defined in the source code, and that @fastmath and exp work normally on simple variables. What causes this? Is it caused by using @fastmath and @tullio together?
julia> using Tullio
julia> rt=rand(100);
julia> @inbounds @fastmath @tullio et[n] := exp(rt[n])
100-element Vector{Float64}:
[...]
(jl_PLSCG0) pkg> st
Status `/tmp/jl_PLSCG0/Project.toml`
[bc48ee85] Tullio v0.3.5
I couldn't reproduce your error in any version of Julia v1.5-v1.8. I didn't bother going further back in time.
Macros operate by rewriting expressions. What @fastmath does is replace all calls to certain functions (including exp) from f to Base.FastMath.f_fast, so every call to exp becomes a call to Base.FastMath.exp_fast. Also, unlike nested function calls, nested macros are expanded left-to-right (outermost first), so in your expression @fastmath operates before @tullio, which is why Oscar suggested swapping the order of the macros.
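To see the rewriting described above without any packages, here is a minimal sketch using only Base. The variable names `single` and `nested` are just for illustration, and `@show` stands in for an arbitrary inner macro such as @tullio.

```julia
# @fastmath rewrites the call before anything is evaluated.
single = @macroexpand @fastmath exp(x)
println(single)

# Expand only the outermost macro: @fastmath has already rewritten the
# expression that the inner macro (here @show) is about to receive.
nested = @macroexpand1 @fastmath @show exp(x)
println(nested.head)      # still an unexpanded macro call
println(nested.args[end]) # the rewritten argument passed to the inner macro
```

So by the time the inner macro runs, it no longer sees exp at all, only the qualified name.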
Your explanation sounds reasonable, but this type of problem only exists with exp. If I change to sin, cos, or another elementary function, it works fine.
That seems like a bug on Discourse… can someone disable that limitation or get it fixed? We want feedback from users (especially new ones), and we want to help them.
I didn't post this to meta; if it isn't fixed soon, I'm OK with this being moved there. Or is there a better place to report it?
I don't see an error from stacking these macros, but this probably isn't a good idea.
I believe @inbounds won't do anything here, since functions are defined and it doesn't cross function boundaries; and anyway @tullio should check all ranges before starting, and applies @inbounds itself internally.
@fastmath expands to something which @tullio can digest, but note that (perhaps unwisely) fast-math is already @tullio's default setting. It can be disabled with the keyword option fastmath=false.
If LoopVectorization is loaded before the @tullio macro is used, then it will always be used (unless explicitly disabled). For this example it helps a lot, at least on my computer.
julia> @macroexpand1 @fastmath @tullio et[n] := exp(rt[n])
:(#= REPL[97]:1 =# @tullio et[n] := Base.FastMath.exp_fast(rt[n]))
julia> using Tullio
julia> let rt=rand(100);
           @btime @tullio et[n] := exp($rt[n])
           @btime @tullio et[n] := exp($rt[n]) fastmath=false # default is true
       end;
min 602.870 ns, mean 623.025 ns (1 allocation, 896 bytes)
min 583.099 ns, mean 602.295 ns (1 allocation, 896 bytes)
julia> using LoopVectorization
julia> let rt=rand(100);
           @btime @tullio et[n] := exp($rt[n]) avx=false # disable LoopVectorization
           @btime @tullio et[n] := exp($rt[n]) # avx=true is the default when LV is loaded
       end;
min 602.876 ns, mean 685.290 ns (1 allocation, 896 bytes)
min 229.029 ns, mean 306.513 ns (1 allocation, 896 bytes)
Thanks. Up to this step, I get the same results as you.
And as you said, if I only load Tullio, everything is correct. The error only happens when Tullio and LoopVectorization are both loaded (even if LV is loaded after Tullio).
Can I solve this when Tullio and LoopVectorization are both loaded? I have tested several functions, and the error only appears with exp.
Ok, now I can reproduce it. It looks like @turbo is confused by a name with two module qualifiers, Base.FastMath.exp_fast.
julia> using Tullio, LoopVectorization
julia> let rt=rand(100)
           a = @inbounds @fastmath @tullio et[n] := exp(rt[n]) avx=false
           @show sum(a)
           b = @inbounds @fastmath @tullio et[n] := exp(rt[n])
           @show sum(b)
       end;
sum(a) = 165.47486910574332
ERROR: UndefVarError: `exp_fast` not defined
Stacktrace:
[1] #239#𝒜𝒸𝓉!
@ ~/.julia/packages/Tullio/NGyNM/src/macro.jl:1093 [inlined]
Reproducer without @tullio:
julia> let rt = collect(1:0.1:10.0)
           s = 0.0
           for i in eachindex(rt)
               s += exp(rt[i])
           end
           s
       end
231435.5678161179
julia> let rt = collect(1:0.1:10.0)
           s = 0.0
           @turbo for i in eachindex(rt)
               s += exp(rt[i])
           end
           s
       end
231435.56781611795
julia> let rt = collect(1:0.1:10.0)
           s = 0.0
           @turbo for i in eachindex(rt)
               s += Base.FastMath.exp_fast(rt[i])
           end
           s
       end
ERROR: UndefVarError: `exp_fast` not defined
Stacktrace:
[1] top-level scope
@ ./REPL[44]:3
julia> let rt = collect(1:0.1:10.0), e_f = Base.FastMath.exp_fast
           s = 0.0
           @turbo for i in eachindex(rt)
               s += e_f(rt[i])
           end
           s
       end
231435.56781611795
While this is a bug, I stress that @fastmath @tullio is not a good idea. You can turn the mode on or off with a keyword, but letting the @fastmath macro change things is liable to cause problems (e.g. here it defeats the calculation of gradients).
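For reference, the keyword-based approach might look like the following. This is a minimal sketch assuming Tullio is installed; it uses only the fastmath= keyword already shown earlier in this thread, and the array names `et_default` and `et_plain` are made up for illustration.

```julia
using Tullio

rt = rand(100)

# Let @tullio apply its own fast-math handling (on by default),
# instead of wrapping the whole expression in @fastmath.
@tullio et_default[n] := exp(rt[n])

# Opt out of fast-math explicitly with the keyword option.
@tullio et_plain[n] := exp(rt[n]) fastmath=false

# Both should agree with a plain broadcast up to floating-point tolerance.
println(et_default ≈ exp.(rt))
println(et_plain ≈ exp.(rt))
```

Either way the expression that @tullio receives still contains plain exp, so nothing upstream has rewritten the function name.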
I suspect that @fastmath @turbo is also uniformly a bad idea, since that macro also replaces many functions it knows about with faster ones.
So there isn't a way to fully solve this while @turbo and exp_fast coexist? I guess it could be fixed with some modifications to LoopVectorization. For now, I replace exp(x) with expm1(x) + 1.0 as a workaround.
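As a quick sanity check on that substitution, here is a small sketch in plain Julia (no packages) comparing expm1(x) + 1.0 against exp(x) on a sample range. The range and the variable names are arbitrary choices for illustration, and this says nothing about how either form performs under @turbo.

```julia
xs = range(0.0, 1.0; length=101)

direct = exp.(xs)               # reference values
workaround = expm1.(xs) .+ 1.0  # the substitution used as a stopgap

# Largest relative difference between the two forms on this range.
max_rel_err = maximum(abs.(workaround .- direct) ./ direct)
println(max_rel_err)
```

The two forms are mathematically identical, so any difference is rounding only.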