I have the following function:

```
function wind_factor(height)
    log1 = log(height / 0.0002) / log(6.0 / 0.0002)
    exp1 = exp(0.08163 * log(height / 6.0))
    return log1 + (log1 - exp1)
end
```

The parameter height is in the range 6…1000.

It is a bit slow for my purpose (about 23ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy, and error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again the height is changing only by a small amount. Could this fact be used to speed up the calculation?


Yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

```
function wind_factor(height)
    log1 = log(height * 5000) / log(6.0 * 5000)
    exp1 = exp(0.08163 * log(height * 0.16666666666666666))
    return log1 + (log1 - exp1)
end
```

Ok, down to 20 ns. Nice!

Next step is to do your math in `Float32`, since that will make `exp` and `log` faster (and throwing `@fastmath` at the function will also help).
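As a rough sketch of what that could look like (the name `wind_factor32` is illustrative, and all constants are written as `Float32` literals so nothing promotes back to `Float64`):

```
# All literals are Float32 (5000f0, 30000f0, 0.08163f0, 6f0), so with a
# Float32 input the whole computation stays in single precision.
@fastmath function wind_factor32(height::Float32)
    log1 = log(height * 5000f0) / log(30000f0)
    exp1 = exp(0.08163f0 * log(height / 6f0))
    return log1 + (log1 - exp1)
end
```

Whether this actually wins depends on how the platform's `Float32` `log`/`exp` compare to the `Float64` versions.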

Or just throw it at LoopVectorization:

```
julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
  21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
  5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
  1.450 μs (1 allocation: 7.94 KiB)
```


Well, it does not make it faster:

```
@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))
```

This is 21 ns.


You need to make the insides `Float32` also. `log(6.0 * 5000)` is a `Float64`, etc., which promotes everything else.


You have `Float64` constants in the function; those need to be `Float32` too.


I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 times in the callback function of a solver.


```
function wind_factor(height)
    log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
    exp1 = exp(Float32(0.08163) * log(height * Float32(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

This doesn't help. I tested it before; log and exp are not faster for Float32 than for Float64. Not sure why, though…

The next thing to change is that `log(x*5000) = log(x) + log(5000)`. This lets you calculate `log(x)` only once.

Thanks! Down to 17.5 ns now:

```
function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

Only one log and one exp function call left that are not constant.

Open question: are there approximations for log and exp that are faster than the standard functions if you are happy with reduced accuracy…

Found this discussion: c# - Fast Exp calculation: possible to improve accuracy without losing too much performance? - Stack Overflow

ChangePrecision.jl helps for quick experiments with precision:

```
julia> @changeprecision Float32 function wind_factor(height)
           log_height = log(height)
           log1 = (log_height + log(5000)) / log(6.0 * 5000)
           exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
           return log1 + (log1 - exp1)
       end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
  11.912 ns (0 allocations: 0 bytes)
0.6653509f0
```


Well, for me this code is slower than if I use Float64:

```
@changeprecision Float32 function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))
```

Please take into account that the height parameter is between 6 and 1000; that has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use `evalpoly`.
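For anyone unfamiliar with it, here is a minimal sketch of how `evalpoly` works (the coefficients below are made up for illustration, not a fit to this function):

```
# evalpoly(x, coeffs) evaluates coeffs[1] + coeffs[2]*x + coeffs[3]*x^2 + …
# via Horner's scheme, which is fast and branch-free for a tuple of coefficients.
p(x) = evalpoly(x, (1.0, 2.0, 3.0))   # 1 + 2x + 3x^2

p(2.0)   # 1 + 2*2 + 3*4 = 17.0
```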


From this approximation for the exponential function:

```
@inline function fastexp(x)
    # (1 + x/1024)^1024 ≈ exp(x): one division, then ten squarings
    y = 1 + x / 1024
    y *= y; y *= y; y *= y; y *= y; y *= y
    y *= y; y *= y; y *= y; y *= y; y *= y
    return y
end

function wf(height)
    # precomputed constants: log(5000), 1/log(30000), log(1/6)
    log5_000 = 8.517193191416238
    invlog30_000 = 0.09700306451281168
    log6inv = -1.791759469228055
    log_height = log(height)
    log1 = (log_height + log5_000) * invlog30_000
    exp1 = fastexp(0.08163 * (log_height + log6inv))
    return log1 + (log1 - exp1)
end
```

Testing the accuracy:

```
for x in [0.1, 1.0, 2.0, 100.0, 1000.0]
    println(wf(x) - wind_factor(x))
end
3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612
```

Timing:

```
x = 10.1
@btime wf($x) # 10.711 ns (0 allocations: 0 bytes)
```


Very nice, indeed!

```
julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.695 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  10.7 ns          Histogram: log(frequency) by time          10.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

```
wind_factor(x) = evalpoly(1/x, (1.5949457323488772, 1518.919208108258,
                                104930.09322893005, -211416.6534637618,
                                1.3390365545830491e6, -5.705301564190215e6,
                                1.0384434802323304e7)) /
                 evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))
```

Here's the absolute error (plot omitted), and it runs in 3 ns.

(Found by `using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000, 1/6), 6, 2)`.)

Note that this should probably assert that `6 < x < 1000`, since it is off by `0.2` by the time `x` gets to 3, and it goes totally wack further outside the range.
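A guarded version could look something like this (a sketch wrapping the rational fit from above; `wind_factor_checked` is an illustrative name):

```
# Rational minimax fit in 1/x, valid only on the fitted range 6 ≤ x ≤ 1000;
# outside that range the fit diverges badly, so we reject the input up front.
function wind_factor_checked(x)
    6 <= x <= 1000 || throw(DomainError(x, "fit is only valid for 6 ≤ x ≤ 1000"))
    t = 1 / x
    return evalpoly(t, (1.5949457323488772, 1518.919208108258, 104930.09322893005,
                        -211416.6534637618, 1.3390365545830491e6,
                        -5.705301564190215e6, 1.0384434802323304e7)) /
           evalpoly(t, (1.0, 1092.3211847159544, 91063.22360243625))
end
```

The branch is essentially free compared to the two `evalpoly` calls, but if the caller can guarantee the range, dropping the check keeps the hot path minimal.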


So the answer to my original question would be: use the Remez algorithm and the package Remez.jl…

I love this community!

How did you get the idea to approximate 1/x?