I have the following function:

```
function wind_factor(height)
    log1 = log(height / 0.0002) / log(6.0 / 0.0002)
    exp1 = exp(0.08163 * log(height / 6.0))
    return log1 + (log1 - exp1)
end
```

The parameter height is in the range 6…1000.

It is a bit slow for my purpose (about 23ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy, and error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again the height is changing only by a small amount. Could this fact be used to speed up the calculation?


Yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

```
function wind_factor(height)
    log1 = log(height * 5000) / log(6.0 * 5000)
    exp1 = exp(0.08163 * log(height * 0.16666666666666666))
    return log1 + (log1 - exp1)
end
```

Ok, down to 20 ns. Nice!

Next step is to do your math in `Float32`, since that will make `exp` and `log` faster (and throwing `@fastmath` at the function will also help).
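As a rough sketch of what that could look like (the name `wind_factor32` is illustrative, and all constants are written as `Float32` literals so nothing promotes back to `Float64`):

```
# All literals are Float32 (5000f0, 30000f0, 0.08163f0, 6f0), so with a
# Float32 input the whole computation stays in single precision.
@fastmath function wind_factor32(height::Float32)
    log1 = log(height * 5000f0) / log(30000f0)
    exp1 = exp(0.08163f0 * log(height / 6f0))
    return log1 + (log1 - exp1)
end
```

Whether this actually wins depends on how the platform's `Float32` `log`/`exp` compare to the `Float64` versions.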

Or just throw it at LoopVectorization:

```
julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
  21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
  5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
  1.450 μs (1 allocation: 7.94 KiB)
```


Well, it does not make it faster:

```
@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))
```

This is 21 ns.


You need to make the insides `Float32` also. `log(6.0 * 5000)` is a `Float64`, etc., which promotes everything else.


You have `Float64` constants in the function; those need to be `Float32` too.


I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 times in the callback function of a solver.


```
function wind_factor(height)
    log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
    exp1 = exp(Float32(0.08163) * log(height * Float32(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

This doesn't help. I tested it before; log and exp are not faster for Float32 than for Float64. Not sure why, though…

The next thing to change is that `log(x*5000) = log(x) + log(5000)`. This lets you calculate `log(x)` only once.

Thanks! Down to 17.5 ns now:

```
function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

Only one log and one exp function call left that are not constant.

Open question: are there approximations for log and exp that are faster than the standard functions if you are happy with reduced accuracy…

Found this discussion: c# - Fast Exp calculation: possible to improve accuracy without losing too much performance? - Stack Overflow

ChangePrecision.jl helps for quick experiments with precision:

```
julia> @changeprecision Float32 function wind_factor(height)
           log_height = log(height)
           log1 = (log_height + log(5000)) / log(6.0 * 5000)
           exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
           return log1 + (log1 - exp1)
       end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
  11.912 ns (0 allocations: 0 bytes)
0.6653509f0
```


Well, for me this code is slower than if I use Float64:

```
@changeprecision Float32 function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))
```

Please take into account that the height parameter is between 6 and 1000; that has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use `evalpoly`.
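For anyone unfamiliar with it, here is a minimal sketch of how `evalpoly` works (the coefficients below are made up for illustration, not a fit to this function):

```
# evalpoly(x, coeffs) evaluates coeffs[1] + coeffs[2]*x + coeffs[3]*x^2 + …
# via Horner's scheme, which is fast and branch-free for a tuple of coefficients.
p(x) = evalpoly(x, (1.0, 2.0, 3.0))   # 1 + 2x + 3x^2

p(2.0)   # 1 + 2*2 + 3*4 = 17.0
```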


From this approximation for the exponential function:

```
@inline function fastexp(x)
    # (1 + x/1024)^1024 ≈ exp(x): one division, then ten squarings
    y = 1 + x / 1024
    y *= y; y *= y; y *= y; y *= y; y *= y
    y *= y; y *= y; y *= y; y *= y; y *= y
    return y
end

function wf(height)
    # precomputed constants: log(5000), 1/log(30000), log(1/6)
    log5_000 = 8.517193191416238
    invlog30_000 = 0.09700306451281168
    log6inv = -1.791759469228055
    log_height = log(height)
    log1 = (log_height + log5_000) * invlog30_000
    exp1 = fastexp(0.08163 * (log_height + log6inv))
    return log1 + (log1 - exp1)
end
```

Testing the accuracy:

```
for x in [0.1, 1.0, 2.0, 100.0, 1000.0]
    println(wf(x) - wind_factor(x))
end
3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612
```

Timing:

```
x = 10.1
@btime wf($x) # 10.711 ns (0 allocations: 0 bytes)
```


Very nice, indeed!

```
julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.695 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  10.7 ns          Histogram: log(frequency) by time          10.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

```
wind_factor(x) = evalpoly(1/x, (1.5949457323488772, 1518.919208108258,
                                104930.09322893005, -211416.6534637618,
                                1.3390365545830491e6, -5.705301564190215e6,
                                1.0384434802323304e7)) /
                 evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))
```

Here's the absolute error (plot omitted), and it runs in 3 ns.

(Found by `using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000, 1/6), 6, 2)`.)

Note that this should probably assert that `6 < x < 1000`, since it is off by `0.2` by the time `x` gets to 3, and it goes totally wack further outside the range.
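A guarded version could look something like this (a sketch wrapping the rational fit from above; `wind_factor_checked` is an illustrative name):

```
# Rational minimax fit in 1/x, valid only on the fitted range 6 ≤ x ≤ 1000;
# outside that range the fit diverges badly, so we reject the input up front.
function wind_factor_checked(x)
    6 <= x <= 1000 || throw(DomainError(x, "fit is only valid for 6 ≤ x ≤ 1000"))
    t = 1 / x
    return evalpoly(t, (1.5949457323488772, 1518.919208108258, 104930.09322893005,
                        -211416.6534637618, 1.3390365545830491e6,
                        -5.705301564190215e6, 1.0384434802323304e7)) /
           evalpoly(t, (1.0, 1092.3211847159544, 91063.22360243625))
end
```

The branch is essentially free compared to the two `evalpoly` calls, but if the caller can guarantee the range, dropping the check keeps the hot path minimal.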


So the answer to my original question would be: use the Remez algorithm and the package Remez.jl…

I love this community!

How did you get the idea to approximate 1/x?