# How to approximate a function

I have the following function:

``````function wind_factor(height)
log1 = log(height / 0.0002) / log(6.0 / 0.0002)
exp1 = exp(0.08163 * log(height/6.0))
return log1 +  (log1 - exp1)
end
``````

The parameter height is in the range 6…1000 .

It is a bit slow for my purpose (about 23ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy, and error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again the height is changing only by a small amount. Could this fact be used to speed up the calculation?

1 Like

yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

``````function wind_factor(height)
log1 = log(height * 5000) / log(6.0 * 5000)
exp1 = exp(0.08163 * log(height*0.16666666666666666))
return log1 +  (log1 - exp1)
end
``````

Ok, down to 20 ns. Nice!

Next step is to do your math in `Float32` since that will make `exp` and `log` faster (and throw `@fastmath at the function with will also help).

Or just throw it at LoopVectorization:

``````julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
1.450 μs (1 allocation: 7.94 KiB)
``````
1 Like

Well, it does not make it faster:

``````@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))
``````

This is 21 ns.

1 Like

You need to make the insides `Float32` also. `log(6.0 * 5000)` is a `Float64` etc which promotes everything else.

1 Like

You have float64 consts in the function, those need to be float32 too

1 Like

I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 time in the callback function of a solver.

1 Like
``````function wind_factor(height)
log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
exp1 = exp(Float32(0.08163) * log(height*Float32(0.16666666666666666)))
return log1 +  (log1 - exp1)
end
``````

This doesn’t help. I tested it before, log and exp are not faster for Float32 than for Float64. Not sure why, though…

next thing to change is that `log(x*5000)=log(x)+log(5000)`. This lets you only calculate `log(x)` once.

Thanks! Down to 17.5 ns now:

``````function wind_factor(height)
log_height = log(height)
log1 = (log_height + log(5000)) / log(6.0 * 5000)
exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
return log1 +  (log1 - exp1)
end
``````

Only one log and one exp function call left that are not constant.

Open question: Are there approximations for log and exp that are faster than the standard functions if you are happy with a reduced accuracy…

ChangePrecision.jl helps for quick experiments with precision:

``````julia> @changeprecision Float32 function wind_factor(height)
log_height = log(height)
log1 = (log_height + log(5000)) / log(6.0 * 5000)
exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
return log1 +  (log1 - exp1)
end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
11.912 ns (0 allocations: 0 bytes)
0.6653509f0
``````
1 Like

Well, for me this code is slower than if I use Float64:

``````@changeprecision Float32 function wind_factor(height)
log_height = log(height)
log1 = (log_height + log(5000)) / log(6.0 * 5000)
exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
return log1 +  (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))
``````

Please take into account that the height parameter is between 6 and 1000, that has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use `evalpoly`.

1 Like

From this approximation for the expoential function,

``````@inline function fastexp(x)
y = 1 + x / 1024
y *= y; y *= y; y *= y; y *= y; y *= y
y *= y; y *= y; y *= y; y *= y; y *= y
return y
end

function wf(height)
log5_000 = 8.517193191416238
invlog30_000 = 0.09700306451281168
log6inv = -1.791759469228055
log_height = log(height)
log1 = (log_height + log5_000) * invlog30_000
exp1 = fastexp(0.08163 * (log_height + log6inv))
return log1 + (log1 - exp1)
end
``````

Testing the accuracy:

``````for x in [0.1,1.0,2.0,100.0,1000.0]
println(wf(x) - wind_factor(x))
end

3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612
``````

Timing:

``````x = 10.1
@btime wf(\$x)  # 10.711 ns (0 allocations: 0 bytes)
``````
1 Like

Very nice, indeed!

``````julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
Time  (median):     10.695 ns              ┊ GC (median):    0.00%
Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

▂▅███▅▃                                                     ▂
███████▆▃▁▁▃▄▇▇▇▅▆▇█▇▆▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▆▇█ █
10.7 ns      Histogram: log(frequency) by time      10.9 ns <

Memory estimate: 0 bytes, allocs estimate: 0.
``````
``````wind_factor(x)=evalpoly(1/x, (1.5949457323488772, 1518.919208108258, 104930.09322893005, -211416.6534637618, 1.3390365545830491e6, -5.705301564190215e6, 1.0384434802323304e7))/evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))
``````

here’s the absolute error: (and it runs in 3ns) (found by `using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000,1/6), 6, 2)`)

Note that this should probably assert that `6<x<1000` since it is off by `.2` by the time `x` gets to 3, and it goes totally wack further outside the range.

7 Likes

So the answer to my original question would be: Use the Remez algorithm and the package Remez.jl …

I love this community! How did you get the idea to approximate 1/x ?