How to approximate a function

I have the following function:

function wind_factor(height)
    log1 = log(height / 0.0002) / log(6.0 / 0.0002)
    exp1 = exp(0.08163 * log(height/6.0))
    return log1 +  (log1 - exp1)
end

The parameter height is in the range 6…1000 .

It is a bit slow for my purpose (about 23ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy, and error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again the height is changing only by a small amount. Could this fact be used to speed up the calculation?

yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

function wind_factor(height)
    log1 = log(height * 5000) / log(6.0 * 5000)
    exp1 = exp(0.08163 * log(height*0.16666666666666666))
    return log1 +  (log1 - exp1)
end

Ok, down to 20 ns. Nice!

Next step is to do your math in Float32 since that will make exp and log faster (and throw `@fastmath at the function with will also help).

Or just throw it at LoopVectorization:

julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
  21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
  5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
  1.450 μs (1 allocation: 7.94 KiB)

Well, it does not make it faster:

@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))

This is 21 ns.

You need to make the insides Float32 also. log(6.0 * 5000) is a Float64 etc which promotes everything else.

You have float64 consts in the function, those need to be float32 too

I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 time in the callback function of a solver.

function wind_factor(height)
    log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
    exp1 = exp(Float32(0.08163) * log(height*Float32(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

This doesn’t help. I tested it before, log and exp are not faster for Float32 than for Float64. Not sure why, though…

next thing to change is that log(x*5000)=log(x)+log(5000). This lets you only calculate log(x) once.

Thanks! Down to 17.5 ns now:

function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

Only one log and one exp function call left that are not constant.

Open question: Are there approximations for log and exp that are faster than the standard functions if you are happy with a reduced accuracy…

Found this discussion: c# - Fast Exp calculation: possible to improve accuracy without losing too much performance? - Stack Overflow

ChangePrecision.jl helps for quick experiments with precision:

julia> @changeprecision Float32 function wind_factor(height)
           log_height = log(height)
           log1 = (log_height + log(5000)) / log(6.0 * 5000)
           exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
           return log1 +  (log1 - exp1)
       end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
  11.912 ns (0 allocations: 0 bytes)
0.6653509f0

Well, for me this code is slower than if I use Float64:

@changeprecision Float32 function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))

Please take into account that the height parameter is between 6 and 1000, that has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use evalpoly.

From this approximation for the expoential function,

@inline function fastexp(x)
  y = 1 + x / 1024
  y *= y; y *= y; y *= y; y *= y; y *= y  
  y *= y; y *= y; y *= y; y *= y; y *= y 
  return y
end

function wf(height)
    log5_000 = 8.517193191416238
    invlog30_000 = 0.09700306451281168
    log6inv = -1.791759469228055
    log_height = log(height)
    log1 = (log_height + log5_000) * invlog30_000
    exp1 = fastexp(0.08163 * (log_height + log6inv))
    return log1 + (log1 - exp1)
end

Testing the accuracy:

for x in [0.1,1.0,2.0,100.0,1000.0]
    println(wf(x) - wind_factor(x))
end

3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612

Timing:

x = 10.1
@btime wf($x)  # 10.711 ns (0 allocations: 0 bytes)

Very nice, indeed!

julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.695 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▅███▅▃                                                     ▂
  ███████▆▃▁▁▃▄▇▇▇▅▆▇█▇▆▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▆▇█ █
  10.7 ns      Histogram: log(frequency) by time      10.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
wind_factor(x)=evalpoly(1/x, (1.5949457323488772, 1518.919208108258, 104930.09322893005, -211416.6534637618, 1.3390365545830491e6, -5.705301564190215e6, 1.0384434802323304e7))/evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))

here’s the absolute error: (and it runs in 3ns)
image

(found by using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000,1/6), 6, 2))

Note that this should probably assert that 6<x<1000 since it is off by .2 by the time x gets to 3, and it goes totally wack further outside the range.

So the answer to my original question would be: Use the Remez algorithm and the package Remez.jl …

I love this community! :heart:

How did you get the idea to approximate 1/x ?