How to approximate a function

I have the following function:

function wind_factor(height)
    log1 = log(height / 0.0002) / log(6.0 / 0.0002)  # log-law term, equals 1 at height = 6
    exp1 = exp(0.08163 * log(height / 6.0))          # power-law term, (height/6)^0.08163
    return log1 + (log1 - exp1)
end

The parameter height is in the range 6…1000.

It is a bit slow for my purpose (about 23 ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy; an error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again, the height changes only by a small amount. Could this fact be used to speed up the calculation?
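For reference, I am timing it with something along these lines (a minimal sketch using BenchmarkTools.jl; the setup simply draws a random height from the 6…1000 range):

using BenchmarkTools

# fresh random height for every sample, covering the 6…1000 range
@benchmark wind_factor(height) setup=(height = 6.0 + rand() * 994.0)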


yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

function wind_factor(height)
    log1 = log(height * 5000) / log(6.0 * 5000)
    exp1 = exp(0.08163 * log(height*0.16666666666666666))
    return log1 +  (log1 - exp1)
end

Ok, down to 20 ns. Nice!

Next step is to do your math in Float32, since that will make exp and log faster (and throwing `@fastmath` at the function will also help).
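For example, something along these lines (an untested sketch; all literals are written as Float32 so nothing promotes back to Float64, and the name wind_factor32 is only for illustration):

# sketch: Float32 literals throughout, plus @fastmath on the whole body
@fastmath function wind_factor32(height::Float32)
    log1 = log(height * 5000f0) / log(30000f0)
    exp1 = exp(0.08163f0 * log(height * 0.16666667f0))
    return log1 + (log1 - exp1)
end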

Or just throw it at LoopVectorization:

julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
  21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
  5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
  1.450 μs (1 allocation: 7.94 KiB)

Well, it does not make it faster:

@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))

This is 21 ns.


You need to make the insides Float32 as well: log(6.0 * 5000) is a Float64, etc., which promotes everything else.
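For example, a single Float64 subexpression drags the whole result back to Float64:

julia> typeof(1.5f0 / log(6.0 * 5000))   # Float64 constant wins the promotion
Float64

julia> typeof(1.5f0 / log(6f0 * 5000))   # Float32 constant keeps everything Float32
Float32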


You have Float64 constants in the function; those need to be Float32 too.


I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 times in the callback function of a solver.

function wind_factor(height)
    log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
    exp1 = exp(Float32(0.08163) * log(height*Float32(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

This doesn’t help. I tested it before, log and exp are not faster for Float32 than for Float64. Not sure why, though…

The next thing to change is that log(x*5000) = log(x) + log(5000). This lets you calculate log(x) only once and fold the rest into constants.

Thanks! Down to 17.5 ns now:

function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

Only one log and one exp function call left that are not constant.

Open question: Are there approximations for log and exp that are faster than the standard functions if you are happy with a reduced accuracy…

Found this discussion: c# - Fast Exp calculation: possible to improve accuracy without losing too much performance? - Stack Overflow

ChangePrecision.jl helps for quick experiments with precision:

julia> using ChangePrecision

julia> @changeprecision Float32 function wind_factor(height)
           log_height = log(height)
           log1 = (log_height + log(5000)) / log(6.0 * 5000)
           exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
           return log1 +  (log1 - exp1)
       end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
  11.912 ns (0 allocations: 0 bytes)
0.6653509f0

Well, for me this code is slower than if I use Float64:

@changeprecision Float32 function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 +  (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))

Please take into account that the height parameter is between 6 and 1000, which has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use evalpoly.
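For reference, evalpoly evaluates a polynomial from a tuple of coefficients (constant term first) using Horner's scheme, so a fitted polynomial ends up as just a handful of multiply-adds:

julia> evalpoly(2.0, (1.0, 3.0, 0.5))   # 1 + 3*2 + 0.5*2^2
9.0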


From this approximation for the exponential function (it uses the identity (1 + x/1024)^1024 ≈ e^x, computed with ten successive squarings):

@inline function fastexp(x)
  y = 1 + x / 1024                         # (1 + x/1024)^1024 ≈ exp(x)
  y *= y; y *= y; y *= y; y *= y; y *= y   # square five times...
  y *= y; y *= y; y *= y; y *= y; y *= y   # ...and five more, giving y^1024
  return y
end

function wf(height)
    log5_000 = 8.517193191416238        # log(5000)
    invlog30_000 = 0.09700306451281168  # 1 / log(30000)
    log6inv = -1.791759469228055        # log(1/6)
    log_height = log(height)
    log1 = (log_height + log5_000) * invlog30_000
    exp1 = fastexp(0.08163 * (log_height + log6inv))
    return log1 + (log1 - exp1)
end

Testing the accuracy:

for x in [0.1,1.0,2.0,100.0,1000.0]
    println(wf(x) - wind_factor(x))
end

3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612

Timing:

x = 10.1
@btime wf($x)  # 10.711 ns (0 allocations: 0 bytes)

Very nice, indeed!

julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.695 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▅███▅▃                                                     ▂
  ███████▆▃▁▁▃▄▇▇▇▅▆▇█▇▆▅▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃▁▁▁▁▁▁▁▁▁▁▃▆▇█ █
  10.7 ns      Histogram: log(frequency) by time      10.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

wind_factor(x)=evalpoly(1/x, (1.5949457323488772, 1518.919208108258, 104930.09322893005, -211416.6534637618, 1.3390365545830491e6, -5.705301564190215e6, 1.0384434802323304e7))/evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))

Here's the absolute error (and it runs in 3 ns):

[absolute-error plot omitted]

(found by using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000,1/6), 6, 2))
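For completeness, the workflow is roughly this (a sketch; my recollection is that ratfn_minimax returns the numerator coefficients, the denominator coefficients, the maximum error, and the extrema locations, possibly as BigFloats, so double-check against Remez.jl before relying on it):

using Remez

# degree-(6, 2) rational minimax fit in u = 1/x, for x in 6…1000
N, D, E, X = ratfn_minimax(u -> wind_factor(1/u), (1/1000, 1/6), 6, 2)

# wf_fit is just an illustrative name; E is the maximum absolute error of the fit
wf_fit(x) = evalpoly(1/x, Tuple(Float64.(N))) / evalpoly(1/x, Tuple(Float64.(D)))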

Note that this should probably assert that 6 < x < 1000, since it is off by 0.2 by the time x gets to 3, and it goes totally wack further outside the range.


So the answer to my original question would be: Use the Remez algorithm and the package Remez.jl …

I love this community! :heart:

How did you get the idea to approximate in terms of 1/x?