I have the following function:

```
function wind_factor(height)
    log1 = log(height / 0.0002) / log(6.0 / 0.0002)
    exp1 = exp(0.08163 * log(height / 6.0))
    return log1 + (log1 - exp1)
end
```

The parameter height is in the range 6…1000.

It is a bit slow for my purpose (about 23ns on my PC). Is there a way to increase the speed?

I do not need the full accuracy, and error of 1e-4 or 1e-5 would be good enough. Furthermore, when I call this function again and again the height is changing only by a small amount. Could this fact be used to speed up the calculation?


Yeah, this is way slower than it needs to be. The first obvious improvement is replacing division with multiplication.

```
function wind_factor(height)
    log1 = log(height * 5000) / log(6.0 * 5000)
    exp1 = exp(0.08163 * log(height * 0.16666666666666666))
    return log1 + (log1 - exp1)
end
```

Ok, down to 20 ns. Nice!

Next step is to do your math in `Float32`, since that will make `exp` and `log` faster (and throwing `@fastmath` at the function will also help).
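As a rough sketch of what that could look like (the name `wind_factor32` is illustrative, and all constants are written as `Float32` literals so nothing promotes back to `Float64`):

```
# All literals are Float32 (5000f0, 30000f0, 0.08163f0, 6f0), so with a
# Float32 input the whole computation stays in single precision.
@fastmath function wind_factor32(height::Float32)
    log1 = log(height * 5000f0) / log(30000f0)
    exp1 = exp(0.08163f0 * log(height / 6f0))
    return log1 + (log1 - exp1)
end
```

Whether this actually wins depends on how the platform's `Float32` `log`/`exp` compare to the `Float64` versions.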

Or just throw it at LoopVectorization:

```
julia> using LoopVectorization

julia> @btime map(wind_factor, v) setup=(v = rand(1000));
  21.600 μs (1 allocation: 7.94 KiB)

julia> @btime vmap(wind_factor, v) setup=(v = rand(1000));
  5.933 μs (1 allocation: 7.94 KiB)

julia> @btime vmapt(wind_factor, v) setup=(v = rand(1000));
  1.450 μs (1 allocation: 7.94 KiB)
```


Well, it does not make it faster:

```
@benchmark wind_factor(height) setup=(height=Float32(6.0+rand()*500.0))
```

This is 21 ns.


You need to make the insides `Float32` also. `log(6.0 * 5000)` is a `Float64`, etc., which promotes everything else.


You have `Float64` constants in the function; those need to be `Float32` too.


I cannot use LoopVectorization because I do not have a vector of values that I want to calculate. This function is called 10 times in the callback function of a solver.


```
function wind_factor(height)
    log1 = log(height * Float32(5000)) / log(Float32(6.0 * 5000))
    exp1 = exp(Float32(0.08163) * log(height * Float32(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

This doesn't help. I tested it before; log and exp are not faster for Float32 than for Float64. Not sure why, though…

The next thing to change is that `log(x*5000) = log(x) + log(5000)`. This lets you calculate `log(x)` only once.

Thanks! Down to 17.5 ns now:

```
function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end
```

Only one log and one exp function call left that are not constant.

Open question: are there approximations for log and exp that are faster than the standard functions if you are happy with reduced accuracy…

Found this discussion: c# - Fast Exp calculation: possible to improve accuracy without losing too much performance? - Stack Overflow

ChangePrecision.jl helps for quick experiments with precision:

```
julia> @changeprecision Float32 function wind_factor(height)
           log_height = log(height)
           log1 = (log_height + log(5000)) / log(6.0 * 5000)
           exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
           return log1 + (log1 - exp1)
       end
wind_factor (generic function with 1 method)

julia> @btime wind_factor(v) setup = (v = rand(Float32))
  11.912 ns (0 allocations: 0 bytes)
0.6653509f0
```


Well, for me this code is slower than if I use Float64:

```
@changeprecision Float32 function wind_factor(height)
    log_height = log(height)
    log1 = (log_height + log(5000)) / log(6.0 * 5000)
    exp1 = exp(0.08163 * (log_height + log(0.16666666666666666)))
    return log1 + (log1 - exp1)
end

@benchmark wind_factor(height) setup=(height=Float32((6.0+rand()*994.0)))
```

Please take into account that the height parameter is between 6 and 1000; that has an influence on the performance…

Over such a limited range of inputs (where your output only goes from 1 to 1.474), and where it sounds like you only need a few digits of accuracy, I would just do a minimax polynomial or rational function fit and then use `evalpoly`.
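For anyone unfamiliar with it, here is a minimal sketch of how `evalpoly` works (the coefficients below are made up for illustration, not a fit to this function):

```
# evalpoly(x, coeffs) evaluates coeffs[1] + coeffs[2]*x + coeffs[3]*x^2 + …
# via Horner's scheme, which is fast and branch-free for a tuple of coefficients.
p(x) = evalpoly(x, (1.0, 2.0, 3.0))   # 1 + 2x + 3x^2

p(2.0)   # 1 + 2*2 + 3*4 = 17.0
```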


From this approximation for the exponential function:

```
@inline function fastexp(x)
    # (1 + x/1024)^1024 ≈ exp(x): one division, then ten squarings
    y = 1 + x / 1024
    y *= y; y *= y; y *= y; y *= y; y *= y
    y *= y; y *= y; y *= y; y *= y; y *= y
    return y
end

function wf(height)
    # precomputed constants: log(5000), 1/log(30000), log(1/6)
    log5_000 = 8.517193191416238
    invlog30_000 = 0.09700306451281168
    log6inv = -1.791759469228055
    log_height = log(height)
    log1 = (log_height + log5_000) * invlog30_000
    exp1 = fastexp(0.08163 * (log_height + log6inv))
    return log1 + (log1 - exp1)
end
```

Testing the accuracy:

```
for x in [0.1, 1.0, 2.0, 100.0, 1000.0]
    println(wf(x) - wind_factor(x))
end
3.9054456273168725e-5
9.025009359731051e-6
3.590340520776003e-6
3.239700845147908e-5
0.0001292596478008612
```

Timing:

```
x = 10.1
@btime wf($x) # 10.711 ns (0 allocations: 0 bytes)
```


Very nice, indeed!

```
julia> @benchmark wf(height) setup=(height=Float64((6.0+rand()*500.0)))
BenchmarkTools.Trial: 10000 samples with 999 evaluations.
 Range (min … max):  10.679 ns … 27.772 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     10.695 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   10.767 ns ±  0.938 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
  10.7 ns          Histogram: log(frequency) by time          10.9 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.
```

```
wind_factor(x) = evalpoly(1/x, (1.5949457323488772, 1518.919208108258,
                                104930.09322893005, -211416.6534637618,
                                1.3390365545830491e6, -5.705301564190215e6,
                                1.0384434802323304e7)) /
                 evalpoly(1/x, (1.0, 1092.3211847159544, 91063.22360243625))
```

Here's the absolute error (plot omitted), and it runs in 3 ns.

(Found by `using Remez; ratfn_minimax(x->wind_factor(1/x), (1/1000, 1/6), 6, 2)`.)

Note that this should probably assert that `6 < x < 1000`, since it is off by `0.2` by the time `x` gets to 3, and it goes totally wack further outside the range.
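A guarded version could look something like this (a sketch wrapping the rational fit from above; `wind_factor_checked` is an illustrative name):

```
# Rational minimax fit in 1/x, valid only on the fitted range 6 ≤ x ≤ 1000;
# outside that range the fit diverges badly, so we reject the input up front.
function wind_factor_checked(x)
    6 <= x <= 1000 || throw(DomainError(x, "fit is only valid for 6 ≤ x ≤ 1000"))
    t = 1 / x
    return evalpoly(t, (1.5949457323488772, 1518.919208108258, 104930.09322893005,
                        -211416.6534637618, 1.3390365545830491e6,
                        -5.705301564190215e6, 1.0384434802323304e7)) /
           evalpoly(t, (1.0, 1092.3211847159544, 91063.22360243625))
end
```

The branch is essentially free compared to the two `evalpoly` calls, but if the caller can guarantee the range, dropping the check keeps the hot path minimal.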


So the answer to my original question would be: use the Remez algorithm and the package Remez.jl…

I love this community!

How did you get the idea to approximate 1/x?