I have three questions and I was curious if someone can provide some clarity on the results I’m seeing with a couple variations of a simple polynomial evaluation computation.
I wrote a Horner’s method implementation which is performing well, but I was curious about the for loop. Is there additional overhead by including a
length(X) in the range? Is it re-computed with each iteration, or is it only computed once? I am not seeing any noticeable difference with the benchmark between
horner2 below so I’m assuming it isn’t.
Also, I included the @btime values below in the code snippet and I’m seeing a huge improvement in the execution time between wrapping the poly call in a function compared to inlining the expression. Why exactly is that? It looks like there’s a huge benefit in wrapping most expressions in a function to maybe benefit from the optimizer or something?
Finally, by using @fastmath, I was able to just about halve the execution time of the inlined function. Where is @fastmath not appropriate to use? I figured in this case with a very simple computation I wouldn’t have to worry much about error creeping into the approximation, but I am not sure how this would fare if I were manipulating floats rather than integers.
Thanks in advance!
function horner(poly, x) result = poly for i in 1:length(poly)-1 result = result * x + poly[i+1] end return result end function horner2(poly, x) result = poly n = length(poly) for i in 1:n-1 result = result * x + poly[i+1] end return result end polywithpows(x) = 2x^4 - 2x^3 + 15x^2 + 12x - 6 polyexpanded(x) = 2*x*x*x*x - 2*x*x*x + 15*x*x + 12*x - 6 poly = [2, -2, 15, 12, -6]; x = 4; @btime horner(poly, x) #> 19.962 ns (1 allocation: 16 bytes) @btime horner2(poly, x) #> 19.597 ns (1 allocation: 16 bytes) @btime foldl((acc, el) -> acc*x + el, poly[2:end], init=poly) #> 2.161 μs (6 allocations: 208 bytes) @btime polywithpows(x) #> 18.328 ns (1 allocation: 16 bytes) @btime polyexpanded(x) #> 18.260 ns (1 allocation: 16 bytes) @btime @fastmath 2x^4 - 2x^3 + 15x^2 + 12x - 6 #> 88.610 ns (3 allocations: 48 bytes) @btime 2x^4 - 2x^3 + 15x^2 + 12x - 6 #> 142.256 ns (3 allocations: 48 bytes)