Horner rule

metaprogramming

#21

https://github.com/JuliaCI/BenchmarkTools.jl is good for timing functions that are quick. For functions that take on the order of seconds, @time should work fine.
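As a minimal sketch of the difference (the function f here is just an illustrative example, not from the thread):

```julia
using BenchmarkTools

f(n) = sum(sqrt(i) for i in 1:n)

# @time measures a single run, so compilation and GC noise can dominate
# for fast functions; it is fine once a call takes seconds.
@time f(10^8)

# @btime runs the call many times and reports the minimum, which is
# what you want for nanosecond- to microsecond-scale functions.
# Interpolate globals with $ so they are treated as known values.
n = 10^4
@btime f($n)
```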


#22

1 ns vs 1000 ns matters a lot. @horner is used extensively in the elementary functions (sin, cos, log, etc.), where one call may take something like 6, 8 or 14 ns. If polynomial evaluation were to take 1000 ns… well… bye bye calculating millions of logs (likelihoods with lots of data, …).


#23

So when using 256 bits, the numbers I get from @time, e.g.
0.000018 seconds (32 allocations: 1.234 KB)
0.000010 seconds (33 allocations: 1.344 KB)
are not trustworthy?

The question still remains though:
What exactly makes :($…)
faster?


#24

The compiler knows the values of the coefficients and the number of coefficients when the function is compiled and can optimize accordingly.
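You can see this by expanding the macro; the exact printed form depends on the Julia version (recent versions emit muladd calls), so treat this only as an illustration:

```julia
# The expansion is a fixed expression tree: no loop, no coefficient
# array, just literals wired into the code the compiler will optimize.
@macroexpand Base.Math.@horner(x, 4, 3, 2, 1)
```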


#25

and why does

macro horner02(x, p::Array)
    ex = p[end]
    for i = length(p)-1:-1:1
        ex = ((p[i]) + x * ex)
    end
    ex
end

not work?


#26

Because the values of the coefficients and the number of coefficients are then not known when the function is compiled, so the compiler cannot optimize accordingly.
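There is a second problem, too: a macro receives pieces of syntax, not run-time values, so an argument can never actually be an Array. A small sketch (the macro name here is made up for illustration):

```julia
# A macro sees the *syntax* of its arguments, not their values:
macro showargtypes(args...)
    for a in args
        println(typeof(a), ": ", a)   # runs at expansion time
    end
    nothing
end

@showargtypes x 4 [1, 2, 3]
# `x` arrives as a Symbol, `4` as an Int64 literal, and `[1, 2, 3]`
# as an Expr -- so a `p::Array` annotation can never match.
```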


#27

Whatever "optimize accordingly" means in detail :-/
Where can I read about this particular optimization?

Also… sorry, maybe that was a bit quick a change of topic:
I mean, is the p::Array annotation simply not accepted by macros?


#28

How come when I run f(x) within the sheet it takes 37 ns, but when I run it from the REPL it is more like 1.5 ns?

macro horner01(x, p...)
    ex = p[end]
    for i = length(p)-1:-1:1
        ex = :($(p[i]) + $x * $ex)
    end
    ex
end

function horner02(x, p...)
    ex = p[end]
    for i = length(p)-1:-1:1
        ex = ((p[i]) + x * ex)
    end
    ex
end

using BenchmarkTools
setprecision(10^8);
x=big(9528)/10000;
x=0.2

res1= @btime @horner01(x,4,3,2,1);
res2= @btime horner02(x,4,3,2,1);
res3= @btime Base.Math.@horner(x,4,3,2,1)

f=x -> Base.Math.@horner(x,4,3,2,1)
res4= @btime f(x)

y=0

#29

I would recommend reading an introduction on macros and metaprogramming (in any language). It’s a different kind of programming than what most programmers are used to. The Julia manual is not really the place to start if you’ve never encountered the topic before.


#30

Ok. I just noticed that

g(x)=Base.Math.@horner(x,4,3,2,1)
x=0.2

g(x) takes 100 ns

while

g(0.2) takes 1.5 ns.

If reading the variable from memory takes most of the time anyway when we are talking about quick calculations in the ns range, then it shouldn't really matter whether my metaprogrammed macro takes 1.5 ns or my function 12 ns, should it?

Or is there a way to make the read quicker?


#31

You mean, why res3= @btime Base.Math.@horner(x,4,3,2,1) is slower than res4= @btime f(x)?

If you look at the produced code

@code_native Base.Math.@horner(x,4,3,2,1)
@code_native f(x)

you see that the latter produces much less code.

([Edited, again]: Difference due to calling from function vs. global scope. Removed my completely wrong ‘explanation’)


#32

Hm… is @code_native essentially the same as @macroexpand?

How come an anonymous function f doesn't care whether I write
f(x)
or
f(0.2)?
In both cases I get 33 ns.

However the function g(x) as defined above does care if I do
g(x)
or
g(0.2)


#33

Benchmark in functions. See https://docs.julialang.org/en/stable/manual/performance-tips/
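Concretely, with BenchmarkTools you can also interpolate a global into the benchmarked call with $, which removes the untyped-global overhead without wrapping everything in a function. A sketch using the g from earlier in the thread:

```julia
using BenchmarkTools

g(x) = Base.Math.@horner(x, 4, 3, 2, 1)
x = 0.2

@btime g(x)    # x is an untyped global: measures dispatch overhead too
@btime g($x)   # $x splices the value in, as if called from a function
@btime g(0.2)  # a literal argument is likewise statically known
```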


#34

Please read the manual!!! I don't know many things well either, but this is such a fundamental thing that you really have to read the manual, or watch some JuliaCon YouTube videos. There was a nice one about Julia internals, and one about generated functions.

[Edit 2: I'll try to explain, but it may be inaccurate.] There are three possibilities:

  • macro: takes an expression and returns a modified expression; it knows nothing about values or types
  • @generated function: the same, except it does know the argument types
  • function: takes an AST(?) and produces assembly

Here is a video link Quinn: …From parse time to compile time
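To illustrate the middle bullet: the body of a @generated function runs with the argument *types* and returns an expression that becomes the method body. A small sketch (the function name is made up; assumes a Julia version with fieldcount, i.e. ≥ 0.7):

```julia
@generated function nfields_of(x)
    # Inside the body, `x` is bound to the argument's *type*.
    n = fieldcount(x)
    return :($n)   # this expression becomes the compiled method body
end

nfields_of(1 + 2im)  # Complex{Int64} has the fields re and im → 2
```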


#35

The piece that is missing here is “what does a macro do”.

Let’s rewrite macro horner from your first post as a function:

julia> function horner_expression_generator(x, p...)
            ex = p[end]
            for i = length(p)-1:-1:1
                ex = :($(p[i]) + $x * $ex)
            end
            ex
        end
horner_expression_generator (generic function with 1 method)

This is just a function, and we can call it like any other function:

julia> x = 0.01
julia> r = horner_expression_generator(x, 6,5,4,3,2,1)
:(6 + 0.01 * (5 + 0.01 * (4 + 0.01 * (3 + 0.01 * (2 + 0.01 * 1)))))

But notice it doesn’t return the actual value (6.0504030201), it returns something that starts with a frown-face :(, as you noticed. What is this?

julia> typeof(r)
Expr

And what does it actually contain?

julia> dump(r, maxdepth=20)
Expr
  head: Symbol call
  args: Array{Any}((3,))
    1: Symbol +
    2: Int64 6
    3: Expr
      head: Symbol call
      args: Array{Any}((3,))
        1: Symbol *
        2: Float64 0.01
        3: Expr
          head: Symbol call
          args: Array{Any}((3,))
            1: Symbol +
            2: Int64 5
            3: Expr
              head: Symbol call
              args: Array{Any}((3,))
                1: Symbol *
                2: Float64 0.01
                3: Expr
              typ: Any
          typ: Any
      typ: Any
  typ: Any

[edit: adding maxdepth=20 argument to dump will fully expand the expression tree! h/t this post]

The objects called Expr and Symbol and Float64 and Any, and all the rest, are the building blocks of a Julia program. Julia usually hides these building blocks and just gives you the answer straight away. But when you write code inside of a :( … ) or quote … end block, it means "don't run this code right away, turn it into an Expr (expression) instead". The important point is that these building blocks can be created and modified in Julia code itself. That's why it's called metaprogramming, and that's what a macro does: generate code.

Writing macro horner(…) … end you are creating the same thing as horner_expression_generator above, but it gets a funny name: @horner.

When Julia sees the code @horner(x, 6,5,4,3,2,1), then it effectively pastes the resulting code of horner_expression_generator(x, 6,5,4,3,2,1) into place before any code runs at all (because we don’t know what x will be).
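As a sketch, here is the same generator wrapped as a macro. I've added esc around x (as Base.Math.@horner does) so the caller's variable is not renamed by macro hygiene, and used a made-up name to avoid clashing with the thread's definitions:

```julia
macro horner_sketch(x, p...)
    ex = p[end]
    for i in length(p)-1:-1:1
        # esc(x) keeps the caller's variable intact through macro hygiene
        ex = :($(p[i]) + $(esc(x)) * $ex)
    end
    ex
end

# The expansion is pasted in place before anything runs:
@macroexpand @horner_sketch(y, 6, 5, 4, 3, 2, 1)
```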

Explaining why this can be optimized better than running the for loop in-place is a really big topic, but try some longer polynomials or imagine that 1 vs 10 ns call being repeated thousands of times – it will make a difference.

ps: all of this is in the Metaprogramming section in the manual :wink: Although I’m the first to admit it’s not as easy to digest as we’d like, please try walking through the examples there in order to hopefully develop some intuition for what is happening under the hood.


#36

Are you on an older version of Julia? I think the macro version was introduced in 0.6. The old equivalent is:

macroexpand( :(@horner(x, 6,5,4,3,2,1)) )

which, in the terminology of my last post, is the same thing as calling the function horner_expression_generator directly.


#37

Hey, yes, I was on 0.5.0 before, but now I'm on 0.6.1.
So thanks for your explanation. But looking at dump(r): why do these building blocks stop at 5 and 0.01?
I would have expected further building blocks for 4, 3, 2, 1, no?


#38

Hey, and thanks. So using globals is apparently an issue for optimization…
But why does this not apply to anonymous functions? As mentioned before,

g(x)=Base.Math.@horner(x,4,3,2,1)
x=0.2
@btime g(x)
@btime g(0.2)

seems to make a difference, in accordance with the globals explanation in the link you gave.
But it does not happen using an anonymous function:

f=x->Base.Math.@horner(x,4,3,2,1)
x=0.2
@btime f(x)
@btime f(0.2)

though it is overall slower than g(0.2).


#39

You insist on not reading any material on macros and metaprogramming, don’t you?

You are trying to compare apples to pears. Basically, in programming languages like Julia, your code gets preprocessed and then compiled. In languages which support this metaprogramming functionality, again like Julia, you interfere with the preprocessor and tell it how to transform the code. This mechanism is called a "macro" in Julia, whereas it can have a different name in other languages (e.g., template metaprogramming in C++).

If you inspect the above code more carefully, you will realize that x = 0.01 is entered before the macro is written. Then, when you write the macro, the preprocessor processes the input (that is, the piece of code inside the macro, or, the piece of code that is returned as the Expr thing up there) and generates some code with the known x = 0.01 value as well as the p... things (there are 6 of them at the time of preprocessing). From this moment on, everything is statically known. This is where the macros can get helpful.

Think of the below example

julia> macro myfact(x)
         ex = 1
         while (x > 0)
           ex = :($x * $ex)
           x -= 1
         end
         ex
       end
julia> @macroexpand(@myfact(15))
:(1 * (2 * (3 * (4 * (5 * (6 * (7 * (8 * (9 * (10 * (11 * (12 * (13 * (14 * (15 * 1)))))))))))))))

When you write @myfact(15), the call expands to the code above. Please note the absence of the while loop. Basically, you generate this code once during preprocessing, and it is used whenever needed. In a way, you precompute the factorial with the macro.

On the other hand, when you write a normal function f(x) to calculate the same factorial, it needs to do the looping, unless you implement some sort of table-lookup for some predefined values.
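For contrast, here is a sketch of that run-time counterpart (the name myfact_fn is made up):

```julia
# The loop survives into the compiled code and runs on every call.
function myfact_fn(x)
    ex = 1
    while x > 0
        ex = x * ex
        x -= 1
    end
    ex
end

myfact_fn(15)  # computed at run time, whereas @myfact(15) was unrolled at parse time
```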

When you were using your original example, your x value was not predefined at the time of preprocessing. As far as I know, this is the main difference between the two: being statically (or compile-time) known vs deferring something to run-time (that is, after the compilation).

I hope I have not made any serious mistake regarding the terminology here…


#40

Good catch. They are there, but the default display is truncated. To get to the bottom of the pile, try:

dump(r.args[3].args[3].args[3].args[3].args[3].args[3])
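On recent Julia versions (≥ 0.7, I believe) the maxdepth keyword mentioned in the edit above does the same in one call:

```julia
dump(r, maxdepth = 20)  # fully expands the nested Expr tree
```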