BenchmarkTools.jl (GitHub - JuliaCI/BenchmarkTools.jl: A benchmarking framework for the Julia language) is good for timing functions that are quick. For functions in the seconds range, @time
should work fine.
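As a quick sketch of that advice (the function `quick` is made up for illustration, not from this thread):

```julia
using BenchmarkTools  # provides @btime, which runs many samples

# Illustrative ns-scale function (not from the thread)
quick(x) = x * x + 1.0

x = 0.3

# @time measures a single run, including compilation and GC noise;
# it is adequate once the work takes milliseconds to seconds:
@time sum(quick, 1.0:1.0e6)

# @btime reports the minimum over many runs; $ interpolates the global
# so the lookup of x is not part of what gets timed:
@btime quick($x)
```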
1 ns vs 1000 ns matters a lot. @horner
is used extensively in the elementary functions (sin, cos, log, etc.), where one call may be something like 6, 8 or 14 ns. If polynomial evaluation were to take 1000 ns… well… bye bye calculating millions of logs (likelihoods with lots of data, …).
So when using 256 bits, the numbers I get from @time, e.g.
0.000018 seconds (32 allocations: 1.234 KB)
0.000010 seconds (33 allocations: 1.344 KB)
are not trustworthy?
The question still remains though:
What exactly makes :($ā¦)
faster?
The compiler knows the value of the coefficients and the number of coefficients when the function is compiled and can optimize accordingly.
and why does
macro horner02(x, p::Array)
ex = p[end]
for i = length(p)-1:-1:1
ex = ((p[i]) + x * ex)
end
ex
end
not work?
Because the values of the coefficients and the number of coefficients are then not known when the function is compiled, so the compiler cannot optimize accordingly.
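A sketch of the two code shapes involved (the names `unrolled` and `looped` are made up for illustration; in practice Julia can sometimes unroll the tuple version too, since it specializes on the tuple's length):

```julia
# What the macro pastes in: coefficients and structure are fixed at compile
# time, so this compiles to straight-line arithmetic with constants embedded:
unrolled(x) = 4 + x * (3 + x * (2 + x * 1))

# What the plain function does: the coefficients arrive at run time,
# so the loop (and the tuple indexing) has to execute on every call:
function looped(x, p...)
    ex = p[end]
    for i = length(p)-1:-1:1
        ex = p[i] + x * ex
    end
    ex
end

unrolled(0.2) == looped(0.2, 4, 3, 2, 1)  # same math, different code paths
```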
Whatever "optimize" means in detail :-/
Where can I read about this optimization in particular?
Also… sorry, maybe that was a bit quick of a change in topic:
I mean, is the p::Array not liked by macros?
How come when I run f(x) within the sheet it is only 37 ns, but when I run it from the REPL it is like 1.5 ns?
macro horner01(x, p...)
ex = p[end]
for i = length(p)-1:-1:1
ex = :($(p[i]) + $x * $ex)
end
ex
end
function horner02(x, p...)
ex = p[end]
for i = length(p)-1:-1:1
ex = ((p[i]) + x * ex)
end
ex
end
using BenchmarkTools
setprecision(10^8);
x=big(9528)/10000;
x=0.2
res1= @btime @horner01(x,4,3,2,1);
res2= @btime horner02(x,4,3,2,1);
res3= @btime Base.Math.@horner(x,4,3,2,1)
f=x -> Base.Math.@horner(x,4,3,2,1)
res4= @btime f(x)
y=0
I would recommend reading an introduction on macros and metaprogramming (in any language). It's a different kind of programming than what most programmers are used to. The Julia manual is not really the place to start if you've never encountered the topic before.
Ok. I just noticed that
g(x)=Base.Math.@horner(x,4,3,2,1)
x=0.2
g(x) takes 100ns
while
g(0.2) is 1.5ns.
If reading the variable from memory takes the most time anyway when we are talking about quick calculations in the ns range, then it shouldn't really matter whether my metaprogrammed macro takes 1.5 ns or my function 12 ns, right?
Or is there a way to make reading quicker?
You mean, why res3= @btime Base.Math.@horner(x,4,3,2,1)
is slower than res4= @btime f(x)
?
If you look at the produced code
@code_native Base.Math.@horner(x,4,3,2,1)
@code_native f(x)
you see that the latter produces much less code.
([Edited, again]: Difference due to calling from function vs. global scope. Removed my completely wrong "explanation")
Hm… is @code_native essentially the same as @macroexpand?
How come an anonymous function f doesn't care if I write
f(x)
or
f(0.2)
in both cases I get 33ns.
However the function g(x) as defined above does care if I do
g(x)
or
g(0.2)
Benchmark in functions. See https://docs.julialang.org/en/stable/manual/performance-tips/
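A hedged sketch of that advice, using BenchmarkTools' $-interpolation (the name `h` is a made-up stand-in for the Horner polynomial defined earlier):

```julia
using BenchmarkTools

# Hypothetical stand-in polynomial for the g defined earlier in the thread
h(x) = 4 + x * (3 + x * (2 + x * 1))
x = 0.2

@btime h(0.2)   # literal argument: no global involved
@btime h($x)    # $ interpolates the global; only the call itself is timed
@btime h(x)     # untyped global: the benchmark also measures dynamic lookup
```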
Please read the manual!!! I know many things not well either, but this is such a fundamental thing, you really have to read the manual. Or watch some JuliaCon YouTube videos. There was a nice one about Julia internals, or about generated functions.
[Edit 2: I try to explain, but it may be inaccurate]. There are three possibilities:
- macro: takes an expression and gives out a modified expression; doesn't know about values or types
- @generated functions: the same, except it does know the types
- function: takes an AST(?) and gives out assembly
Here is a video link: Quinn, "…From parse time to compile time"
The piece that is missing here is "what does a macro do".
Let's rewrite macro horner
from your first post as a function:
julia> function horner_expression_generator(x, p...)
ex = p[end]
for i = length(p)-1:-1:1
ex = :($(p[i]) + $x * $ex)
end
ex
end
horner_expression_generator (generic function with 1 method)
This is just a function, and we can call it like any other function:
julia> x = 0.01
julia> r = horner_expression_generator(x, 6,5,4,3,2,1)
:(6 + 0.01 * (5 + 0.01 * (4 + 0.01 * (3 + 0.01 * (2 + 0.01 * 1)))))
But notice it doesn't return the actual value (6.0504030201); it returns something that starts with a frown-face :(
, as you noticed. What is this?
julia> typeof(r)
Expr
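As a side note (not from the original post), the stored expression can be run later with eval, which yields the value mentioned above:

```julia
# r is the expression built above; eval runs the stored code now
r = :(6 + 0.01 * (5 + 0.01 * (4 + 0.01 * (3 + 0.01 * (2 + 0.01 * 1)))))
eval(r)  # ≈ 6.0504030201, the value the expression was withholding
```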
And what does it actually contain?
julia> dump(r, maxdepth=20)
Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol +
2: Int64 6
3: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol *
2: Float64 0.01
3: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol +
2: Int64 5
3: Expr
head: Symbol call
args: Array{Any}((3,))
1: Symbol *
2: Float64 0.01
3: Expr
typ: Any
typ: Any
typ: Any
typ: Any
[edit: adding maxdepth=20
argument to dump
will fully expand the expression tree! h/t this post]
The objects called Expr
and Symbol
and Float64
and Any
, and all the rest, are the building blocks of a Julia program. Julia usually hides these building blocks and just gives you the answer straight away. But when you write code inside a :( … )
or quote … end
block, it means "don't run this code right away, turn it into an Expr (expression) instead". The important point is that these building blocks can be created and modified in Julia code itself. That's why it's called metaprogramming, and that's what a macro does: generate code.
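For instance (a small sketch, not from the original posts), the same building blocks can be assembled by hand; quoting is just shorthand for doing so:

```julia
# Assemble :(6 + 0.01 * 1) manually from its Expr/Symbol building blocks
ex = Expr(:call, :+, 6, Expr(:call, :*, 0.01, 1))

ex == :(6 + 0.01 * 1)  # true: both are the same tree of head/args
```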
Writing macro horner(…) … end
you are creating the same thing as horner_expression_generator
above, but it gets a funny name: @horner
.
When Julia sees the code @horner(x, 6,5,4,3,2,1)
, then it effectively pastes the resulting code of horner_expression_generator(x, 6,5,4,3,2,1)
into place before any code runs at all (because we don't know what x
will be).
Explaining why this can be optimized better than running the for
loop in place is a really big topic, but try some longer polynomials, or imagine that 1 vs 10 ns call being repeated thousands of times; it will make a difference.
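To make the pasting concrete, @macroexpand shows the code the macro hands back before anything runs (using @horner01 from earlier in the thread; the exact printed form may carry hygiene qualifiers such as Main.x):

```julia
# @horner01 as defined earlier in the thread
macro horner01(x, p...)
    ex = p[end]
    for i = length(p)-1:-1:1
        ex = :($(p[i]) + $x * $ex)
    end
    ex
end

# Shows the pasted-in code before anything runs; the printed form is roughly
# :(4 + x * (3 + x * (2 + x * 1))), possibly with hygiene qualifiers on x
@macroexpand @horner01(x, 4, 3, 2, 1)
```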
ps: all of this is in the Metaprogramming section in the manual. Although I'm the first to admit it's not as easy to digest as we'd like, please try walking through the examples there in order to hopefully develop some intuition for what is happening under the hood.
Are you on an older version of Julia? I think the macro version was introduced in 0.6. The old equivalent is:
macroexpand( :(@horner(x, 6,5,4,3,2,1)) )
which, in the terminology of my last post, is the same thing as calling the function horner_expression_generator
directly.
Hey, yes, I was on 0.5.0 before, but now I'm on 0.6.1.
So thanks for your explanation, but looking at dump(r): why do these building blocks stop at 5 and 0.01?
I would have expected further building blocks for 4,3,2,1 or?
Hey and thanks. So using globals is an issue concerning optimization apparently…
But why does this not apply to anonymous functions? As mentioned before,
g(x)=Base.Math.@horner(x,4,3,2,1)
x=0.2
@btime g(x)
@btime g(0.2)
seems to make a difference, in accordance with the globals explanation in the link you gave.
But it does not happen using an anonymous function:
f=x->Base.Math.@horner(x,4,3,2,1)
x=0.2
@btime f(x)
@btime f(0.2)
though it is overall slower than g(0.2).
You insist on not reading any material on macros and metaprogramming, don't you?
You are trying to compare apples to pears. Basically, in programming languages like Julia, your code gets preprocessed and then compiled. In languages, again like Julia, which support this metaprogramming functionality, you interfere with the preprocessor and tell it how to compile the code. This mechanism is called a "macro" in Julia, whereas it can have a different name in other languages (e.g., template metaprogramming in C++).
If you inspect the above code more carefully, you will realize that x = 0.01
is entered before the macro is written. Then, when you write the macro, the preprocessor processes the input (that is, the piece of code inside the macro, or, the piece of code that is returned as the Expr
thing up there) and generates some code with the known x = 0.01
value as well as the p...
things (there are 6 of them at the time of preprocessing). From this moment on, everything is statically known. This is where the macros can get helpful.
Think of the example below:
julia> macro myfact(x)
ex = 1
while (x > 0)
ex = :($x * $ex)
x -= 1
end
ex
end
julia> @macroexpand(@myfact(15))
:(1 * (2 * (3 * (4 * (5 * (6 * (7 * (8 * (9 * (10 * (11 * (12 * (13 * (14 * (15 * 1)))))))))))))))
When you write @myfact(15)
, the macro gets expanded as above. Please note the lack of the while
loop. Basically, you generate this code once during preprocessing, and it will be used when needed. In a way, you precompute the factorial with the macro.
On the other hand, when you write a normal function f(x)
to calculate the same factorial, it needs to do the looping, unless you implement some sort of table-lookup for some predefined values.
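A minimal run-time counterpart (the name myfact_runtime is hypothetical, for contrast with @myfact) in which the loop executes on every call:

```julia
# Unlike @myfact, the while loop here runs each time the function is called
function myfact_runtime(n::Integer)
    ex = one(n)
    while n > 0
        ex *= n
        n -= 1
    end
    ex
end

myfact_runtime(15)  # → 1307674368000, the value @myfact(15) also produces
```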
When you were using your original example, your x
value was not predefined at the time of preprocessing. As far as I know, this is the main difference between the two: being statically (or compile-time) known vs. deferring something to run time (that is, after compilation).
I hope I have not made any serious mistake regarding the terminology here…
Good catch. They are there, but the default display is truncated. To get to the bottom of the pile, try:
dump(r.args[3].args[3].args[3].args[3].args[3].args[3])