function tax_labor(τ0_y::Real,τ1_y::Real,ρ_τ::Real,barρ::Real,earn::Real,rent::Real)
    τ0_y*(earn - min(ρ_τ*rent,barρ))^(1.0-τ1_y)
end
If I do @btime, I get 9.6 ns. Now, if I do the following:
function tax_labor(τ0_y::Real,τ1_y::Real,ρ_τ::Real,barρ::Real,earn::Real,rent::Real)
    return τ0_y*(earn)^(1.0-τ1_y)
end
I get 1.20 ns. So apparently, one extra multiplication makes this function roughly eight times slower. Does this make sense to you? Am I doing something wrong? This might seem trivial, but I call this function a lot, so its runtime matters a great deal for the performance of the rest of the code. Thanks
You also have a min() call in the first function, which does a comparison and selects the smaller value, plus an extra subtraction, so the difference is not just “1 multiplication”.
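To make that concrete, here is the first function decomposed step by step (a sketch; the helper name is hypothetical):

```julia
# Relative to the second function, the extra work is a multiplication,
# a min (compare-and-select), and a subtraction — not just one multiplication.
function tax_labor_steps(τ0_y, τ1_y, ρ_τ, barρ, earn, rent)
    cap  = ρ_τ * rent        # extra multiplication
    ded  = min(cap, barρ)    # extra compare-and-select
    base = earn - ded        # extra subtraction
    return τ0_y * base^(1.0 - τ1_y)
end
```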
Benchmarking is tricky to do right. It seems you may have benchmarked your functions with the inputs as compile-time constants, which gave you a misleading result. Here’s a more representative benchmark (assuming none of the function arguments are known at compile time):
function tax_labor1(τ0_y::Real,τ1_y::Real,ρ_τ::Real,barρ::Real,earn::Real,rent::Real)
    τ0_y*(earn - min(ρ_τ*rent,barρ))^(1.0-τ1_y)
end
function tax_labor2(τ0_y::Real,τ1_y::Real,ρ_τ::Real,barρ::Real,earn::Real,rent::Real)
    return τ0_y*(earn)^(1.0-τ1_y)
end
let
    # Wrap each value in a Ref, interpolate with $, and dereference with []
    # so the compiler cannot treat the arguments as constants.
    a, b, c, d, e, f = Ref.(rand(6))
    @btime tax_labor1($a[], $b[], $c[], $d[], $e[], $f[])
    @btime tax_labor2($a[], $b[], $c[], $d[], $e[], $f[])
end;
18.967 ns (0 allocations: 0 bytes)
17.465 ns (0 allocations: 0 bytes)
As you can see here, the difference in runtimes is, proportionally, not so great.
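For comparison, this is roughly what the misleading version of the benchmark looks like (a sketch; literal arguments are visible to the compiler, which may constant-fold part or all of the call and report an unrealistically small time):

```julia
using BenchmarkTools

# Literal arguments are compile-time constants: the compiler may fold the
# whole call away, so the reported time can be meaninglessly small.
@btime tax_labor1(0.3, 0.1, 0.5, 1.0, 2.0, 3.0)

# The Ref-and-interpolate pattern hides a value from the compiler,
# so the benchmark measures a real runtime call.
e = Ref(2.0)
@btime tax_labor1(0.3, 0.1, 0.5, 1.0, $e[], 3.0)
```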
Timings are also value-dependent. I don’t see the difference you are getting between the functions, but plugging in some made-up numbers, I can see 10:1 timing differences depending on those values.
EDIT: I wasn’t clear. I’m not seeing a 10:1 difference between the functions, but I can make both go faster or slower depending on the arguments. I’m guessing that I’m triggering different paths in the exponentiation function.
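That guess can be probed by timing `^` alone with different exponents (a sketch; which exponent values hit which internal branches is implementation-dependent):

```julia
using BenchmarkTools

# Float64 ^ Float64 takes different internal paths depending on the values,
# so the same expression can be faster or slower with different inputs.
x  = Ref(1.7)
p1 = Ref(2.0)    # exponent values like this may hit a cheap special case
p2 = Ref(0.73)   # a generic exponent typically goes through the full pow path
@btime $x[] ^ $p1[]
@btime $x[] ^ $p2[]
```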
Just want to point out that even if the timing difference is 10:1 here, the slow version still only takes 10 ns (or, more likely, 20 ns, as Mason showed). Even if you call this function millions of times, I am 99% sure it is not the bottleneck in your code. If your code is slow, start with changes that can yield meaningful performance improvements first (reducing allocations, removing type instabilities, etc.).
Ok, I confirm this is the case in my own code. Thanks. Can you tell me where I can read up on this? Like, what is a compile-time constant? What is going on under the hood here?
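The “compile-time constant” effect can be observed directly with Julia’s introspection tools (a sketch, using a hypothetical wrapper `g` whose arguments are all literals):

```julia
# When every argument is a literal, the compiler can propagate the constants
# and potentially evaluate the whole body at compile time (constant folding).
g() = 0.3 * (2.0 - min(0.5 * 3.0, 1.0))^(1.0 - 0.1)
@code_typed g()   # if folding happened, the body is just `return <constant>`
```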