A problem about performance

XuJingye2022 · October 3, 2022, 7:20pm

My Julia version is 1.6.7:

# global variable
v = rand(10000)

function fun1()
    s = 0.0
    for i in v::Vector{Float64}
        s += i
    end
end

function fun2(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
end

I think fun2 is written in the better form, so it should have better performance. But:

# fun1
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.800 ns … 4.700 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.800 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.850 ns ± 0.110 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                           ▄                              
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.8 ns         Histogram: frequency by time          2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

# fun2
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  4.400 ns … 29.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.500 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.500 ns ±  0.383 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █                               
  ▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
  4.4 ns         Histogram: frequency by time         4.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

fun2 is slower than fun1.

========== Appendix ==========

> @code_llvm fun1()
;  @ In[11]:1 within `fun1'
; Function Attrs: uwtable
define void @julia_fun1_2323() #0 {
top:
;  @ In[11]:3 within `fun1'
  %0 = load atomic {}*, {}** inttoptr (i64 254897432 to {}**) unordered, align 8
  %1 = bitcast {}* %0 to i64*
  %2 = getelementptr inbounds i64, i64* %1, i64 -1
  %3 = load atomic i64, i64* %2 unordered, align 8
  %4 = and i64 %3, -16
  %5 = inttoptr i64 %4 to {}*
  %6 = icmp eq {}* %5, inttoptr (i64 1802763472 to {}*)
  br i1 %6, label %pass, label %fail

fail:                                             ; preds = %top
  call void @jl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 1802763472 to {}*), {}* %0)
  unreachable

pass:                                             ; preds = %top
;  @ In[11]:4 within `fun1'
  ret void
}

> @code_llvm fun2()
;  @ In[12]:1 within `fun2'
; Function Attrs: uwtable
define nonnull {}* @japi1_fun2_2411({}* %0, {}** %1, i32 %2) #0 {
top:
  %3 = alloca {}**, align 8
  store volatile {}** %1, {}*** %3, align 8
;  @ In[12]:4 within `fun2'
  ret {}* inttoptr (i64 1800907472 to {}*)
}

It’s too hard to read.
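The appendix is easier to read with one observation. In fun1, the only work left is the runtime check that the global v really is a Vector{Float64} (the branch to jl_type_error), followed by ret void. The summation loop is gone, and the fun2 listing contains no loop either. A rough way to check whether a loop survived optimization is to search the optimized IR for floating-point adds (fadd). The sketch below uses hypothetical function names. IR text differs between Julia versions, so the string search is a heuristic, not a guarantee.

```julia
using InteractiveUtils  # provides code_llvm outside the REPL

# Same loop as fun2, but the accumulated value is discarded (hypothetical name).
function sum_discarded(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
end

# Same loop, but the accumulated value is returned (hypothetical name).
function sum_returned(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
    return s
end

# Capture the optimized LLVM IR as strings instead of printing it.
ir_discarded = sprint(code_llvm, sum_discarded, (Vector{Float64},))
ir_returned  = sprint(code_llvm, sum_returned, (Vector{Float64},))

# The returned sum must actually be computed, so its IR contains floating-point adds.
println("returned version has fadd:  ", occursin("fadd", ir_returned))
# The discarded sum is dead code, so the optimizer is free to drop the whole loop.
println("discarded version has fadd: ", occursin("fadd", ir_discarded))
```

The returned version must contain fadd. The discarded version is expected not to, because its result is never used, but whether the loop is removed is the optimizer's decision.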

jling · October 3, 2022, 7:21pm

const v = rand(10000)

https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-untyped-global-variables
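A minimal sketch of what the const change buys, assuming the hypothetical names v_const and v_nonconst: with const, the compiler knows the global's type when it compiles a function that reads it, so the summation is inferred to return Float64. With a plain global, the binding could be reassigned to anything, so inference cannot conclude a concrete type. Base.return_types is a reflection helper used here only to peek at what inference concludes.

```julia
const v_const = rand(10000)   # the type of this binding can never change
v_nonconst = rand(10000)      # could be rebound to a value of any type at any time

function sum_const()
    s = 0.0
    for i in v_const
        s += i
    end
    return s
end

function sum_nonconst()
    s = 0.0
    for i in v_nonconst
        s += i
    end
    return s
end

# Inspect what type inference concludes for each zero-argument function.
println(Base.return_types(sum_const, ()))     # inferred as Float64
println(Base.return_types(sum_nonconst, ()))  # not inferred as Float64
```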

dlakelan · October 3, 2022, 7:21pm

That would be the answer if fun1 were slower, but in this case it’s faster!

jling · October 3, 2022, 7:23pm

It’s still the same problem, and it has a simple solution. I don’t really care why doing X incorrectly this way is worse than doing it the other way; just don’t do X.

dlakelan · October 3, 2022, 7:23pm

In this case, doing the “bad thing” made his code faster… So the question is: why? And is he doing something wrong in the “good” code?

jling · October 3, 2022, 7:24pm

The bad thing is v being a non-constant global.

dlakelan · October 3, 2022, 7:25pm

Right so why is that making fun1 faster?

jling · October 3, 2022, 7:26pm

Why would 99% of users care? Just don’t have untyped, mutable global thingies.

It’s like the “doctor, it hurts when I do this” meme.

I mean, it might be interesting for compiler devs, but this is a USAGE question.

lmiq · October 3, 2022, 7:28pm

I don’t see that time difference:

julia> @btime fun1()
  1.751 ns (0 allocations: 0 bytes)

julia> @btime fun2($v)
  1.530 ns (0 allocations: 0 bytes)

The functions do not return anything, so the compiler is free to drop the loop entirely and they end up doing nothing; I don’t believe we are measuring anything here.
If we return s from the functions, we get:

julia> @btime fun1()
  8.692 μs (0 allocations: 0 bytes)
4986.779463427829

julia> @btime fun2($v)
  8.688 μs (0 allocations: 0 bytes)
4986.779463427829
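The functions lmiq benchmarked were not posted, so the following is only a sketch of what “returning s” presumably looks like, with the hypothetical names fun1_ret and fun2_ret. fun2_ret is also the function-barrier pattern from the performance tips: the global is read once at the call site, and inside the function x has a concrete type.

```julia
v = rand(10000)  # non-const global, as in the original post

# fun1 with the result returned; the ::Vector{Float64} assertion is checked at run time.
function fun1_ret()
    s = 0.0
    for i in v::Vector{Float64}
        s += i
    end
    return s
end

# fun2 with the result returned; x is concretely typed inside the function.
function fun2_ret(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
    return s
end

# Both add the elements in the same left-to-right order, so the results match exactly.
println(fun1_ret() == fun2_ret(v))
```

To time them, BenchmarkTools’ @btime can be used as above, interpolating the global as $v so that the benchmark itself does not pay for the untyped global lookup.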

XuJingye2022 · October 3, 2022, 7:28pm

Thank you. I thought there was a problem inside fun2…

dlakelan · October 3, 2022, 7:33pm

Patient: Doctor jling, when I eat bacon and eggs all day long, my cholesterol goes down and my risk of heart attack decreases by all known measures.

Dr. Jling: Who cares why? Just don’t do that, and keep your risk much higher, because it’s what they say to do in the sacred book…

I think you missed the point. He gets better results when he does the forbidden thing and wants to know why.

Sounds like it was more a measurement problem than anything else.

Thanks @lmiq

jling · October 3, 2022, 7:34pm

Show me an example where eliminating non-const global variables hurts performance, because that’s what “risk much higher” is equivalent to in this analogy.

dlakelan · October 3, 2022, 7:36pm

Sure, how about the first post in this thread!

jling · October 3, 2022, 7:36pm

???

Making v a constant (const v = rand(10000)) makes the performance worse?

dlakelan · October 3, 2022, 7:37pm

No, passing the variable as an argument into a function barrier makes performance worse than using the non-const global.

jling · October 3, 2022, 7:39pm

I recommend against the bad practice (in your analogy, eating unhealthy food); you’re saying my recommendation may lead to worse results (in your analogy, a higher risk?).

Clearly this is never the case for Julia: if you don’t have non-const globals, you can’t have any weird/slow code to begin with.

dlakelan · October 3, 2022, 7:40pm

I give up. The whole post is literally the OP asking why doing the wrong thing makes his code FASTER than doing the “right” thing.

XuJingye2022 · October 3, 2022, 7:42pm

It seems it’s not a problem with the functions, but that the parameters are not passed correctly.

jling · October 3, 2022, 7:42pm

The “wrong thing” in the OP was not how the functions are written; it’s having non-const globals. DON’T HAVE NON-CONST GLOBALS.

All the rest are derived symptoms. I don’t care what f1 and f2 are, or how some bad f1 (related to a non-const global) is faster than a different bad f2 (again related to a non-const global); just don’t use non-const globals.

dlakelan · October 3, 2022, 7:49pm

So you’re saying results don’t matter, only ideology matters. The sacred book says so, and you shouldn’t care that doing the wrong thing reduces your runtime by a factor of 3…
