A problem about performance

My Julia version is 1.6.7.

# global variable
v = rand(10000)

function fun1()
    s = 0.0
    for i in v::Vector{Float64}
        s += i
    end
end

function fun2(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
end

I think fun2 has a better form, so it should have better performance. But:

# fun1
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  1.800 ns … 4.700 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     1.800 ns             ┊ GC (median):    0.00%
 Time  (mean ± σ):   1.850 ns ± 0.110 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

  █                           ▄                              
  █▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▂ ▂
  1.8 ns         Histogram: frequency by time          2 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

# fun2
BenchmarkTools.Trial: 10000 samples with 1000 evaluations.
 Range (min … max):  4.400 ns … 29.100 ns  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     4.500 ns              ┊ GC (median):    0.00%
 Time  (mean ± σ):   4.500 ns ±  0.383 ns  ┊ GC (mean ± σ):  0.00% ± 0.00%

                              █                               
  ▆▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁█▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▁▃ ▂
  4.4 ns         Histogram: frequency by time         4.6 ns <

 Memory estimate: 0 bytes, allocs estimate: 0.

fun2 is slower than fun1.

========== Appendix ==========

> @code_llvm fun1()
;  @ In[11]:1 within `fun1'
; Function Attrs: uwtable
define void @julia_fun1_2323() #0 {
top:
;  @ In[11]:3 within `fun1'
  %0 = load atomic {}*, {}** inttoptr (i64 254897432 to {}**) unordered, align 8
  %1 = bitcast {}* %0 to i64*
  %2 = getelementptr inbounds i64, i64* %1, i64 -1
  %3 = load atomic i64, i64* %2 unordered, align 8
  %4 = and i64 %3, -16
  %5 = inttoptr i64 %4 to {}*
  %6 = icmp eq {}* %5, inttoptr (i64 1802763472 to {}*)
  br i1 %6, label %pass, label %fail

fail:                                             ; preds = %top
  call void @jl_type_error(i8* getelementptr inbounds ([11 x i8], [11 x i8]* @_j_str1, i64 0, i64 0), {}* inttoptr (i64 1802763472 to {}*), {}* %0)
  unreachable

pass:                                             ; preds = %top
;  @ In[11]:4 within `fun1'
  ret void
}

> @code_llvm fun2(v)
;  @ In[12]:1 within `fun2'
; Function Attrs: uwtable
define nonnull {}* @japi1_fun2_2411({}* %0, {}** %1, i32 %2) #0 {
top:
  %3 = alloca {}**, align 8
  store volatile {}** %1, {}*** %3, align 8
;  @ In[12]:4 within `fun2'
  ret {}* inttoptr (i64 1800907472 to {}*)
}

It’s too hard to read.

const v = rand(10000)

https://docs.julialang.org/en/v1/manual/performance-tips/#Avoid-untyped-global-variables
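
Something like this (a minimal sketch; I’ve also added return s so the loop isn’t dead code, and the type assertion is no longer needed once v is const):

const v = rand(10000)  # const: the compiler can now infer the type of v

function fun1()
    s = 0.0
    for i in v         # no ::Vector{Float64} assertion needed anymore
        s += i
    end
    return s           # return the sum so the loop isn't optimized away
end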

1 Like

That would be the answer if fun1 were slower, but in this case, it’s faster!

1 Like

It’s still the same problem, and it has a simple solution. I don’t really care why doing X incorrectly this way is worse than the other way; just don’t do X.

In this case doing the “bad thing” made his code faster… The question is why? And is he doing something wrong in the “good” code?

1 Like

The bad thing is v being a non-constant global.

Right, so why is that making fun1 faster?

1 Like

Why would 99% of users care? Just don’t have an untyped mutable global thingy :man_shrugging:

It’s like the “doctor, it hurts when I do this” meme.

I mean, it might be interesting for compiler devs, but this is a USAGE question.

  1. I don’t see that time difference:
julia> @btime fun1()
  1.751 ns (0 allocations: 0 bytes)

julia> @btime fun2($v)
  1.530 ns (0 allocations: 0 bytes)

  2. The functions do not return anything, so they are actually not doing anything; I don’t believe we are measuring anything here.

  3. If we return s from the functions, we get:

julia> @btime fun1()
  8.692 μs (0 allocations: 0 bytes)
4986.779463427829

julia> @btime fun2($v)
  8.688 μs (0 allocations: 0 bytes)
4986.779463427829
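
For reference, the “return s” versions are just the OP’s functions with a return s added at the end, i.e. something like:

function fun1()
    s = 0.0
    for i in v::Vector{Float64}
        s += i
    end
    return s  # returning s forces the sum to actually be computed
end

function fun2(x::Vector{Float64})
    s = 0.0
    for i in x
        s += i
    end
    return s
end
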
7 Likes

Thank you. I thought there was a problem inside fun2.

Patient: Doctor jling, when I eat bacon and eggs all day long, my cholesterol goes down and my risk of heart attack decreases by all known measures.

Dr. Jling: Who cares why? Just don’t do that, and keep your risk much higher, because it’s what they say to do in the sacred book…

I think you missed the point. He gets better results when he does the forbidden thing and wants to know why.

Sounds like it was more a measurement problem than anything else.

Thanks @lmiq

5 Likes

Show me an example where eliminating non-const global variables hurts performance, because that’s what “risk much higher” is equivalent to in this analogy.

Sure, how about the first post in this thread!

5 Likes

???

Does making const v = rand(10000) make the performance worse?

No, passing the variable as an argument through a function barrier makes performance worse than using the non-const global.
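
(A hypothetical sketch of what a function barrier looks like here; inner and outer are made-up names:)

v = rand(10000)                     # non-const, untyped global

inner(x::Vector{Float64}) = sum(x)  # type-stable once x arrives with a concrete type

outer() = inner(v)                  # the barrier: one dynamic dispatch here, then a fast, specialized inner call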

I recommend against the bad practice (in your analogy, eating unhealthy food); you’re saying my recommendation may lead to worse results (in your analogy, higher risk?).

Clearly this is never the case for Julia: if you don’t have non-const globals, you can’t have this kind of weird/slow code to begin with.

I give up. The whole post is literally the OP asking why doing the wrong thing makes his code FASTER than doing the “right” thing.

4 Likes

It seems it’s not a problem with the functions, but that the parameters were not passed correctly.

The “wrong thing” in the OP was not how the functions are written; it’s having non-const globals. DON’T HAVE NON-CONST GLOBALS.

All the rest are derived symptoms. I don’t care what f1 and f2 are, or how some bad f1 (related to the non-const global) is faster than a different bad f2 (again related to the non-const global); just don’t use non-const globals.

:exploding_head:

So you’re saying results don’t matter, only ideology matters. The sacred book says so, and you shouldn’t care that doing the wrong thing reduces your runtime by a factor of 3…

2 Likes