# Boltzmann Factor in a loop

#1

I was looking at the performance of a Monte Carlo method with @cortner and he observed there was some unanticipated behavior in the computation of the Boltzmann factor, e^{-\beta V(x)}, within the loop. The following is a simple example that shows the discrepancy:

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

function runN(x, V, beta, N)
    for n = 1:N
        # Boltzmann_likelihood(x, V, beta)
        exp(- beta * V(x))
    end
end

V(x) = (dot(x,x) - 1)^2
X0 = rand(2)
beta = 1.0

@time runN(X0, V, beta, 10_000_000)


If we run this code as written, without calling the predefined Boltzmann_likelihood function, the timing result (on my laptop) on a first run is:

  0.296474 seconds (17.07 k allocations: 977.591 KiB)


and

  0.292992 seconds (1.54 k allocations: 82.642 KiB)


on subsequent runs.

If I now rely on the Boltzmann_likelihood function (by commenting/uncommenting), I get on the first run

  1.306022 seconds (40.00 M allocations: 610.438 MiB, 17.79% gc time)


and

  1.175225 seconds (40.00 M allocations: 610.437 MiB, 1.31% gc time)


on subsequent runs.

What explains the discrepancy in memory usage? Can it be improved?

#2

I tried exactly the code posted and do not get such a difference:

julia> @time runN(X0, V, beta, 10_000_000)
0.388611 seconds (16.97 k allocations: 959.374 KiB)

julia> @time runN(X0, V, beta, 10_000_000)
0.355196 seconds (4 allocations: 160 bytes)


and after uncommenting

julia> @time runN(X0, V, beta, 10_000_000)
0.713613 seconds (1.74 k allocations: 94.246 KiB)

julia> @time runN(X0, V, beta, 10_000_000)
0.707212 seconds (4 allocations: 160 bytes)


Twice as long, as you would expect. What Julia version and platform are you on?

#3

Oh, I see that you commented out the other line, and I can reproduce the issue. This is probably because in the first version the exp(- beta * V(x)) computation is done at compile time, whereas it is not in the other case.

#4

Yes, I am comparing the case:

# Boltzmann_likelihood(x, V, beta)
exp(- beta * V(x))


to

Boltzmann_likelihood(x, V, beta)
# exp(- beta * V(x))


in the code. I am working with Julia 0.6.2.

#5

On Julia v0.6.2, profiling shows that the second version calls a generic function, whereas the first is properly typed. The distressing thing is that code_warntype and code_llvm suggest they should be identical.

On a recent nightly for Julia v0.7, I see no difference between the two.

#6

I get a 3x slowdown in 0.6, 2x in nightly. As @RalphSmith says, the code_warntype is identical in both cases, so this is weird.

edit: even the code_native is identical… this is weird.

#7

I’m glad I’m not alone here…

#8

I’m not sure, but this may be related to performance of captured variables in closures?
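For context, the captured-variable issue looks like the following (a hypothetical illustration of my own, not code from this thread): when a closure captures a local variable that is also reassigned, the variable gets boxed and accesses to it become type-unstable.

```julia
# Hypothetical illustration (my own example, not code from this thread):
# a closure that captures and reassigns a local variable forces that
# variable into a Box, making accesses to it type-unstable.
function make_counter()
    n = 0
    step() = (n += 1)   # `n` is captured and reassigned, so it gets boxed
    step
end

counter = make_counter()
counter()   # 1
counter()   # 2
```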

#9

The efficiency is improved if you don’t run the example at the top level:

function runNA(x, W, beta, N)
    for n = 1:N
        Boltzmann_likelihood(x, W, beta)
    end
end

function runNA1()
    runNA(X0, V, beta, 10_000_000)
end


runNA1() runs as fast as your first example.
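For what it’s worth, part of the top-level cost in the original snippet comes from the non-const globals. A variant of the benchmark with const globals (my own sketch; the globals are renamed so they don’t clash with the originals) avoids that particular overhead:

```julia
using LinearAlgebra   # for `dot` (it lives in Base on Julia 0.6)

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

function runNB(x, W, beta, N)
    for n = 1:N
        Boltzmann_likelihood(x, W, beta)
    end
end

V(x) = (dot(x, x) - 1)^2

# `const` gives the globals a fixed type, so a top-level call
# is not slowed down by untyped global lookups.
const X0c = [0.6, 0.8]
const betac = 1.0

@time runNB(X0c, V, betac, 1_000_000)
```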

#10

Function arguments that are not called in the body of the function they are passed to are sometimes not specialized on. This is a heuristic to reduce overspecialization (many functions in Julia just forward their arguments to another one, and there is no need to specialize). It is possible to force specialization using the where syntax, e.g. declaring it as:

function runN(x, V::Vf, beta, N) where {Vf}


It should have the same performance after that.
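A self-contained sketch of that fix, reusing the definitions from the original post (runN_spec is my renamed variant so it doesn’t redefine runN):

```julia
using LinearAlgebra   # for `dot` (it lives in Base on Julia 0.6)

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

# The `where {Vf}` type parameter forces specialization on the function
# argument, even though runN_spec only forwards V to another function
# rather than calling it directly.
function runN_spec(x, V::Vf, beta, N) where {Vf}
    a = 0.0
    for n = 1:N
        a = Boltzmann_likelihood(x, V, beta)
    end
    a
end

V(x) = (dot(x, x) - 1)^2

runN_spec([0.6, 0.8], V, 1.0, 10)   # ≈ 1.0 here, since V(x) ≈ 0
```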

#11

How should one catch this? To most of us, I think, this is entirely unexpected.

#12

I agree it is unexpected, which is why I commented https://github.com/JuliaLang/julia/issues/16185#issuecomment-216641059 and later opened https://github.com/JuliaLang/julia/issues/18207.

If I didn’t remember those, I wouldn’t have known the problem here either.

#13

I still don’t understand how the performance can be different if the generated code is the same. Worth opening an issue?

#14

I think the code shown is the code you would get if it were specialized on the function argument.

#15

What’s unclear to me is: other than seeing surprisingly large amounts of memory usage in @time, is there any way to predict when this will be an issue, or to identify which piece of the code is the troublemaker?
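One rough heuristic (my own suggestion, not an official tool): after a warm-up call, wrap the hot call in @allocated. Per-call allocations in a loop that should be allocation-free hint that something, such as an unspecialized function argument, is being boxed. A minimal sketch, with made-up names f and hot_loop:

```julia
f(x) = x^2

function hot_loop(g, N)
    a = 0.0
    for n = 1:N
        a += g(2.0)
    end
    a
end

hot_loop(f, 10)                         # warm up so compilation isn't counted
bytes = @allocated hot_loop(f, 1_000)
bytes > 0 && println("allocating in the loop; check argument specialization")
```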

#16

I thought @code_warntype should have issued the warning, but I guess this is a different problem. Maybe @mikeinnes’s Traceur.jl could catch it?

#17

@code_warntype doesn’t always show you the code that is actually executed.

Yuyichao makes good arguments for the current state of affairs, but it still feels pretty unsatisfying to me.

#18

Permit me to expand a bit on @kristoffer.carlsson’s point (since I for one didn’t get it at first), and see how code_warntype might be used in cases like this.

I’ll add return values to verify type inference:

module M

bll(x, V, β) = exp(-β * V(x))

function runN(x, V, β, N)
    a = one(eltype(x))
    for n = 1:N
        a = exp(-β * V(x))
    end
    a
end

function runN2(x, V, β, N)
    a = one(eltype(x))
    for n = 1:N
        a = bll(x, V, β)
    end
    a
end

V(x) = (dot(x,x) - 1)^2

function trialx(N=10^6)
    x0 = rand(2)
    β = 1.0
    a = runN(x0, V, β, N)
    b = runN2(x0, V, β, N)
    a, b
end

end # module M


Simply running code_warntype on runN and runN2 with arguments as specified suggests they would be indistinguishable, as noted above. But let’s look where they are used:

julia> @code_warntype M.trialx(10^6)
Variables:
#self# <optimized out>
N::Int64
x0::Array{Float64,1}
β <optimized out>
a::Float64
b::Float64

Body:
begin
$(Expr(:inbounds, false))
# meta: location random.jl rand 285
# meta: location random.jl rand 284
# meta: location random.jl rand 387
# meta: location random.jl rand 390
SSAValue(1) = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Float64,1}, svec(Any, Int64), Array{Float64,1}, 0, 2, 0))
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
x0::Array{Float64,1} = $(Expr(:invoke, MethodInstance for rand!(::MersenneTwister, ::Array{Float64,1}, ::Int64, ::Type{Base.Random.CloseOpen}), :(Base.Random.rand!), :(Base.Random.GLOBAL_RNG), SSAValue(1), :((Base.arraylen)(SSAValue(1))::Int64), :(Base.Random.CloseOpen))) # line 29:
a::Float64 = $(Expr(:invoke, MethodInstance for runN(::Array{Float64,1}, ::M.#V, ::Float64, ::Int64), :(M.runN), :(x0), :(M.V), 1.0, :(N))) # line 30:
b::Float64 = $(Expr(:invoke, MethodInstance for runN2(::Array{Float64,1}, ::Function, ::Float64, ::Int64), :(M.runN2), :(x0), :(M.V), 1.0, :(N))) # line 31:
return (Core.tuple)(a::Float64, b::Float64)::Tuple{Float64,Float64}
end::Tuple{Float64,Float64}



Overall inference looks good, but we see the generic method, which is typed as follows:

julia> code_warntype(M.runN2,(Vector{Float64},Function,Float64,Int))
Variables:
#self# <optimized out>
x::Array{Float64,1}
V::F
β::Float64
N::Int64
n <optimized out>
#temp#::Int64
a::Any

Body:
begin
a::Any = (Base.sitofp)(Float64, 1)::Float64 # line 38:
SSAValue(2) = (Base.select_value)((Base.sle_int)(1, N::Int64)::Bool, N::Int64, (Base.sub_int)(1, 1)::Int64)::Int64
#temp#::Int64 = 1
5:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)(SSAValue(2), 1)::Int64)::Bool)::Bool goto 14
SSAValue(3) = #temp#::Int64
SSAValue(4) = (Base.add_int)(#temp#::Int64, 1)::Int64
#temp#::Int64 = SSAValue(4) # line 39:
a::Any = (M.exp)(((Base.neg_float)(β::Float64)::Float64 * (V::F)(x::Array{Float64,1})::Any)::Any)::Any
12:
goto 5
14:  # line 41:
return a::Any
end::Any


This seems to be the source of the inefficient allocations.

As Kristoffer mentions, adding the where clause forces a well-typed specialization instead.

The above was from Julia v0.6.2. On v0.7-nightly, runN2 is inlined by default, which avoids the issue. (Decorating with @noinline reverts to a generic method, but a more efficient one than the v0.6 version.)

#19

So is the current best recommendation to either use where, or to switch to the nightly on the promise that this will be improved in future updates?

#20

Somebody correct me, but I think the recommendation is to use the where syntax for arguments that are functions.