# Boltzmann Factor in a loop

#1

I was looking at the performance of a Monte Carlo method with @cortner and he observed there was some unanticipated behavior in the computation of the Boltzmann factor, e^{-\beta V(x)}, within the loop. The following is a simple example that shows the discrepancy:

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

function runN(x, V, beta, N)
    for n = 1:N
        # Boltzmann_likelihood(x, V, beta)
        exp(- beta * V(x))
    end
end

V(x) = (dot(x,x) - 1)^2
X0 = rand(2)
beta = 1.0

@time runN(X0, V, beta, 10_000_000)


If we run this code as written, without calling the predefined Boltzmann_likelihood function, the timing result (on my laptop) on a first run is:

  0.296474 seconds (17.07 k allocations: 977.591 KiB)


and

  0.292992 seconds (1.54 k allocations: 82.642 KiB)


on subsequent runs.

If I now rely on the Boltzmann_likelihood function (by commenting/uncommenting), I get on the first run

  1.306022 seconds (40.00 M allocations: 610.438 MiB, 17.79% gc time)


and

  1.175225 seconds (40.00 M allocations: 610.437 MiB, 1.31% gc time)


on subsequent runs.

What explains the discrepancy in memory usage? Can it be improved?

#2

I tried exactly the code posted and do not get such a difference:

julia> @time runN(X0, V, beta, 10_000_000)
0.388611 seconds (16.97 k allocations: 959.374 KiB)

julia> @time runN(X0, V, beta, 10_000_000)
0.355196 seconds (4 allocations: 160 bytes)


and after uncommenting

julia> @time runN(X0, V, beta, 10_000_000)
0.713613 seconds (1.74 k allocations: 94.246 KiB)

julia> @time runN(X0, V, beta, 10_000_000)
0.707212 seconds (4 allocations: 160 bytes)


Twice as long, as you would expect. What Julia version and platform are you on?

#3

Oh, I see that you commented out the other line, and I can reproduce the issue. This is probably because in the first version the exp(- beta * V(x)) computation is done at compile time, whereas it is not in the other case.

#4

Yes, I am comparing the case:

# Boltzmann_likelihood(x, V, beta)
exp(- beta * V(x))


to

Boltzmann_likelihood(x, V, beta)
# exp(- beta * V(x))


in the code. I am working with Julia 0.6.2.

#5

On Julia v0.6.2, profiling shows that the second version calls a generic function, whereas the first is properly typed. The distressing thing is that code_warntype and code_llvm suggest they should be identical.

On a recent nightly for Julia v0.7, I see no difference between the two.

#6

I get a 3x slowdown in 0.6, 2x in nightly. As @RalphSmith says, the code_warntype is identical in both cases, so this is weird.

edit: even the code_native is identical… this is weird.

#7

I’m glad I’m not alone here…

#8

I’m not sure, but this may be related to performance of captured variables in closures?
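For context, the captured-variable issue looks like the following (a hypothetical illustration of my own, not code from this thread): when a closure captures a local variable that is also reassigned, the variable gets boxed and accesses to it become type-unstable.

```julia
# Hypothetical illustration (my own example, not code from this thread):
# a closure that captures and reassigns a local variable forces that
# variable into a Box, making accesses to it type-unstable.
function make_counter()
    n = 0
    step() = (n += 1)   # `n` is captured and reassigned, so it gets boxed
    step
end

counter = make_counter()
counter()   # 1
counter()   # 2
```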

#9

The efficiency is improved if you don’t run the example at the top level:

function runNA(x, W, beta, N)
    for n = 1:N
        Boltzmann_likelihood(x, W, beta)
    end
end

function runNA1()
    runNA(X0, V, beta, 10_000_000)
end


runNA1() runs as fast as your first example.
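For what it’s worth, part of the top-level cost in the original snippet comes from the non-const globals. A variant of the benchmark with const globals (my own sketch; the globals are renamed so they don’t clash with the originals) avoids that particular overhead:

```julia
using LinearAlgebra   # for `dot` (it lives in Base on Julia 0.6)

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

function runNB(x, W, beta, N)
    for n = 1:N
        Boltzmann_likelihood(x, W, beta)
    end
end

V(x) = (dot(x, x) - 1)^2

# `const` gives the globals a fixed type, so a top-level call
# is not slowed down by untyped global lookups.
const X0c = [0.6, 0.8]
const betac = 1.0

@time runNB(X0c, V, betac, 1_000_000)
```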

#10

Function arguments that are not called in the body of the function they are passed to are sometimes not specialized on. This is a heuristic to reduce overspecialization (many functions in Julia just forward their arguments to another one, and there is no need to specialize). It is possible to force specialization using the where syntax, e.g. declaring it as:

function runN(x, V::Vf, beta, N) where {Vf}


It should have the same performance after that.
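A self-contained sketch of that fix, reusing the definitions from the original post (runN_spec is my renamed variant so it doesn’t redefine runN):

```julia
using LinearAlgebra   # for `dot` (it lives in Base on Julia 0.6)

Boltzmann_likelihood(x, V, beta) = exp(- beta * V(x))

# The `where {Vf}` type parameter forces specialization on the function
# argument, even though runN_spec only forwards V to another function
# rather than calling it directly.
function runN_spec(x, V::Vf, beta, N) where {Vf}
    a = 0.0
    for n = 1:N
        a = Boltzmann_likelihood(x, V, beta)
    end
    a
end

V(x) = (dot(x, x) - 1)^2

runN_spec([0.6, 0.8], V, 1.0, 10)   # ≈ 1.0 here, since V(x) ≈ 0
```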

#11

How should one catch this? To most of us, I think, this is entirely unexpected.

#12

I agree it is unexpected, which is why I commented https://github.com/JuliaLang/julia/issues/16185#issuecomment-216641059 and later opened https://github.com/JuliaLang/julia/issues/18207.

If I didn’t remember those, I wouldn’t have known the problem here either.

#13

I still don’t understand how the performance can be different if the generated code is the same. Worth opening an issue?

#14

I think the code shown is the code you would get if it were specialized on the function argument.

#15

What’s unclear to me is: other than seeing surprisingly large amounts of memory usage in @time, is there any way to predict when this will be an issue, or to identify which piece of the code is the troublemaker?
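One rough heuristic (my own suggestion, not an official tool): after a warm-up call, wrap the hot call in @allocated. Per-call allocations in a loop that should be allocation-free hint that something, such as an unspecialized function argument, is being boxed. A minimal sketch, with made-up names f and hot_loop:

```julia
f(x) = x^2

function hot_loop(g, N)
    a = 0.0
    for n = 1:N
        a += g(2.0)
    end
    a
end

hot_loop(f, 10)                         # warm up so compilation isn't counted
bytes = @allocated hot_loop(f, 1_000)
bytes > 0 && println("allocating in the loop; check argument specialization")
```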

#16

I thought @code_warntype should have issued the warning, but I guess this is a different problem. Maybe @mikeinnes’s Traceur.jl could catch it?

#17

@code_warntype doesn’t always show you the code that is actually executed.

Yuyichao makes good arguments for the current state of affairs, but it still feels pretty unsatisfying to me.

#18

Permit me to expand a bit on @kristoffer.carlsson’s point (since I for one didn’t get it at first), and see how code_warntype might be used in cases like this.

I’ll add return values to verify type inference:

module M

bll(x, V, β) = exp(-β * V(x))

function runN(x, V, β, N)
    a = one(eltype(x))
    for n = 1:N
        a = exp(-β * V(x))
    end
    a
end

function runN2(x, V, β, N)
    a = one(eltype(x))
    for n = 1:N
        a = bll(x, V, β)
    end
    a
end

V(x) = (dot(x,x) - 1)^2

function trialx(N=10^6)
    x0 = rand(2)
    β = 1.0
    a = runN(x0, V, β, N)
    b = runN2(x0, V, β, N)
    a, b
end

end # module M


Simply running code_warntype on runN and runN2 with arguments as specified suggests they would be indistinguishable, as noted above. But let’s look where they are used:

julia> @code_warntype M.trialx(10^6)
Variables:
#self# <optimized out>
N::Int64
x0::Array{Float64,1}
β <optimized out>
a::Float64
b::Float64

Body:
begin
$(Expr(:inbounds, false))
# meta: location random.jl rand 285
# meta: location random.jl rand 284
# meta: location random.jl rand 387
# meta: location random.jl rand 390
SSAValue(1) = $(Expr(:foreigncall, :(:jl_alloc_array_1d), Array{Float64,1}, svec(Any, Int64), Array{Float64,1}, 0, 2, 0))
# meta: pop location
# meta: pop location
# meta: pop location
# meta: pop location
$(Expr(:inbounds, :pop))
x0::Array{Float64,1} = $(Expr(:invoke, MethodInstance for rand!(::MersenneTwister, ::Array{Float64,1}, ::Int64, ::Type{Base.Random.CloseOpen}), :(Base.Random.rand!), :(Base.Random.GLOBAL_RNG), SSAValue(1), :((Base.arraylen)(SSAValue(1))::Int64), :(Base.Random.CloseOpen))) # line 29:
a::Float64 = $(Expr(:invoke, MethodInstance for runN(::Array{Float64,1}, ::M.#V, ::Float64, ::Int64), :(M.runN), :(x0), :(M.V), 1.0, :(N))) # line 30:
b::Float64 = $(Expr(:invoke, MethodInstance for runN2(::Array{Float64,1}, ::Function, ::Float64, ::Int64), :(M.runN2), :(x0), :(M.V), 1.0, :(N))) # line 31:
return (Core.tuple)(a::Float64, b::Float64)::Tuple{Float64,Float64}
end::Tuple{Float64,Float64}



Overall inference looks good, but we see the generic method, which is typed as follows:

julia> code_warntype(M.runN2,(Vector{Float64},Function,Float64,Int))
Variables:
#self# <optimized out>
x::Array{Float64,1}
V::F
β::Float64
N::Int64
n <optimized out>
#temp#::Int64
a::Any

Body:
begin
a::Any = (Base.sitofp)(Float64, 1)::Float64 # line 38:
SSAValue(2) = (Base.select_value)((Base.sle_int)(1, N::Int64)::Bool, N::Int64, (Base.sub_int)(1, 1)::Int64)::Int64
#temp#::Int64 = 1
5:
unless (Base.not_int)((#temp#::Int64 === (Base.add_int)(SSAValue(2), 1)::Int64)::Bool)::Bool goto 14
SSAValue(3) = #temp#::Int64
SSAValue(4) = (Base.add_int)(#temp#::Int64, 1)::Int64
#temp#::Int64 = SSAValue(4) # line 39:
a::Any = (M.exp)(((Base.neg_float)(β::Float64)::Float64 * (V::F)(x::Array{Float64,1})::Any)::Any)::Any
12:
goto 5
14:  # line 41:
return a::Any
end::Any


This seems to be the source of the inefficient allocations.

As Kristoffer mentions, adding the where clause forces a well-typed specialization instead.

The above was from Julia v0.6.2. On v0.7-nightly, runN2 is inlined by default, which avoids the issue. (Decorating with @noinline reverts to a generic method, but a more efficient one than the v0.6 version.)

#19

So is the current best recommendation to either use where, or to switch to the nightly on the promise that this will be improved in future updates?

#20

Somebody correct me, but I think the recommendation is to use the where syntax for arguments that are functions.