# Too many allocations?

I’m thinking about optimizing performance and my mind initially goes to reducing the number of allocations. Below is a function with a loop that executes ~ 1,000,000 times.
When I “time” the routine, I find that it executes ~9 million reallocations. I just have no idea whether this is a lot and I should try to reduce it or whether this is about right.

``````function RK2(pend::pendulum;poincare::Bool= false)
r = [pend.thetaInitial,pend.omegaInitial]
t = 0.0
N = round(Integer,pend.tMax/pend.dt)
M = round(Integer,pend.omegaD * pend.tMax/(2π))
theta = zeros(Float64,N+1,1)
omega = zeros(Float64,N+1,1)
saveTheta = zeros(Float64,M,1)
saveOmega = zeros(Float64,M,1)
omega = zeros(Float64,N+1,1)

time = zeros(Float64,N+1,1)
theta[1] = pend.thetaInitial
omega[1] = pend.omegaInitial
time[1] = 0.0
counter = 2
index = 1
for i=1:N
k1 =  pend.dt .* getDerivs(pend,r,t)
# test = getDerivs(pend,r,t + 1/2 * pend.dt)
k2 = pend.dt .* getDerivs(pend,r .+ 0.5 .* k1,t + 1/2 * pend.dt)

r += k2
if r[1] > π && pend.resets
r[1] -= 2π
elseif r[1] < -π && pend.resets
r[1] += 2π
end

if inPhase(pend,t)
saveTheta[index] = r[1]
saveOmega[index] = r[2]
index += 1
end
t+= pend.dt
theta[counter] = r[1]
omega[counter] = r[2]
time[counter] = t
counter += 1
end
if poincare
saveTheta,saveOmega
else
time,theta,omega
end
end
``````

You need to give runnable code, otherwise it is impossible to say.

1 Like
``````mutable struct pendulum
thetaInitial::Float64
omegaInitial::Float64
dt::Float64
tMax::Float64
l::Float64
q::Float64
Fd::Float64
resets::Bool
offset::Float64
g::Float64

function pendulum(thetaInitial::Float64, omegaInitial::Float64, dt::Float64,tMax::Float64,l::Float64,q::Float64,Fd::Float64,omegaD::Float64;resets::Bool=true,offset::Float64 = 0.0,g::Float64=9.8)
#Only here to handle the defaults

end
end

function getDerivs(pend::pendulum, vec::Array{Float64,1},t::Float64)

dOmega = -pend.g/pend.l * sin(vec[1]) - pend.q * vec[2] + pend.Fd * sin(pend.omegaD * t)

return [vec[2],dOmega]

end

function inPhase(pend::pendulum,t::Float64)
if abs(pend.omegaD * (t - pend.offset)/( 2π ) - round(Integer,pend.omegaD * (t - pend.offset)/(2π) ) ) < pend.dt * pend.omegaD/(4π)
return true
else
return false
end

end
function RK4(pend::pendulum;poincare::Bool= false)
r = [pend.thetaInitial,pend.omegaInitial]
t = 0.0
N = round(Integer,pend.tMax/pend.dt)
M = round(Integer,pend.omegaD * pend.tMax/(2π))
theta = zeros(Float64,N+1,1)
omega = zeros(Float64,N+1,1)
saveTheta = zeros(Float64,M,1)
saveOmega = zeros(Float64,M,1)
omega = zeros(Float64,N+1,1)

time = zeros(Float64,N+1,1)
theta[1] = pend.thetaInitial
omega[1] = pend.omegaInitial
time[1] = 0.0
counter = 2
index = 1
for i=1:N
k1 =  pend.dt .* getDerivs(pend,r,t)
# test = getDerivs(pend,r,t + 1/2 * pend.dt)
k2 = pend.dt .* getDerivs(pend,r .+ 0.5 .* k1,t + 1/2 * pend.dt)

r += k2
if r[1] > π && pend.resets
r[1] -= 2π
elseif r[1] < -π && pend.resets
r[1] += 2π
end

if inPhase(pend,t)
saveTheta[index] = r[1]
saveOmega[index] = r[2]
index += 1
end
t+= pend.dt
theta[counter] = r[1]
omega[counter] = r[2]
time[counter] = t
counter += 1
end
if poincare
saveTheta,saveOmega
else
time,theta,omega
end
end

using Profile
using Plots
plotly()

thetaInitial = 0.2
omegaInitial = 0.0
dt = 0.001
tMax = 16000.0
l = 9.8
q = 0.5
Fd = 1.2
resets = false
offset = 0.0
@time saveTheta,saveOmega = RK4(myPendulum,poincare=true)

scatter(saveTheta,saveOmega,ms=1)
#timeTwo,thetaTwo = RK4(mySecondPendulum)
#plot!(timeOne,thetaTwo)

#plot!(timeOne,thetaTwo,yaxis=:log,ylim = (1e-6,10))
#Profile.print()
``````

This is my first attempt at a Julia code. If you see anything that is bad practice or have any suggestions, I welcome them.

Thanks

This stands out to me as an easy thing to improve. You’re returning two values (which is fine), but you’re returning them by constructing an `Array` to hold those two elements. Constructing a new `Array` is rather expensive (if you do it a million times, at least), and that’s one of your sources of frequent allocations.

Instead, you can return a `Tuple`, which is basically free to construct. That would look like:

``````return vec[2], dOmega
``````

or equivalently, `(vec[2], dOmega)` because the parentheses are optional here

5 Likes

Thank you. Great Idea. Can I use a tuple to do vector math upon returning from the function or will I have to do it component by component.

I should mention that this code is already much faster than it’s python counterpart. I’m just looking to build intuition for optimizing Julia code. Thanks in advance.

You can probably preallocate and re-use `k1` and `k2`. `.=` assigns in place, `=` make a new array which would allocate. Also `omegaD = Float64(2/3)` the `Float64` is uneccesary as `2/3` already returns a `Float64`.

3 Likes

Good Idea. I had that thought but I neglected the `.=` so I didn’t get an improvement.

You can broadcast over a tuple (e.g. you can do `x .+ 1` if `x` is a tuple), but if you want to do anything more vector-like, then check out GitHub - JuliaArrays/StaticArrays.jl: Statically sized arrays for Julia . An SVector from StaticArrays is as cheap as a tuple (it actually is a tuple under the hood) but acts like a proper vector.

3 Likes

You guys are all awesome. I preallocated `k1` and `k2` and returned a tuple instead of an array. This reduced the number of allocations from 9 M to about 1 M and sped the code up by a factor of ~5. The code was already crazy fast and now it’s ridiculous fast. Thanks a ton for your quick and helpful replies.

4 Likes

You could use Parameters.jl to initialize the structure instead of the inner constructor (which, in that case, can be also an outer constructor, that is more common). And the name of the struct is recommended to be in upper case (`struct Pendulum`):

``````julia> using Parameters

julia> @with_kw mutable struct A
x :: Float64
b :: Bool = false
end
A

julia> a = A(x=1.0)
A
x: Float64 1.0
b: Bool false

``````

From your code I could not see why your `Pendulum` has to be mutable. Immutable structus will be faster, in general.

Also: `Array{Float64,1}` is the same as `Vector{Float64}` (but you probably want just `AbstractVector` in the function signature. That will allow your function to receive `views` and vectors of other types of numbers (like `Float32`).

This line probably allocates:

should better be `r .+= k2` (with the dot).

Also, are you sure you want `theta = zeros(Float64,N+1,1)` instead of `zeros(N+1)`? which is a vector, not a two-dimensional array in which the second dimension has only one element?

2 Likes

Great suggestions. Thank you!. I’m not familiar with Parameters.jl so I’ll read up on it. Inner constructors are not typical? Is it just a question of style or are there efficiency reasons?

Generally they are used when you need to impose constraints for the parameters, and/or if the constructor is self-referential (in which case an outer constructor would loop until segfault). This is very well written here:

https://docs.julialang.org/en/v1/manual/constructors/

2 Likes

Final question on this thread:

What mental routine do you go through for determining how many allocations is appropriate and expected. (i.e. When to stop optimizing and just let it go.) In this case, I’m executing a loop 1 M times and I’m getting ~ 1 M reallocations. Seems reasonable???

If you do these changes, you reduce even further the allocations:

``````    rtmp = zeros(2)  # preallocate a temporary vector here
for i=1:N
k1 = pend.dt .* getDerivs(pend,r,t)
rtmp .= r .+ 0.5 .* k1 # compute it here
k2 = pend.dt .* getDerivs(pend,rtmp,t + 1/2 * pend.dt)
r .+= k2 # add the dot

``````
``````julia> @time saveTheta,saveOmega = RK4(myPendulum,poincare=true)
0.977168 seconds (19 allocations: 488.308 MiB, 4.39% gc time)

``````

(now the loop does not allocate anything).

Some tips here on how to find the allocations: Disabling allocations

My impression here is: loops that perform critical tasks should only allocate if one is clearly aware of the reason of the allocation. Otherwise it is a good idea to search where the allocation is and try to solve it. Of course, if the execution time is still a problem at all.

3 Likes

It can be hard to have such a mental model for allocations if a function is complicated and does many things.

My strategy is to split code into functions (where it makes sense) and profile the program to see where time is spent. Then, I can focus on one function at a time and manually look for what operations might be allocating or generally inefficient. Most of the time, those allocations can be minimized or eliminated, e.g. by passing in preallocated arrays and using in-place operations.

1 Like

This seems like a classic case where you should be using StaticArrays: a small array whose size (2) is a compile-time constant. This would also eliminate heap allocations, and would probably be faster.

In general, my feeling is that if your innermost computational loop has any allocations proportional to the number of loop iterations, you are doing something “wrong”.

8 Likes

The above probably doesn’t hurt performance much, but the pattern is quite strange:

``````function compare(a, b)
if a > b
return true
else
return false
end
end
``````

can be replaced with

``````function compare(a, b)
return a > b
end
``````

because `a > b` already is a boolean value. You’re basically writing “if x is true, return true, else return false”, instead of just returning “x” in the first place.

Another style thing: `round(Integer, x)` should probably be `round(Int, x)`. It doesn’t seem to hurt, it’s just something I’ve never seen before, as `Integer` is an abstract type.

3 Likes

Shouldn’t this return an error (as confusing as it could be for a new user)?

Why? I think it makes sense as it is:

``````julia> f(x) = typeof(round(Integer, x))
f (generic function with 1 method)

julia> f(1.1)
Int64

julia> f(big(1.1))
BigInt

``````
2 Likes