Too many allocations?

This stands out to me as an easy thing to improve. You’re returning two values (which is fine), but you’re returning them by constructing an Array to hold those two elements. Constructing a new Array is rather expensive (if you do it a million times, at least), and that’s one of your sources of frequent allocations.

Instead, you can return a Tuple, which is basically free to construct. That would look like:

return vec[2], dOmega

or equivalently, return (vec[2], dOmega), since the parentheses are optional here.
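
At the call site, the tuple unpacks the same way the array did (a sketch, using the getDerivs signature that appears later in the thread):

dTheta, dOmega = getDerivs(pend, r, t)  # destructures the returned tuple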

5 Likes

Thank you, great idea. Can I use a tuple to do vector math upon returning from the function, or will I have to do it component by component?

I should mention that this code is already much faster than its Python counterpart. I'm just looking to build intuition for optimizing Julia code. Thanks in advance.

You can probably preallocate and reuse k1 and k2: .= assigns in place, while = makes a new array, which would allocate. Also, in omegaD = Float64(2/3) the Float64 is unnecessary, since 2/3 already returns a Float64.
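
A minimal self-contained sketch of the .= pattern, with a toy derivative in place of the thread's getDerivs:

derivs(r, t) = [r[2], -sin(r[1])]        # still allocates per call; returning a tuple fixes this

function evolve!(r, t, dt, N)
    k1 = zeros(2)                        # allocated once, before the loop
    for i in 1:N
        k1 .= dt .* derivs(r, t)         # .= writes into the existing k1
        # k1 = dt .* derivs(r, t)        # plain = would allocate a fresh array each pass
        r .+= k1                         # in-place update
        t += dt
    end
    return r
end

evolve!([0.1, 0.0], 0.0, 1e-3, 10^5)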

3 Likes

Good idea. I had that thought, but I neglected the .=, so I didn't get an improvement.

You can broadcast over a tuple (e.g. you can do x .+ 1 if x is a tuple), but if you want to do anything more vector-like, then check out StaticArrays.jl (https://github.com/JuliaArrays/StaticArrays.jl). An SVector from StaticArrays is as cheap as a tuple (it actually is a tuple under the hood) but acts like a proper vector.
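
For instance, a small sketch of SVector arithmetic (toy values, not the thread's code):

using LinearAlgebra, StaticArrays

v = SVector(1.0, 2.0)     # fixed-size, stack-allocated
w = 2.0 .* v .+ v         # ordinary vector math, no heap allocation
u = v / norm(v)           # LinearAlgebra functions work too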

3 Likes

You guys are all awesome. I preallocated k1 and k2 and returned a tuple instead of an array. This reduced the number of allocations from 9 M to about 1 M and sped the code up by a factor of ~5. The code was already crazy fast, and now it's ridiculously fast. Thanks a ton for your quick and helpful replies.

4 Likes

You could use Parameters.jl to initialize the structure instead of the inner constructor (which, in that case, could also be an outer constructor, which is more common). Also, the struct name is conventionally capitalized (struct Pendulum):

julia> using Parameters

julia> @with_kw mutable struct A
          x :: Float64
          b :: Bool = false
       end
A

julia> a = A(x=1.0)
A
  x: Float64 1.0
  b: Bool false

From your code I could not see why your Pendulum has to be mutable. Immutable structs will be faster, in general.

Also: Array{Float64,1} is the same as Vector{Float64} (but you probably want just AbstractVector in the function signature; that will allow your function to receive views and vectors of other number types, like Float32).

This line probably allocates:

r += k2

It would be better as r .+= k2 (with the dot).

Also, are you sure you want theta = zeros(Float64,N+1,1) instead of zeros(N+1)? The former is a two-dimensional array whose second dimension has only one element, not a vector.
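
To see the difference:

julia> size(zeros(Float64, 5, 1))   # a 5×1 Matrix
(5, 1)

julia> size(zeros(5))               # a length-5 Vector
(5,)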

2 Likes

Great suggestions, thank you! I'm not familiar with Parameters.jl, so I'll read up on it. Inner constructors are not typical? Is it just a question of style, or are there efficiency reasons?

Generally they are used when you need to impose constraints on the parameters, and/or when the constructor is self-referential (in which case an outer constructor would recurse indefinitely). This is explained very well here:

https://docs.julialang.org/en/v1/manual/constructors/
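
For illustration, a sketch of the constraint case (a hypothetical struct, not the thread's Pendulum):

struct Interval
    lo::Float64
    hi::Float64
    # inner constructor: enforces lo <= hi before the object can exist
    function Interval(lo, hi)
        lo <= hi || throw(ArgumentError("need lo <= hi"))
        new(lo, hi)
    end
end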

2 Likes

Final question on this thread:

What mental routine do you go through to determine how many allocations are appropriate and expected? (i.e., when to stop optimizing and just let it go.) In this case, I'm executing a loop 1 M times and getting ~1 M allocations. Seems reasonable?

If you make these changes, you reduce the allocations even further:

    rtmp = zeros(2)  # preallocate a temporary vector here
    for i=1:N
        k1 = pend.dt .* getDerivs(pend,r,t)
        rtmp .= r .+ 0.5 .* k1 # compute it here
        k2 = pend.dt .* getDerivs(pend,rtmp,t + 1/2 * pend.dt)
        r .+= k2 # add the dot

julia> @time saveTheta,saveOmega = RK4(myPendulum,poincare=true)
  0.977168 seconds (19 allocations: 488.308 MiB, 4.39% gc time)

(now the loop does not allocate anything).

Some tips here on how to find the allocations: Disabling allocations

My impression here is: loops that perform critical tasks should only allocate if one is clearly aware of the reason for the allocation. Otherwise, it is a good idea to find where the allocation happens and try to eliminate it. Of course, only if the execution time is still a problem at all.
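
One quick check is Base's @allocated macro (toy functions here, not the thread's code):

grow(r, k) = r + k                   # builds a fresh array every call
shove!(r, k) = (r .+= k; nothing)    # writes into r in place

const a = zeros(2); const b = ones(2);
grow(a, b); shove!(a, b);            # warm up so compilation isn't measured

@allocated grow(a, b)    # nonzero: one new array per call
@allocated shove!(a, b)  # 0: fully in place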

3 Likes

It can be hard to have such a mental model for allocations if a function is complicated and does many things.

My strategy is to split code into functions (where it makes sense) and profile the program to see where time is spent. Then, I can focus on one function at a time and manually look for what operations might be allocating or generally inefficient. Most of the time, those allocations can be minimized or eliminated, e.g. by passing in preallocated arrays and using in-place operations.
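
A sketch of that workflow with the Profile standard library (run_simulation is a hypothetical entry point):

using Profile

run_simulation()            # run once first, so compilation isn't profiled
@profile run_simulation()   # sample where time is spent
Profile.print()             # per-function breakdown; drill into the hot spots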

1 Like

This seems like a classic case where you should be using StaticArrays: a small array whose size (2) is a compile-time constant. This would also eliminate heap allocations, and would probably be faster.
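
A sketch of what that could look like for a two-component state (toy right-hand side; names assumed, not the thread's exact code):

using StaticArrays

derivs(r::SVector{2}, t) = SVector(r[2], -sin(r[1]))   # toy pendulum RHS

function step(r::SVector{2}, t, dt)
    k1 = dt * derivs(r, t)                        # SVectors live on the stack
    k2 = dt * derivs(r + 0.5 * k1, t + 0.5 * dt)
    return r + k2                                 # no heap allocation anywhere
end

step(SVector(0.1, 0.0), 0.0, 1e-3)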

In general, my feeling is that if your innermost computational loop has any allocations proportional to the number of loop iterations, you are doing something “wrong”.

8 Likes

The above probably doesn’t hurt performance much, but the pattern is quite strange:

function compare(a, b)
    if a > b
        return true
    else
        return false
    end
end

can be replaced with

function compare(a, b)
    return a > b
end

because a > b is already a Boolean value. You're basically writing "if x is true, return true, else return false" instead of just returning x in the first place.

Another style thing: round(Integer, x) should probably be round(Int, x). It doesn't seem to hurt; it's just something I've never seen before, since Integer is an abstract type.

3 Likes

Shouldn’t this return an error (as confusing as it could be for a new user)?

Why? I think it makes sense as it is:

julia> f(x) = typeof(round(Integer, x))
f (generic function with 1 method)

julia> f(1.1)
Int64

julia> f(big(1.1))
BigInt

2 Likes

Nice. I didn’t think about that interpretation. It was just the rounding to an abstract type seems strange. Which is the function that returns other type of number of the same “size”, given one type? Meaning

f(::Float32) = Int32
f(::Float64) = Int64
f(::Int32) = Float32
etc.

1 Like

In this case, this "mapping" seems to be implemented for trunc, floor, ceil, and round, as @edit tells me. By the way, round(Integer, Float32(1.1)) gives Int64, not Int32 (same for Float16). I don't know whether this is intended, but since I only asked for an integer type, it seems reasonable that Julia gives me the machine integer type if it can fit x.

1 Like

It’s a bit tricky because of inexact mapping from floats to integers. For example, Float32 can represent 2^31 perfectly, whereas Int32 cannot:

julia> Integer(Float32(2^31))
2147483648

julia> round(Int32,Float32(2^31))
ERROR: InexactError: trunc(Int32, 2.1474836e9)

It should also be possible to create similar errors when rounding certain Float64 values to Int64. It's unclear what the intention should be with such edge cases, and it's probably unreasonable to play it safe by returning BigInt for round(Integer, Float32(x)). Some might say it's not great to default to round(Int, Float32(x)), because the result could then differ between 32- and 64-bit CPUs. I guess Python will use bigger integers when needed, but with a big trade-off in performance.

Only the coder knows their use case, and they are probably better off being concrete, e.g. round(Int32, Float32(x)), when they know it will work.

3 Likes

A somewhat more basic suggestion:

Here you are preallocating omega twice, it seems?