Code runtime constantly increasing with time

Hello Everyone,

I am fairly new to Julia, so sorry if the problem I would like to ask about seems a bit general.

I have written a plasma simulation code which is unfortunately too detailed and too long to post here in its entirety. Just to give some context: it pushes around a large number of "particles" (typically around 5e5 to 1e6) and performs some checks and actions on them. Fast forward to the problem I observe: the code starts off quite fast, but as the number of iterations increases it gets slower and slower. I am including a figure below to show how the code runtime varies over time. It was obtained by running the code for 20,000 iterations with a particle count of 30,000. For the timing, I used @time around the entire loop (so in global scope, which should still give an indication of the runtime).

Runtime vs. iterations

As a quick hint, some increase in runtime is expected, since the number of particles grows over the iterations and some functions are called more often. However, the expected rise is about 2 percent, whereas here the increase is about a factor of 3.5.

Now, to explain the code structure a little: to store each particle's information, I have defined a custom type called Particle as a mutable struct. It contains four MVector{3} fields (three of Float64 and one of Int64) and one parameter of type Float64. All particles are then stored in an array, which is itself wrapped in an immutable struct as a data wrapper. The code snippet for these structures is below:

using StaticArrays   # MVector comes from StaticArrays

mutable struct Particle
    pos::MVector{3,Float64}
    vel::MVector{3,Float64}
    cellIndex::MVector{3,Int64}
    posInCell::MVector{3,Float64}
    mpwt::Float64          
end
struct Species
    name::String
    particles::Array{Particle,1}
    charge::Float64
    mass::Float64
    van_der_waals_radius::Float64
end

Within the code's main loop, I need to add particles to the array containing them and also delete the ones that exceed the simulation domain boundaries. To do this, I have been using push! to add to the particles array and deleteat! to remove from it, as below:
push!(species.particles, part)
deleteat!(species.particles, i)    # i is the index of the element to be deleted

After observing the performance degradation, I did some profiling, and it seemed like the functions in which push! and deleteat! are called could be the reason. I was specifically concerned about shifts in memory whenever data are added to or removed from the particles array. As a result, I tried replacing deleteat! with the following, which replaces the item to be deleted with the last element of the array and then deletes that last element:

species.particles[i] = species.particles[end]
deleteat!(species.particles, length(species.particles))

This approach did not work and even reduced the performance further. I then looked at push! and instead tried preallocating the particles arrays using resize! (leaving the new slots #undef) and using setindex! to add elements. This did not help either, and the runtime still behaved as shown in the figure above.

As a result, I thought I would open this thread, since a general rise in runtime over the course of a run might be something others have faced as well. I would very much appreciate your thoughts and suggestions on this issue.
Many thanks!

It seems you have more or less tried these things already, but you could try preallocating an array of the maximum size and never delete or push anything: just swap values and keep track of the number of live elements yourself, along the lines of the sketch below. That would at least exclude the possibility of memory manipulation being the issue. (Although I have the feeling that push! and deleteat! are quite efficiently implemented in Julia, so the problem may be somewhere else.)
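A minimal sketch of what I mean (ParticlePool and the helper names are hypothetical, not from your code):

mutable struct ParticlePool{P}
    data::Vector{P}   # allocated once at the maximum expected size
    np::Int           # number of slots currently in use
end
ParticlePool{P}(maxsize::Integer) where {P} = ParticlePool{P}(Vector{P}(undef, maxsize), 0)

function addparticle!(pool::ParticlePool, part)
    pool.np += 1
    pool.data[pool.np] = part           # write into the next free slot, no push!
    return pool
end

function removeparticle!(pool::ParticlePool, i::Integer)
    pool.data[i] = pool.data[pool.np]   # overwrite with the last live particle
    pool.np -= 1                        # shrink the live count, no deleteat!
    return pool
end

The main loop would then only iterate over i in 1:pool.np, so the underlying array never changes size.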

Yeah, that seems like a reasonable guess about the problem. Calling deleteat! on an index partway through the vector will require every element after that index to be moved, which is potentially expensive.

That also seems like a good idea. I tried a simple benchmark comparing your suggestion against deleteat!, and it is indeed much faster:

julia> function f1(x)
         i = div(length(x), 2)  # pick a point halfway through the array
         deleteat!(x, i)
       end
f1 (generic function with 1 method)

julia> function f2(x)
         i = div(length(x), 2)  # pick a point halfway through the array
         x[i] = x[end]  # move the last element to i
         deleteat!(x, length(x))  # delete the last element
       end
f2 (generic function with 1 method)

julia> function f3(x)
         i = div(length(x), 2)
         x[i] = x[end]  # move the last element to i
         resize!(x, length(x) - 1)  # shrink the vector
       end
f3 (generic function with 1 method)

julia> @btime f1(x) setup=(x = rand(10000)) evals=1;
  552.000 ns (0 allocations: 0 bytes)

julia> @btime f2(x) setup=(x = rand(10000)) evals=1;
  35.000 ns (0 allocations: 0 bytes)

julia> @btime f3(x) setup=(x = rand(10000)) evals=1;
  36.000 ns (0 allocations: 0 bytes)

So the idea is sound, and it works well in this benchmark. I can’t say more about why this didn’t work in your case without more information about your code.

As a side note, if your collection of particles is more like an unordered set and less like a vector, then you might consider using a Set instead, since it should generally be faster to delete arbitrary elements from a Set:

julia> @btime delete!(x, 5000) setup=(x = Set(1:10000)) evals=1
  26.000 ns (0 allocations: 0 bytes)
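For what it's worth, a rough sketch of how that could look with your particles (outside_domain is just a hypothetical predicate standing in for your boundary check). Mutable structs hash by identity, so individual Particle objects can be deleted without knowing any index:

particles = Set{Particle}()
push!(particles, part)                  # adding a particle

# collect the particles leaving the domain first, then delete them,
# since a Set should not be modified while it is being iterated
to_remove = [p for p in particles if outside_domain(p)]
for p in to_remove
    delete!(particles, p)
end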
3 Likes

Try

using ProfileView
@profview yourfunction()

and see what parts of the code take longest, and if any of them are allocating (red).
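If the main loop currently runs in global scope (as with the @time measurement), it may help to wrap it in a function first, so the profile reflects compiled code. A rough sketch, with run_simulation!, species and domain as placeholder names:

using ProfileView

function run_simulation!(species, domain, niter)
    for it in 1:niter
        # push particles, scatter/gather, boundary checks, ...
    end
end

@profview run_simulation!(species, domain, 1_000)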

2 Likes

Thank you all for your replies and your indeed useful suggestions. I have been trying your proposed solutions today, along with some ideas of my own, so below I will go through the results I got for each suggestion.

@lmiq
I had actually already implemented this. I preallocated a 50,000-element array using resize! for each of the three particles arrays (each holding 30,000 particles) and did not delete or push elements. Unfortunately, this did not resolve the problem. Also, from the profiling graph, the percentage of time spent in push! and deleteat! seemed to be much smaller than in other functions.

@rdeits The deleteat! was actually only taking about 0.05% of the whole simulation time. After implementing the replace-and-resize as per your suggestion, deleteat! does not even appear in the profiling flame graph anymore. So the change is indeed working, and the memory allocation graph shows an improvement, but unfortunately the runtime-rise issue is still there and did not get resolved.

Interestingly, and quite strangely to be honest, the getindex function appears to be taking a total of about 46% of the whole simulation time for a 20,000-iteration run. For 10,000 iterations, it takes about 38% of the total time.
It also seems that many of these getindex calls end up invoking unsafe_load.

Below, for everyone's attention, I am including a snapshot of one of the most recent flame graph views from profiling the main simulation loop (using PProf and Graphviz).

As you can hopefully see, under the getindex blocks in the functions from ComputeGasProperties up to the end of the graph, there are smaller blocks of unsafe_load. I have managed to remove all of these by changing how arguments are passed to the functions, from the first version in the code snippet below to the second version:

# First version of the functions, with getindex calls on the Particle fields
function scatter(I::Int64, rho::Array{Float64,1}, y::Float64, part)
    i = part.cellIndex[I]
    di = part.posInCell[I]

    rho[i] += y * (1 - di)
    rho[i + 1] += y * di

    return nothing
end

function gather(I, ef::Array{Float64,1}, part::Particle)
    i = part.cellIndex[I]
    di = part.posInCell[I]

    # Gathers a node-based vector field onto a particle position
    out_var = ef[i] * (1 - di) + ef[i + 1] * di

    return out_var
end

# Second version, which removes the unsafe_load calls
function scatter(I::Int64, rho::Array{Float64,1}, y::Float64, cellIndex, posInCell)
    i = cellIndex
    di = posInCell

    rho[i] += y * (1 - di)
    rho[i + 1] += y * di

    return nothing
end

function gather(I, ef::Array{Float64,1}, cellIndex, posInCell)
    i = cellIndex
    di = posInCell

    # Gathers a node-based vector field onto a particle position
    out_var = ef[i] * (1 - di) + ef[i + 1] * di

    return out_var
end

The definition of the Particle structure is as shown in the original post. It has the two fields cellIndex and posInCell, which were passed through part (a reference to the particle) in the first version (perhaps triggering the unsafe_load??), whereas in the second version I pass them directly, as below:

scatter(I, rho, part.mpwt * sp.charge, part.cellIndex[1], part.posInCell[1])    # for scatter
ef_part = gather(1, domain.efX, part.cellIndex[1], part.posInCell[1])            # for gather

The problem is that even though the overall simulation time decreases with the above changes (so the runtime curve is shifted down as a whole), the slope of the rise over time is still more or less the same. I am now wondering whether getindex should really be taking so much time (recalling that there are a total of 90,000 particles, divided into three arrays of 30,000 elements, and that the increase in their number is only about 1.02 to 1.05 percent), or whether I might be doing something wrong.
I would very much appreciate your help on how I could reduce the time spent in getindex.

Regarding your side note: I unfortunately cannot use a Set instead of an array because, even though the order of the particles is not important, I still need access to their indices, for example to remove them.

This probably diverges a little from the actual cause of your problem, but you can probably get better performance if you use an immutable struct for Particle. You can still work with it as if it were mutable using, for instance, Setfield:


using StaticArrays
using Setfield

struct Particle{N,T}
    pos::SVector{N,T}
    vel::SVector{N,T}
    cellIndex::SVector{N,T}
    posInCell::SVector{N,T}
    mpwt::T
end
Particle(pos::SVector{N,T},vel::SVector{N,T}) where {N,T} =
  Particle{N,T}(pos,vel,zeros(SVector{N,T}),zeros(SVector{N,T}),zero(T))
julia> p1 = Particle(rand(SVector{3,Float64}),rand(SVector{3,Float64}))
Particle{3, Float64}([0.6450365095099164, 0.1452058334271602, 0.8468072605170518], [0.4367934874802435, 0.9566117309064159, 0.15912494484991213], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0], 0.0)

julia> @set! p1.cellIndex = SVector{3,Float64}(1,1,1)
Particle{3, Float64}([0.6450365095099164, 0.1452058334271602, 0.8468072605170518], [0.4367934874802435, 0.9566117309064159, 0.15912494484991213], [1.0, 1.0, 1.0], [0.0, 0.0, 0.0], 0.0)

In this case, you probably need to parameterize the Species struct as well, with the actual type of the particle:

struct Species{N,T}
    name::String
    particles::Array{Particle{N,T},1}
    charge::T
    mass::T
    van_der_waals_radius::T
end
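A particle stored in the vector would then be updated by writing the modified copy back, along these lines (dt and sp here are just placeholder names):

using Setfield

function move_particles!(sp::Species, dt)
    for i in eachindex(sp.particles)
        p = sp.particles[i]
        @set! p.pos = p.pos + p.vel * dt   # @set! rebinds the local p to an updated copy
        sp.particles[i] = p                # store the copy back into the vector
    end
end

Mutating the vector itself with setindex! is fine; it is only the struct that is immutable.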

1 Like

I don’t know if it is helpful, but maybe a few carefully located @inbounds can help with the getindex time. It can be tricky because you are changing the size of the array…

Is there any way any of your indexing operations can be replaced by normal iteration?
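For instance, something like the first loop below could become the second (a hypothetical sketch using the names from your snippets, just to illustrate what I mean):

# indexed loop: every access goes through getindex
for i in 1:length(species.particles)
    part = species.particles[i]
    # ... work on part ...
end

# plain iteration over the same vector, no explicit indexing
for part in species.particles
    # ... work on part ...
end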

Thank you very much @lmiq for this suggestion. I had actually tried defining the Particle struct as immutable, but Julia was giving me the setindex! error for immutable types. I suppose the use of the @set! macro here would circumvent that problem.
I will definitely try this and hopefully implement it all correctly; I hope it may also help with the original problem, as I am still not able to figure out what might be causing the rise in runtime.

@josePereiro Sure, thank you for the suggestion; I have been using @inbounds in a couple of locations, but the main issue is that it is not yet certain whether the getindex time is actually the culprit behind the constant rise in runtime.
Basically, I have been passing variables by reference to functions that need to access a certain element, so this should be something implemented efficiently in Julia and should not require more time as the size of the arrays grows.

@gustaphe Well, I would say not without modifying the entire methodology behind the code; normal iteration has been used wherever possible. But, as I wrote in the reply above, I suppose getindex should be implemented efficiently in Julia, and if I pass a variable of a custom type such as Particle by reference to a function, it should not take long for Julia to access a certain element in one of its fields. However, please do let me know if this may not be valid and there might be some issue with getindex on custom types.

It’s not that Julia’s implementation of getindex is slow, it’s an inherently slow operation. Some languages don’t have better ways to traverse arrays, but Julia does.

I guess with @inbounds it shouldn’t be any worse than iteration (unless you are parallelizing), but it’s harder to get iteration wrong.

2 Likes

Thank you for your response; you're right! I have been trying to find the correct places to put @inbounds over the last couple of days, and I have been seeing some improvement, but the overall issue is still there. Perhaps it's not the getindex time that is getting longer as the iterations increase, but I am still not fully sure!

I’ve tried implementing your solution, and I really appreciate the code samples you wrote using nomenclature familiar to me; it made the implementation much easier.

There are just two points, however. First, I unfortunately cannot define Species as an AbstractArray (which I think is how the definition is technically referred to?!), since I do not know the number of particles in advance and it changes over the iterations.

Second, as for the Particle structure, I tried to modify your sample as below so as to also include cellIndex, posInCell, and mpwt in the constructor. However, when I pass variables of type Particle to a function that needs to update cellIndex and posInCell, these variables do not get updated. I have used the @set! macro, so the code doesn't give an error, but the values remain zero as initialized. The relevant code snippets are below:

struct Particle{N,T}
    pos::SVector{N,T}
    vel::SVector{N,T}
    cellIndex::SVector{N,T}
    posInCell::SVector{N,T}
    mpwt::T
end

Particle(pos::SVector{N,T}, vel::SVector{N,T}, cellIndex::SVector{N,T}, posInCell::SVector{N,T}, mpwt::T)  where {N,T} =
        Particle{N,T}(pos, vel, cellIndex, posInCell, mpwt)

part = Particle(SVector(pos1, pos2, pos3), SVector(Vx, Vy, Vz), SVector(0.0, 0.0, 0.0), SVector(0.0, 0.0, 0.0), mpwt);

pos_to_logical(part, x_min, Δh)


function pos_to_logical(part::Particle, x_min::Array{Float64,1}, Δh::Array{Float64,1}) 

    lc = (part.pos[1] - x_min[1]) / Δh[1]

    @set! part.cellIndex[1] = floor(lc) + 1

    @set! part.posInCell[1] = lc - part.cellIndex[1] + 1
  
end

Of course, when I execute the commands in global scope or in the REPL, the values get updated, but when I try to update the values through the call to pos_to_logical, part.cellIndex[1] and part.posInCell[1] do not change.

I would also like to point out that the pos_to_logical function is in another file, which is included with include in the main simulation file.

I would be really thankful if you could please let me know what I might be doing wrong or am missing.

Did you mean a StaticArray? Arrays in general can have their dimensions changed. I am not sure if I understand what you are saying here.

Concerning the second point: immutable structs are passed by value, so that is the expected behaviour (of course, non-static vectors contained in an immutable struct are themselves mutable, are passed by reference, and will be mutated). But, to answer more explicitly, you should return part from the function and use:

part = pos_to_logical(part, x_min, Δh)

which is the same pattern you would use, for example, for x = sin(x), or for any other assignment with the same label as the variable you are passing.
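For example, a minimal sketch of pos_to_logical rewritten that way (same fields as in your snippet, untested, just to show the pattern):

function pos_to_logical(part::Particle, x_min::Array{Float64,1}, Δh::Array{Float64,1})
    lc = (part.pos[1] - x_min[1]) / Δh[1]
    @set! part.cellIndex[1] = floor(lc) + 1              # rebinds the local part to an updated copy
    @set! part.posInCell[1] = lc - part.cellIndex[1] + 1
    return part                                          # hand the updated copy back to the caller
end

part = pos_to_logical(part, x_min, Δh)                   # reassign at the call site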

You might have the impression that this cannot be better. But look at this (not saying that this difference will occur in your code as well):

julia> using Setfield

julia> struct A
         x::Int
       end

julia> mutable struct B
         x::Int
       end

julia> a = A(0)
A(0)

julia> b = B(0)
B(0)

julia> function upA(a,n)
         for i in 1:n
           @set! a.x = a.x + 1
         end
         return a
       end
upA (generic function with 1 method)

julia> function upB(b,n)
         for i in 1:n
           b.x = b.x + 1
         end
       end
upB (generic function with 1 method)

julia> using BenchmarkTools

julia> @btime upA($a,100_000)
  1.451 ns (0 allocations: 0 bytes)
A(100000)

julia> @btime upB(bb,100_000) setup=(bb=deepcopy(b)) evals=1
  25.094 μs (0 allocations: 0 bytes)

Edit: the reason for that huge difference is that the compiler is able to figure out the result in the immutable case and simply skip the computation. You probably won't see such a difference in a real case, but it is true that the compiler can generally do more optimizations with immutable variables than with mutable ones. (In this particular case, and in some other toy examples, the compiler might well optimize the mutable case as well.)
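If you are curious, one way to see this is to inspect the generated code for the two functions above (from the REPL, where InteractiveUtils is loaded by default):

@code_llvm upA(a, 100_000)   # the loop is folded away in the immutable case
@code_llvm upB(b, 100_000)   # here the loop body is typically still present

The exact output depends on the Julia version, but the contrast is usually visible.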

2 Likes

Many thanks for your reply and for the clarification!

Well, I was trying to find the correct terminology for defining a struct as Particle{N,T}. I guess I used the wrong phrase; sorry for creating confusion. I have also now noticed that particles is of type Array, so its size should indeed be changeable.

I now see the point; it's absolutely correct! I will then implement the function such that it returns part.

Another alternative to using Setfield is to use one-element vectors, for example:

julia> struct A
         x::Vector{Int}
       end

julia> function upA(a,n)
         for i in 1:n
           a.x[1] += 1
         end
         return a
       end
upA (generic function with 1 method)

julia> @btime upA(a,100_000) setup=(a=A([0])) evals=1
  24.000 ns (0 allocations: 0 bytes)
A([100000])

That could be used for the mutable scalars inside your immutable struct as well. It is a little bit ugly to have to initialize and mutate scalars as if they were arrays, but in general I feel safer with that. I have even recently discussed all the alternatives on how to do that, here.

1 Like