High memory usage when calling offline_run! from Agents.jl in a for loop

Hello Julia Community,

I’m currently working with the Agents.jl package, specifically with the offline_run! function. I’ve encountered an issue where my program consumes a significant amount of memory when calling offline_run! inside a for loop.

Issue Description:

  • Function Used: offline_run! from Agents.jl.
  • Observed Problem: High memory usage when calling the function in a for loop.
  • Context: I want to run my model for many different parameter values. The resulting dataframe will not fit in memory, so I want to write the data to file while the model simulations run. I expected memory usage not to exceed what is needed for an individual simulation, since all iterations of the for loop should be independent. However, memory usage keeps rising and eventually exceeds the size of the full dataframe. Below, I provide an MWE of the issue, which uses the Schelling model defined in the Agents.jl documentation.

Code Snippet:

module MWERunOffline

# We make use of the following packages in this script
using Agents
using Random
using ProgressMeter # Issue has been replicated without ProgressMeter; it's nice to know how long the for loop might take

"A utility function to get the size of a file in GB"
function get_file_size_in_gb(filename)
    size_in_bytes = filesize(filename)
    size_in_gb = size_in_bytes / (1024^3)
    return size_in_gb
end

# Below, I use the Schelling model defined in the Agents.jl documentation
# https://juliadynamics.github.io/Agents.jl/stable/examples/schelling

@agent SchellingAgent GridAgent{2} begin
    mood::Bool # whether the agent is happy in its position. (true = happy)
    group::Int # The group of the agent, determines mood as it interacts with neighbors
end

function agent_step!(agent, model)
    minhappy = model.min_to_be_happy
    count_neighbors_same_group = 0
    # For each neighbor, get group and compare to current agent's group
    # and increment `count_neighbors_same_group` as appropriate.
    # Here `nearby_agents` (with default arguments) will provide an iterator
    # over the nearby agents one grid point away, which are at most 8.
    for neighbor in nearby_agents(agent, model)
        if agent.group == neighbor.group
            count_neighbors_same_group += 1
        end
    end
    # After counting the neighbors, decide whether or not to move the agent.
    # If count_neighbors_same_group is at least the min_to_be_happy, set the
    # mood to true. Otherwise, move the agent to a random position, and set
    # mood to false.
    if count_neighbors_same_group ≥ minhappy
        agent.mood = true
    else
        agent.mood = false
        move_agent_single!(agent, model)
    end
    return
end

function initialize(; total_agents=320, griddims=(20, 20), min_to_be_happy=3, seed=125)
    space = GridSpaceSingle(griddims; periodic=false)
    properties = Dict(:min_to_be_happy => min_to_be_happy)
    rng = Random.Xoshiro(seed)
    model = UnremovableABM(SchellingAgent, space;
                           properties, rng, scheduler=Schedulers.Randomly())
    # populate the model with agents, adding equal numbers of the two agent types
    # at random positions in the model
    for n in 1:total_agents
        agent = SchellingAgent(n, (1, 1), false, n < total_agents / 2 ? 1 : 2)
        add_agent_single!(agent, model)
    end
    return model
end

# The issue also appears if we keep the model fixed for every iteration of the
# for loop. Memory usage is unchanged.
# model = initialize()

n_steps = 1e3 |> Int
n_simulations = 500
# If backend = :none, all data is saved to memory. Use this to compare memory
# usage with the other backends.
backend = :csv  # The issue also occurs with the :arrow backend
adata = [:pos, :mood, :group]

data_path = "mwe_offline_run"
mkpath(data_path)
run_message = "Running $n_simulations simulations of $n_steps steps with $backend backend"
@time begin
    if backend == :none
        dfs = []
        ProgressMeter.@showprogress run_message for i in 1:n_simulations
            model = initialize()
            # run! returns a tuple of (agent, model) dataframes; keep the agent data
            adf, _ = Agents.run!(model, agent_step!, Agents.dummystep, n_steps;
                                 adata=adata)
            push!(dfs, adf)
        end
    else
        ProgressMeter.@showprogress run_message for i in 1:n_simulations
            model = initialize()
            Agents.offline_run!(model, agent_step!, Agents.dummystep, n_steps;
                                adata=adata,
                                backend=backend,
                                writing_interval=1e4 |> Int,
                                adata_filename="$data_path/adata.$backend")
        end
    end
end
if backend != :none
    println("Adata File size: ", get_file_size_in_gb("$data_path/adata.$backend"), " GB")
end
println("Finished $n_simulations simulations of $n_steps steps with $backend backend")

end

Software/Hardware:

  • Linux, Ubuntu 22.04
  • Julia 1.9.4
  • Agents 5.17
  • 16GB RAM (if you have less RAM than this, you might want to reduce n_simulations)

Attempts Made:

  • I conducted a series of tests to understand the memory usage pattern. Here are my observations:
    1. Single Simulation with Many Steps: When n_simulations = 1 and n_steps is large, memory usage is effectively managed by Agents.offline_run!. The function seems to control memory by calling empty! on the in-memory dataframe after writing to disk.
    2. Multiple Simulations: However, when n_simulations is large, Agents.offline_run! no longer controls memory usage effectively. Memory usage increases with each iteration of the for loop and is not released afterwards. Surprisingly, the memory usage greatly exceeds (~2x) the size of the adata file on disk.
    3. Manual Garbage Collection: I found that manually invoking the garbage collector with GC.gc() after the entire for loop has finished reduces memory usage to approximately the size of the adata file on disk. Calling GC.gc() within the for loop controls memory usage, but at a significant cost to performance. I’d prefer to avoid forcing manual garbage collection in performance-critical code unless absolutely necessary.
    4. Search for Relevant Issues: I have searched the Agents.jl docs and GitHub issues for anything relevant. offline_run! appears to be a relatively new addition to the package. Internally, it adds data to dataframes, saves the data to file, and then empties those dataframes. It may be worth posting this as a GitHub issue later, but I felt it prudent to check with the Julia community first, in case my issue stems from a more general misunderstanding.
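For completeness, here is a sketch of a middle ground between the two extremes in observation 3: instead of a full collection on every iteration, request an incremental collection every few iterations. This assumes the variables from the MWE above are in scope, and `gc_every` is my own made-up tuning knob, not an Agents.jl option:

```julia
gc_every = 10  # hypothetical knob: how often to nudge the garbage collector

for i in 1:n_simulations
    model = initialize()
    Agents.offline_run!(model, agent_step!, Agents.dummystep, n_steps;
                        adata=adata, backend=backend,
                        writing_interval=Int(1e4),
                        adata_filename="$data_path/adata.$backend")
    # GC.gc(false) requests an incremental (non-full) collection, which is
    # typically cheaper than the full collection GC.gc() performs by default.
    i % gc_every == 0 && GC.gc(false)
end
```

This keeps most of the performance of the uncollected loop while still bounding memory growth, though how well it works will depend on allocation patterns.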

Given these observations, I’m curious about the following:

Questions:

  1. Why does Agents.offline_run! manage memory effectively for a single simulation but not across multiple simulations?
  2. Is there a recommended approach to ensure memory efficiency when running a for loop where data is created and saved inside each iteration?
  3. Could this behaviour indicate a memory leak or inefficient memory management within offline_run!, and how might I investigate this further?

Thank you in advance for any guidance or suggestions. Your expertise and time are greatly appreciated!


Indeed, I tried to run your MWE, and calling GC.gc() inside the loop solves the issue. To me this suggests the problem lies more in Julia’s internal handling of garbage collection than in Agents.jl itself, though I could be wrong. It seems similar in spirit to Different memory usage under Windows and linux?, so I would suggest trying it on 1.10-rc2. Maybe it will magically work better, as in that other thread :smile:


Tried it on 1.10-rc2 and… it works like a charm! No memory increase at all on your MWE.


Thanks so much for the suggestion!

Indeed, this fixes the memory issue on Linux for both the MWE and my actual code. Thanks for finding that thread. :slight_smile:
