Hello Julia Community,
I’m currently working with the Agents.jl package, specifically its `offline_run!` function. I’ve run into an issue where my program consumes a significant amount of memory when calling `offline_run!` inside a for loop.
Issue Description:
- Function Used: `offline_run!` from Agents.jl.
- Observed Problem: High memory usage when calling the function in a for loop.
- Context: I want to run my model for many different parameter values. The resulting dataframe will not fit in memory, so I want to write the data to file while the simulations run. I expected memory usage not to exceed what is needed for a single simulation, since the iterations of the for loop are independent. Instead, memory usage keeps rising and eventually exceeds the size of the full dataframe. Below is an MWE of the issue, using the Schelling model defined in the Agents.jl documentation.
Code Snippet:
module MWERunOffline

# We make use of the following packages in this script
using Agents
using Random
using ProgressMeter # The issue has been replicated without ProgressMeter; it's nice to know how long the for loop might take

"A utility function to get the size of a file in GB"
function get_file_size_in_gb(filename)
    size_in_bytes = filesize(filename)
    size_in_gb = size_in_bytes / (1024^3)
    return size_in_gb
end

# Below, I use the Schelling model defined in the Agents.jl documentation:
# https://juliadynamics.github.io/Agents.jl/stable/examples/schelling
@agent SchellingAgent GridAgent{2} begin
    mood::Bool # whether the agent is happy in its position (true = happy)
    group::Int # the group of the agent; determines mood as it interacts with neighbors
end

function agent_step!(agent, model)
    minhappy = model.min_to_be_happy
    count_neighbors_same_group = 0
    # For each neighbor, get its group, compare it to the current agent's group,
    # and increment `count_neighbors_same_group` as appropriate.
    # Here `nearby_agents` (with default arguments) provides an iterator
    # over the nearby agents one grid point away, of which there are at most 8.
    for neighbor in nearby_agents(agent, model)
        if agent.group == neighbor.group
            count_neighbors_same_group += 1
        end
    end
    # After counting the neighbors, decide whether or not to move the agent.
    # If count_neighbors_same_group is at least min_to_be_happy, set the
    # mood to true. Otherwise, set the mood to false and move the agent to a
    # random position.
    if count_neighbors_same_group ≥ minhappy
        agent.mood = true
    else
        agent.mood = false
        move_agent_single!(agent, model)
    end
    return
end

function initialize(; total_agents=320, griddims=(20, 20), min_to_be_happy=3, seed=125)
    space = GridSpaceSingle(griddims; periodic=false)
    properties = Dict(:min_to_be_happy => min_to_be_happy)
    rng = Random.Xoshiro(seed)
    model = UnremovableABM(SchellingAgent, space;
                           properties, rng, scheduler=Schedulers.Randomly())
    # Populate the model with agents, adding equal numbers of the two agent types
    # at random positions in the model.
    for n in 1:total_agents
        agent = SchellingAgent(n, (1, 1), false, n < total_agents / 2 ? 1 : 2)
        add_agent_single!(agent, model)
    end
    return model
end

# The issue also appears if we keep the model fixed for every iteration of the
# for loop. Memory usage is unchanged.
# model = initialize()
n_steps = 1e3 |> Int
n_simulations = 500
# If backend = :none, all data is saved to memory. Use this to compare memory
# usage with the other backends.
backend = :csv # The issue also occurs with the :arrow backend
adata = [:pos, :mood, :group]
data_path = "mwe_offline_run"
mkpath(data_path)
run_message = "Running $n_steps * $n_simulations simulations with $backend backend"
@time begin
    if backend == :none
        dfs = []
        ProgressMeter.@showprogress run_message for i in 1:n_simulations
            model = initialize()
            # `run!` returns the agent and model dataframes; keep only the agent data
            adf, _ = Agents.run!(model, agent_step!, Agents.dummystep, n_steps;
                                 adata=adata)
            push!(dfs, adf)
        end
    else
        ProgressMeter.@showprogress run_message for i in 1:n_simulations
            model = initialize()
            Agents.offline_run!(model, agent_step!, Agents.dummystep, n_steps;
                                adata=adata,
                                backend=backend,
                                writing_interval=1e4 |> Int,
                                adata_filename="$data_path/adata.$backend")
        end
    end
end
if backend != :none
    println("Adata file size: ", get_file_size_in_gb("$data_path/adata.$backend"), " GB")
end
println("Finished $n_steps * $n_simulations simulations with $backend backend")

end # module
Software/Hardware:
- Linux, Ubuntu 22.04
- Julia 1.9.4
- Agents 5.17
- 16 GB RAM (if you have less RAM than this, you might want to reduce `n_simulations`)
Attempts Made:
I conducted a series of tests to understand the memory usage pattern. Here are my observations:
- Single Simulation with Many Steps: When `n_simulations = 1` and `n_steps` is large, memory usage is effectively managed by `Agents.offline_run!`. The function appears to control memory by calling `empty!` on the in-memory dataframe after writing to disk.
- Multiple Simulations: However, when `n_simulations` is large, `Agents.offline_run!` no longer controls memory usage effectively. Memory usage increases with each iteration of the for loop and is not released afterwards. Surprisingly, it greatly exceeds (~2x) the size of the adata file on disk.
- Manual Garbage Collection: Manually invoking the garbage collector with `GC.gc()` after the entire for loop has finished reduces memory usage to approximately the size of the adata file on disk. Calling `GC.gc()` within the for loop keeps memory usage under control, but at a significant cost to performance. I’d prefer to avoid forcing manual garbage collection in performance-critical code unless absolutely necessary.
- Search for Relevant Issues: I have searched the Agents.jl docs and GitHub issues for anything relevant. It appears that `offline_run!` is a relatively new addition to the package. Internally, it seems to append data to dataframes, save the data to file, and then empty those dataframes. It may be worth posting this as a GitHub issue later, but I felt it prudent to check with the Julia community first in case my issue stems from a more general misunderstanding.
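As I understand it, the write-then-`empty!` pattern behind `offline_run!` can be sketched roughly as follows. This is my own simplified stand-in, not the package's actual code: `run_with_offline_writes!`, the `step!` callback, and the CSV layout are all invented for illustration.

```julia
# Sketch of the buffer-then-flush pattern: accumulate rows in memory, write
# them to disk every `writing_interval` steps, then `empty!` the buffer.
function run_with_offline_writes!(step!, n_steps; writing_interval=100, path="adata.csv")
    buffer = Tuple{Int,Float64}[]  # in-memory staging area for (step, value) rows
    flush_buffer!(io) = begin
        foreach(r -> println(io, r[1], ",", r[2]), buffer)
        empty!(buffer)  # drops the references; memory is reclaimed only at the next GC
    end
    open(path, "w") do io
        println(io, "step,value")
        for s in 1:n_steps
            push!(buffer, (s, step!(s)))
            s % writing_interval == 0 && flush_buffer!(io)
        end
        flush_buffer!(io)  # write any rows left over after the last full interval
    end
    return path
end
```

Note that `empty!` only drops the buffer's contents; the underlying memory is reclaimed whenever the garbage collector next runs, which may be part of what I'm seeing across loop iterations.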
Given these observations, I’m curious about the following:
Questions:
- Why does `Agents.offline_run!` manage memory effectively for a single simulation but not across multiple simulations?
- Is there a recommended approach to ensure memory efficiency when running a for loop where data is created and saved inside each iteration?
- Could this behaviour be indicative of a memory leak or inefficient memory management within the `offline_run!` function, and how might I investigate this further?
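For reference, a throttled variant of the manual-GC workaround mentioned in my attempts above would look something like this. It is only a sketch: `simulate` stands in for one model setup plus `offline_run!` call, and the interval of 50 is an arbitrary choice.

```julia
# Run many independent simulations, triggering an incremental (young-generation)
# collection every `gc_every` iterations instead of a full `GC.gc()` each time.
function run_many(simulate, n_simulations; gc_every=50)
    for i in 1:n_simulations
        simulate(i)  # e.g. model = initialize(); offline_run!(model, ...)
        i % gc_every == 0 && GC.gc(false)  # incremental collection; cheaper than a full GC.gc()
    end
end
```

This keeps the performance hit much smaller than a full collection on every iteration, though it still feels like a workaround rather than a fix.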
Thank you in advance for any guidance or suggestions. Your expertise and time are greatly appreciated!