[solved] Agents.jl spending 3/4 of its time doing something before it even starts to compute?

I’m running a simulation with 10000 agents in a continuous space. Even 4 time steps takes minutes… Netlogo can do a similar thing in milliseconds.

So of course I hit the profiler:


  131╎131   @Base/dict.jl:279; ht_keyindex(::Dict{Symbol,Int64}, ::Symbol)
     ╎13763 @Base/task.jl:358; (::REPL.var"#26#27"{REPL.REPLBackend})()
     ╎ 13763 .../julia/stdlib/v1.4/REPL/src/REPL.jl:118; macro expansion
     ╎  13763 .../julia/stdlib/v1.4/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
     ╎   13763 @Base/boot.jl:331; eval(::Module, ::Any)
     ╎    13763 @Revise/src/Revise.jl:1165; run_backend(::REPL.REPLBackend)
     ╎     13763 ...julia/stdlib/v1.4/REPL/src/REPL.jl:86; eval_user_input(::Any, ::REPL.REPLBackend)
     ╎42450 @Base/task.jl:444; task_done_hook(::Task)
     ╎ 42450 @Base/task.jl:709; wait
42449╎  42450 @Base/task.jl:702; poptaskref(::Base.InvasiveLinkedListSynchronized{Task})
     ╎1     @Agents/src/simulations/collect.jl:62; (::Agents.var"#run!##kw")(::NamedTuple{(:adata, :mdata)...
     ╎ 1     @Agents/src/simulations/collect.jl:70; #run!#96
     ╎  1     @Agents/src/simulations/collect.jl:86; (::Agents.var"#_run!##kw")(::NamedTuple{(:adata, :mdata...
     ╎   1     @Agents/src/simulations/collect.jl:103; _run!(::AgentBasedModel{COVPerson,ContinuousSpace{type...
     ╎    1     @Agents/src/simulations/step.jl:44; step!(::AgentBasedModel{COVPerson,ContinuousSpace{type...
     ╎     1     REPL[130]:31; stepcovmodel!(::AgentBasedModel{COVPerson,ContinuousS...
     ╎1     @Agents/src/simulations/step.jl:44; step!(::AgentBasedModel{COVPerson,ContinuousSpace{typeo...
    1╎ 1     @Base/array.jl:678; _collect(::SubArray{COVPerson,1,Array{COVPerson,1},Tupl...
Total snapshots: 70378

I did 4 timesteps, and 10k agents, so the fact that it calls poptaskref 42449 times suggests to me that for every timestep, it pushes a task for every agent… This is incredibly inefficient and it’s spending all its time pushing and popping tasks… Of course I could be wrong.

It seems clear that a bunch of time is spent doing something before my calculation takes hold… I think this may be Revise or something? I’m not sure…

Here’s a profile with only 1000 agents… the big bar along the bottom is calls to poptaskref

Ok, I’m beginning to think that the lowest level calls are actually something to do with the repl/Revise… I’m still not sure why so many samples have nothing above them… All the “actual computing” comes in the last 25% of samples?

In any case, I think I’m on my way to understanding what’s making this slow. Seems like there’s a LOT of allocations, related to the way that I check for interactions between agents.

1 Like

Did you try to profile without Revise?

1 Like

I will try that today. Also I am running this in a julia-repl inside emacs. not sure if that has something to do with it. I may try from regular command line

I did find that things I absolutely expected would lower the allocations increased them by 10x… so I’m pretty confused at the moment.

to elaborate on that… I had

any(map(somefn,anarray))

and I converted that to a loop over anarray breaking when the first time somefn returned true…

this caused my allocations to go from 1G to 10G according to benchmark…

WTH?

Here is the run outside of emacs… which was easy to do and shows that the emacs stuff isn’t the issue:

And here’s after I moved my julia startup file so it wouldn’t use revise:

So, it seems that something causes 75% of my time to be used doing… nothing

and by the way this takes several seconds to run, so it’s not like the reason is that it takes a short time and some fixed few hundred milliseconds of REPL is the reason.

Here’s output from benchmarking the run! function:

BenchmarkTools.Trial: 
  memory estimate:  1.34 GiB
  allocs estimate:  32028344
  --------------
  minimum time:     6.817 s (3.54% GC)
  median time:      6.817 s (3.54% GC)
  mean time:        6.817 s (3.54% GC)
  maximum time:     6.817 s (3.54% GC)
  --------------
  samples:          1
  evals/sample:     1

FYI, you can do any(somefn, anarray), which removes the need for the temporary in any(map(somefn,anarray))

2 Likes

Awesome, that cut my allocations in half:

BenchmarkTools.Trial: 
  memory estimate:  546.40 MiB
  allocs estimate:  15828344
  --------------
  minimum time:     6.110 s (1.35% GC)
  median time:      6.110 s (1.35% GC)
  mean time:        6.110 s (1.35% GC)
  maximum time:     6.110 s (1.35% GC)
  --------------
  samples:          1
  evals/sample:     1

However, as you can see, GC time is now trivial, so I’d better figure out where the heck the rest of the time is going!

AHA!

It was a bug, I was doing some code that does something for a big subset of the agents once for every agent in the model instead of just once…

new timing:

  memory estimate:  77.36 MiB
  allocs estimate:  1358768
  --------------
  minimum time:     323.610 ms (1.81% GC)
  median time:      331.277 ms (2.46% GC)
  mean time:        331.999 ms (2.77% GC)
  maximum time:     343.014 ms (4.23% GC)
  --------------
  samples:          16
  evals/sample:     1


I still don’t understand why most of the samples are coming from before my calculation starts… But hey, for the moment I’m not gonna worry about it and assume it’s some kind of overhead of running the profiler.

2 Likes

I don’t think that’s correct, but you would have to share more for us to help.

Yeah, I don’t know either… Here’s two profile views one of which with 100 steps of my model one is 400:

As you can see it doesn’t matter how long I run the thing it’s always something like 25 % of the samples are in my code.

Now that I think about it, I’ve got 4 julia threads in my environment variable, so maybe it’s sampling all 4 threads, and of course 3/4 of them are not doing anything, because I’m not using threads in my code.

That must be it?

Yeah that’s probably it. The julia profiler seems to do some funny things when julia is launched with multiple threads. See the following thread for a related example:

3 Likes

It shouldn’t, that’s a plain vanilla julia process in the inferior buffer.

1 Like

Good to know. it seems clear this is a threads issue as my plot looks just like the plots from the linked topic.