Multithreading in Agents.jl

Hello everyone,

I'm trying to use multithreading in an EventBasedModel, and I'm getting the error below when doing this.

Can someone comment? What could be the issue? What is this error trying to say?


        Threads.@threads for agent in allagents(model)
            model_rule!(agent, model)
        end

 nested task error: MethodError: no method matching firstindex(::Base.ValueIterator{Dict{Int64, CTL}})
            
            Closest candidates are:
              firstindex(::Any, ::Any)
               @ Base abstractarray.jl:447
              firstindex(::Markdown.MD)
               @ Markdown ~/julia-1.10.0/share/julia/stdlib/v1.10/Markdown/src/parse/parse.jl:27
              firstindex(::TranscodingStreams.Memory)
               @ TranscodingStreams ~/.julia/packages/TranscodingStreams/Ra4ZD/src/memory.jl:18
              ...
            
            Stacktrace:
             [1] #398#threadsfor_fun#88
               @ Main ./threadingconstructs.jl:199 [inlined]
             [2] #398#threadsfor_fun
               @ Main ./threadingconstructs.jl:181 [inlined]
             [3] (::Base.Threads.var"#1#2"{var"#398#threadsfor_fun#89"{var"#398#threadsfor_fun#88#90"{…}}, Int64})()
               @ Base.Threads ./threadingconstructs.jl:153
        
        ...and 63 more exceptions.

Not a big Agents user (or Turing, for that matter), but I recall from earlier discussions that you had some agent models with millions of allocations and seemingly lots of room for optimisation. Have you sorted those issues? Because if not, you're wasting time trying to get multithreading to work, or (as in your other recent thread) developing workarounds for the fact that it takes 2 hours to get a single MCMC sample from your model.


Not too familiar with Agents, but I don't think the package is related to your error. That type of error shows up when the iterator doesn't have a firstindex method, so @threads doesn't know how to partition it. E.g. (a contrived example):

julia> @threads for i in Ref(2)
       end
ERROR: TaskFailedException

    nested task error: MethodError: no method matching firstindex(::Base.RefValue{Int64})

    Closest candidates are:
      firstindex(::Any, ::Any)
       @ Base abstractarray.jl:450
      firstindex(::Base.JuliaSyntax.SourceFile)
       @ Base C:\workdir\base\JuliaSyntax\src\source_files.jl:131
      firstindex(::Markdown.MD)
       @ Markdown C:\Users\danjv\.julia\juliaup\julia-1.10.4+0.x64.w64.mingw32\share\julia\stdlib\v1.10\Markdown\src\parse\parse.jl:27
      ...

    Stacktrace:
     [1] #67#threadsfor_fun#13
       @ .\threadingconstructs.jl:200 [inlined]
     [2] #67#threadsfor_fun
       @ .\threadingconstructs.jl:182 [inlined]
     [3] (::Base.Threads.var"#1#2"{var"#67#threadsfor_fun#14"{var"#67#threadsfor_fun#13#15"{Base.RefValue{Int64}}}, Int64})()
       @ Base.Threads .\threadingconstructs.jl:154
Stacktrace:
 [1] threading_run(fun::var"#67#threadsfor_fun#14"{var"#67#threadsfor_fun#13#15"{Base.RefValue{Int64}}}, static::Bool)
   @ Base.Threads .\threadingconstructs.jl:172
 [2] macro expansion
   @ .\threadingconstructs.jl:220 [inlined]
 [3] top-level scope
   @ .\REPL[10]:1

For cases like these, you could just wrap the iterator in collect (meaning do collect(allagents(model))). There are some other packages that get around this without needing to use collect, but I don't remember them right now.
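As a minimal sketch of that workaround, using a Dict values iterator as a stand-in for allagents(model) (since a ValueIterator is what the error message points at):

```julia
using Base.Threads

# Stand-in for allagents(model): a Dict values iterator, which has no
# firstindex/getindex, so @threads cannot partition it directly.
agents = Dict(1 => 10.0, 2 => 20.0, 3 => 30.0)

vals = collect(values(agents))  # materialize into an indexable Vector
results = zeros(length(vals))
@threads for i in eachindex(vals)
    results[i] = 2 * vals[i]    # each iteration writes a distinct slot
end
sort(results)  # [20.0, 40.0, 60.0]
```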


Even though your specific problem is due to what @DanielVandH is saying, I'd like to emphasize that unfortunately Agents.jl doesn't support multithreading in most cases; in practice, only when there is no interaction between agents. If well optimised, an EventBasedModel should be plenty fast even single-threaded, though.
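To illustrate the interaction point with a contrived (non-Agents.jl) sketch: as soon as the per-agent rule mutates shared model state, a threaded loop has a data race and can silently drop updates.

```julia
using Base.Threads

# Serial version: every "agent" increments a shared counter; deterministic.
function count_serial(n)
    x = Ref(0)
    for i in 1:n
        x[] += 1
    end
    return x[]
end

# Threaded version: the same read-modify-write on shared state is a data
# race, so with more than one thread the result is typically too small.
function count_threaded(n)
    x = Ref(0)
    @threads for i in 1:n
        x[] += 1
    end
    return x[]
end

count_serial(100_000)  # always 100000; count_threaded(100_000) need not be
```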


I worked on that, but it didn't improve a lot. I gained some simulation time compared to last time, but it's still not enough; I want the runtime to be smaller. Though I'm not sure what else to change, and that's why I was looking into multithreading :sweat_smile:. But I understand what you are saying, and I will try to improve my code further.

The issue is that I'm collecting a lot of data for different purposes while the model runs, and this leads to allocations. In some places I have no choice but to use an Any[] list, which I know is not good for allocations and speed.

Also, for my model run it seems most of the time is going into GC.

This is my current time for a simulation:

julia> for i in 1:10
       
           @btime step!(model, 16) 
       end
  711.812 ms (928621 allocations: 44.22 MiB)
  787.768 ms (1209243 allocations: 60.27 MiB)
  801.701 ms (1358378 allocations: 68.66 MiB)
  840.247 ms (1423912 allocations: 72.31 MiB)
  815.020 ms (1424618 allocations: 72.40 MiB)
  845.164 ms (1467972 allocations: 74.80 MiB)
  847.029 ms (1467705 allocations: 74.77 MiB)
  829.571 ms (1468174 allocations: 74.79 MiB)
  827.787 ms (1466873 allocations: 74.68 MiB)
  848.454 ms (1465829 allocations: 74.62 MiB)

There is a big difference compared with @time:

 for i in 1:10
       
           @time step!(model, 16) 
       end
  0.638552 seconds (529.23 k allocations: 22.324 MiB)
  0.680958 seconds (613.17 k allocations: 26.718 MiB)
  0.692862 seconds (620.84 k allocations: 26.728 MiB)
  0.688389 seconds (652.43 k allocations: 29.057 MiB)
  0.755934 seconds (686.05 k allocations: 30.381 MiB)
  0.722687 seconds (736.99 k allocations: 33.458 MiB)
  0.692135 seconds (746.41 k allocations: 33.808 MiB)
  0.945384 seconds (775.51 k allocations: 36.576 MiB, 23.09% gc time)
  0.715132 seconds (816.81 k allocations: 37.967 MiB)
  0.720824 seconds (838.70 k allocations: 39.175 MiB)

Quick question: how many allocations are sensible for a complex model in general? Millions seems like too many, but how many would you say is good enough? Under a million, or does it really depend on how much one can optimise? I mean, if one wants, code can be written in a way that its allocations are in kB even if the model is complex.

That’s really not a question that can sensibly be answered without knowing the algorithm you are implementing. 1.4 million allocations certainly lead to a strong suspicion that there’s a lot of room for improvement but maybe this is the rare case of a model that needs them? (I would put the chance of that at less than 1% based on pure gut feel and your limited descriptions).

Another important thing to consider is that excessive allocations are especially problematic for multithreaded code and can at times lead to programmes running slower when multithreaded than in serial execution. We often see people on here reaching for the multithreading silver bullet to get around the hard(ish) work of optimising single thread performance, but without doing that work that bullet (if I may slightly mix and abuse metaphors at the same time) can turn out to sit in the chamber of a foot gun.

So my advice is: profile your code to work out where the allocations happen. Then think of ways to remove them. If you can’t, reduce the bit in your model that produces the excessive allocations into a Minimal Working Example (I’d say no more than 50 lines, the fewer the better) and open a separate thread on here asking for help with optimising performance.
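Since Julia 1.8 the standard library ships an allocation profiler that records where each allocation happens. A hedged sketch with a made-up workload (the array literal is the avoidable allocation):

```julia
using Profile

# Hypothetical hot loop: the array literal on the right-hand side
# allocates a fresh temporary vector on every iteration.
function work!(out)
    for i in eachindex(out)
        out[i] = sum([i, i + 1, i + 2])
    end
    return out
end

out = zeros(Int, 1_000)
work!(out)  # warm up so compilation itself isn't profiled

Profile.Allocs.clear()
Profile.Allocs.@profile sample_rate=1 work!(out)
results = Profile.Allocs.fetch()
length(results.allocs)  # roughly one recorded allocation per iteration
```

Each recorded allocation carries a stack trace, so a viewer like PProf.jl can show exactly which lines are responsible.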


I ran a profile on my code and it turns out that there are specific places where most of the allocations happen.

In one place I have a situation like the example below.

Example:

function func(agent, model)
    model.x += 1

    if model.x == nagents(model)
        # ... do stuff ...
    end
end

Now, this function goes through all agents at every time point, which means model.x is getting updated (and, it seems, allocated) every time. This is an important part of the code, as I want to run a few things only once per time point (like the model step in an ABM, which is not available in an EventBasedModel). Is there a way to avoid these allocations here?

Also, the above problem seems similar to this issue:

julia> c = 1
1

julia> @time for i in 1:10000000
       
           c = i
       
       end
  0.116305 seconds (10.00 M allocations: 152.580 MiB, 21.17% gc time)
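That loop allocates because c is an untyped global, so every assignment boxes the integer; it's an artifact of benchmarking in global scope rather than something inherent to the loop. The same loop inside a function allocates nothing:

```julia
# Same loop as above, but with a local variable inside a function:
# type-stable, so each assignment is a plain register write.
function assign_loop(n)
    c = 0
    for i in 1:n
        c = i
    end
    return c
end

assign_loop(10)                     # warm up (compile first)
@allocated(assign_loop(1_000_000))  # 0
```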

The next major allocations come from push!(). See below for the idea.


function data_extraction(agent, model)
    if condition1
        push!(model.a, [agent.var1, agent.var2, agent.var3])   # model.a is a Vector{Vector{Float64}}
    end

    if condition2
        push!(model.a, [agent.var1, agent.var2, agent.var3])
    end

    if condition3
        push!(model.a, [agent.var1, agent.var2, agent.var3])
    end

    if condition4
        push!(model.a, [agent.var1, agent.var2, agent.var3])
    end

    if condition5
        push!(model.a, [agent.var1, agent.var2, agent.var3])
    end
end

This is the data collection I was talking about earlier in one of my replies. This function also goes through all agents at each time point, hence the allocations.

Any ideas to avoid allocations here would be helpful :slight_smile:

Again, I think you need to spend some time putting MWEs together and starting dedicated threads for those.

Here in the first example you just increment an integer counter, which won't allocate, so presumably the allocations are in the β€œdo stuff” part, which is missing. In the second example you might have simplified too much (all the code is the same, so why check the conditions at all?), but the way to avoid allocations from push! is to preallocate, something like

model.a = [zeros(4) for _ in 1:n_agents]

And then dot-assign (.=) to the elements of model.a.
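A sketch of that pattern with hypothetical names: one slot per agent, allocated once up front and then overwritten in place. A tuple on the right-hand side avoids the temporary array that a literal like [v1, v2, v3] would create.

```julia
n_agents = 4
a = [zeros(3) for _ in 1:n_agents]  # preallocated once, reused every step

# Overwrite slot i in place; the tuple right-hand side allocates nothing.
record!(a, i, v1, v2, v3) = (a[i] .= (v1, v2, v3); nothing)

for i in 1:n_agents
    record!(a, i, Float64(i), 2.0, 3.0)
end
a[2]  # [2.0, 2.0, 3.0]
```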


Yeah, I'm working on an MWE. :sweat_smile:

In the first example I missed something: I also update model.x inside the if condition.

function func(agent, model)
    model.x += 1

    if model.x == nagents(model)
        # ... do stuff ...
        model.x = 0                # reset x
    end
end
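On the counter itself: if model.x lives behind an abstractly typed container (say, properties stored as a Dict{Symbol,Any}), each model.x += 1 boxes the integer and allocates. Keeping it in a concretely typed field avoids that; a sketch with hypothetical names:

```julia
# Concretely typed properties struct: incrementing x does not allocate.
mutable struct Props
    x::Int
end

function once_per_step!(p::Props, n_agents)
    p.x += 1
    if p.x == n_agents
        # ... do stuff once per time point ...
        p.x = 0   # reset for the next time point
    end
    return p
end

p = Props(0)
for _ in 1:10        # 10 agents, one time point
    once_per_step!(p, 10)
end
p.x  # 0 (reset after the last agent)
```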

For the second one, did you mean like this? Because it's allocating, I think:

julia> aa = [zeros(4) for _ in 1:160000]
160000-element Vector{Vector{Float64}}:
 [0.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 0.0]
 [0.0, 0.0, 0.0, 0.0]
 .
 .
 [0.0, 0.0, 0.0, 0.0]

julia> @time for i in 1:160000
         aa[i] .= [1, 1, 1, 1] 
       end
  0.077297 seconds (479.49 k allocations: 19.523 MiB, 73.25% gc time)

I agree with @nilshg that it’s impossible to really know what the issue is that you’re having, but note that

@time for i in 1:160000
         aa[i] .= [1, 1, 1, 1] 
       end

could be

@time for i in 1:160000
         aa[i] .= (1, 1, 1, 1)
       end

which will get rid of them: the array literal [1, 1, 1, 1] on the right-hand side allocates a new vector every iteration, while a tuple does not.

julia> function ex1(aa)
       for i in 1:160000
       aa[i] .= [1, 1, 1, 1]
       end
       end
ex1 (generic function with 1 method)

julia> function ex2(aa)
       for i in 1:160000
       aa[i] .= (1, 1, 1, 1)
       end
       end
ex2 (generic function with 1 method)

julia> using BenchmarkTools

julia> @benchmark ex1($aa)
BenchmarkTools.Trial: 842 samples with 1 evaluation.
 Range (min … max):  3.575 ms … 161.583 ms  β”Š GC (min … max):  0.00% … 93.63%
 Time  (median):     5.540 ms               β”Š GC (median):     0.00%
 Time  (mean Β± Οƒ):   5.918 ms Β±   5.723 ms  β”Š GC (mean Β± Οƒ):  11.47% Β± 12.36%

   β–ˆβ–ƒβ–
  β–†β–ˆβ–ˆβ–ˆβ–†β–„β–ƒβ–ƒβ–ƒβ–ƒβ–‚β–ƒβ–‚β–ƒβ–ƒβ–ƒβ–„β–ƒβ–„β–…β–„β–…β–…β–„β–…β–„β–„β–„β–„β–…β–…β–„β–ƒβ–ƒβ–ƒβ–‚β–‚β–‚β–‚β–‚β–ƒβ–‚β–β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–‚β–β–β–β–β–‚β–‚ β–ƒ
  3.58 ms         Histogram: frequency by time        12.1 ms <

 Memory estimate: 14.65 MiB, allocs estimate: 160000.

julia> @benchmark ex2($aa)
BenchmarkTools.Trial: 5279 samples with 1 evaluation.
 Range (min … max):  717.100 ΞΌs …   2.393 ms  β”Š GC (min … max): 0.00% … 0.00%
 Time  (median):     869.900 ΞΌs               β”Š GC (median):    0.00%
 Time  (mean Β± Οƒ):   937.408 ΞΌs Β± 195.761 ΞΌs  β”Š GC (mean Β± Οƒ):  0.00% Β± 0.00%

     β–ƒβ–‡β–ˆβ–„β–β–‚β–‚ ▁
  β–β–ƒβ–†β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–‡β–†β–…β–…β–…β–„β–ƒβ–ƒβ–ƒβ–ƒβ–„β–ƒβ–ƒβ–ƒβ–ƒβ–ƒβ–‚β–‚β–ƒβ–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–‚β–β–β–‚β–‚β–β–β–β–‚β–β–β–β–β–β–β–β–β–β– β–ƒ
  717 ΞΌs           Histogram: frequency by time         1.63 ms <

 Memory estimate: 0 bytes, allocs estimate: 0.

Here is the MWE. I tried to include the sections that are causing issues in the example.