How to parallelize something which allocates a lot?

I have a simulation that needs around 50 MB of RAM and about 1 second to finish (let's call it simulate(p)).
My problem is that I have to run this simulation around 500 times, which leads to a noticeable slowdown, possibly due to memory pressure.

The simulation is not completely allocation-free, since I have to do some complicated parsing in between and even call a bit of eval(Meta.parse(...)) at unpredictable times. I can clearly see that the garbage collector has a lot of work to do, since memory usage drops at regular intervals.

I currently use a loop like this:

# pseudo-code
res = Dict(p => ResultType() for p in 1:500)
Threads.@threads for p in 1:500
    res[p] = simulate(p)
end

(I will try to create a representative minimal example, but it's a bit tricky to factor out the relevant parts. If you have been in a similar situation and know any tips, that would already help me.)

Main question:

  • Given that the function simulate allocates a lot, what is the best strategy for parallelization?

Make simulate allocate less. Running the code on more threads will give a much smaller speedup than making the single-threaded version fast, and it is harder.
The specifics are impossible to give without knowing how simulate works, but eval(Meta.parse(...)) is almost always a bad idea.
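As a first step, it helps to measure where the allocations happen. A rough sketch (simulate stands for your function; the allocation profiler needs Julia 1.8 or newer, and the sample rate is just an example):

using Profile

@time simulate(1)   # prints the total allocation count and GC time for one run

# Sample a fraction of all allocations and inspect the result afterwards,
# for example with PProf.jl via PProf.Allocs.pprof(Profile.Allocs.fetch())
Profile.Allocs.@profile sample_rate=0.01 simulate(1)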

1 Like

Another option is Distributed.jl, which offers pmap and @distributed.

But reducing allocations and removing eval(Meta.parse(...)) will help a lot with both Distributed and @threads.
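For reference, a minimal pmap sketch (the worker count and the simulate stub are only placeholders):

using Distributed
addprocs(4)                        # number of worker processes is illustrative

@everywhere function simulate(p)   # stub standing in for the real simulation
    sum(rand(10_000)) * p
end

# pmap sends the 500 runs to the workers; each worker is a separate process
# with its own heap, so garbage collection runs independently in each one.
results = pmap(simulate, 1:500)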

3 Likes

Thanks for the feedback and hints!

I started experimenting with Distributed today and it does seem better suited!
In particular, I can at least remove all workers from time to time, and GC.gc() is more effective.
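Roughly the batching pattern I mean, as a sketch (the batch size, worker count, and the simulate.jl file name are placeholders):

using Distributed

results = Vector{Any}(undef, 500)            # Any stands in for the real result type
for batch in Iterators.partition(1:500, 100)
    addprocs(4)                              # fresh workers for every batch
    @everywhere include("simulate.jl")       # hypothetical file defining simulate(p)
    results[batch] .= pmap(simulate, batch)
    rmprocs(workers())                       # dropping the workers returns their memory to the OS
    GC.gc()                                  # and collect on the master process too
end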

Of course, I absolutely agree that reducing allocations is the best approach! (I already cut the sequential runtime from something like 2 minutes down to 1 second.)

About eval(Meta.parse(...)): it is really only used to parse very simple things, basically expressions like "1 .+ (1.0, 2.0)" or "10 .+ [10.0, 10.0]". I should try to remove it and do the parsing manually…
(Initially, it was just a lazy hack and I hoped that it wouldn’t show up much.)

I guess you were right. I finally found the allocation trap, and fixing it solved a lot of problems… Thanks for the input, it helped me focus on the right task :smiley:

2 Likes

Why would you do that? Where are these expressions coming from?

3 Likes

It's probably not that interesting, but here you go: my simulation has a lot of biological parameters, and some of them have to be random and/or depend on each other (random for each newborn cell during the simulation, i.e. around 5000 random values need to be sampled). At the same time, the simulation will be used by non-programmers, so I need to read the input from an XLSX file.
In simplified terms, I get
Dict("A" => "C + (1.0,2.0)", "B" => "A + (3.0,5.0)", "C" => "(-10.0,10.0)")
and need to turn that into
C = rand(Uniform(-10.0,10.0)); A = C + rand(Uniform(1.0,2.0)); B = A + rand(Uniform(3.0,5.0));
Since I was lazy, I just stored the Dict and used eval to do the basic math for me whenever I needed to sample these values.
(But then, plot twist: we increased the size of the simulations dramatically, and suddenly this unimportant part of the code got me into big trouble.)

It turned out that 99% of the allocations were spent in occursin(::String, ::String) :slight_smile: so I am fixing that right now. It's always a fun adventure to find allocations :smiley:

1 Like

It is always interesting, because I learn from what others are trying to do, and frequently someone has a bright idea about how to do it.

In that case, the fancy way to do it is probably to write your own parser, though I understand that can get complicated, depending on how general the inputs need to be.

I'm a Julia noob, so I don't know if this is possible, but maybe you can convert the dict into a function ahead of time:

function dict2function(input_dict)
    # parse the input dict using Meta.parse
    # combine the expressions
    # return the combined expressions as a function
end

Then use the generated function inside your simulate function.
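Something along these lines might work. A minimal sketch, assuming the evaluation order is known and every two-element tuple gives the (lo, hi) bounds of a Uniform distribution (the names rewrite, dict2function, and order are made up for illustration):

using Distributions

# Replace every two-element tuple literal (lo, hi) with rand(Uniform(lo, hi)).
function rewrite(ex::Expr)
    if ex.head == :tuple && length(ex.args) == 2
        return :(rand(Uniform($(ex.args[1]), $(ex.args[2]))))
    end
    return Expr(ex.head, map(rewrite, ex.args)...)
end
rewrite(ex) = ex   # symbols and literals are left untouched

# Parse the dict once and build a sampling function; no eval at simulation time.
function dict2function(input_dict, order)   # order = evaluation order, e.g. ["C", "A", "B"]
    assignments = [Expr(:(=), Symbol(k), rewrite(Meta.parse(input_dict[k]))) for k in order]
    names = Tuple(Symbol.(order))
    @eval () -> begin
        $(assignments...)                   # C = rand(Uniform(-10.0, 10.0)); A = C + ...; ...
        NamedTuple{$names}(($(Symbol.(order)...),))
    end
end

d = Dict("A" => "C + (1.0,2.0)", "B" => "A + (3.0,5.0)", "C" => "(-10.0,10.0)")
sample = dict2function(d, ["C", "A", "B"])
sample()   # e.g. (C = -3.1, A = -1.7, B = 2.5)

One caveat: since the returned function is created with eval, Base.invokelatest may be needed if it is built and called within the same enclosing function call (world age). Building it once up front, as above, and passing it into simulate avoids that.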

3 Likes

It is both possible and a much better approach.

Yes, you are right: I implemented something like that, and that part is now almost entirely allocation-free.
(It was quite a lot of work, but now my RAM no longer explodes or slows things down! :smiley: )

1 Like