unexpected pmap behaviour

dgs · March 4, 2019, 3:12pm

Hello,

I am currently working on a project where I need to match regular expressions against different sequences (strings).
On the one hand, these sequences can vary in size but I can easily fit them on all workers separately (no memory issue here).
On the other hand, I am trying to find the matches of many (~10⁶) regular expressions (“regs” in the example).
While the code is probably not perfect, it runs well with map.
As the number of regexes to match will increase, I will need to parallelize the search.
The problem appears when switching from map to pmap:

- There seems to be an increase in memory usage, which I cannot explain, when calling the pmap line multiple times.
- The time required by pmap to return the results varies (between ~4.4s and 7.7s on my laptop).
  The system monitor indicates that 1 (out of 8) workers takes a lot more time to return than the others.

A simplified version of the code would be something like:

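using Distributed: pmap, @everywhere
using Random

# One random 300-character lowercase string for the example
seq = randstring(MersenneTwister(3), 'a':'z', 300)

# Copy the sequence to every worker as a global variable
@everywhere seq = $seq

# Collect all (overlapping) matches of each regex in rs against the global seq
@everywhere function getmatches(rs::Array{Regex,1})
    matches = []
    for r in rs
        push!(matches, collect(eachmatch(r, seq, overlap=true)))
    end
    matches
end

# 800 copies of the same 26 single-letter regexes
regs = repeat([Regex.(string.(collect('a':'z')))], 800)

pmap(getmatches, regs, batch_size=100) # running on 8 cores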
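
To put numbers on the two observations above, a rough sketch along these lines could time several consecutive pmap calls and record free system memory after each one. It repeats the definitions so that it runs on its own. The worker count (2), the number of repetitions (`ncalls`) and the variable names `serial`, `times` and `free_mb` are assumptions chosen for illustration, not part of the original run, and `Sys.free_memory()` is only a coarse, system-wide figure.

```julia
using Distributed
using Random

# Assumption: start 2 local workers if none are running (the original run used 8 cores)
nprocs() == 1 && addprocs(2)

seq = randstring(MersenneTwister(3), 'a':'z', 300)
@everywhere seq = $seq

@everywhere function getmatches(rs::Array{Regex,1})
    matches = []
    for r in rs
        push!(matches, collect(eachmatch(r, seq, overlap=true)))
    end
    matches
end

regs = repeat([Regex.(string.(collect('a':'z')))], 800)

# Serial baseline for comparison
serial = map(getmatches, regs)

# Time repeated pmap calls; the first call also includes compilation on the workers
ncalls = 5                # hypothetical number of repetitions
times = Float64[]         # seconds per pmap call
free_mb = Float64[]       # free system memory (MiB) after each call
for i in 1:ncalls
    t = @elapsed pmap(getmatches, regs, batch_size=100)
    push!(times, t)
    push!(free_mb, Sys.free_memory() / 2^20)
end

println("pmap times (s): ", round.(times, digits=2))
println("free memory after each call (MiB): ", round.(free_mb, digits=1))
```

If `free_mb` keeps dropping from call to call while `times` fluctuates, that would match what the system monitor shows; other processes on the machine also affect both numbers, so this is only an indication.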

This is the version of Julia I am currently using:
Julia Version 1.1.0
Commit 80516ca202* (2019-01-21 21:24 UTC)
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: Intel(R) Core™ i7-6820HQ CPU @ 2.70GHz
WORD_SIZE: 64
LIBM: libopenlibm
LLVM: libLLVM-6.0.1 (ORCJIT, skylake)

I am not sure if it's a known issue with pmap or if I am doing something blatantly wrong, so any help or suggestions from Julia experts are welcome!

Thanks in advance,

David
