I’m running pmap with 16 workers and batch_size=1 to process hundreds of files. Each file takes over an hour to work through, and the files are relatively homogeneous but don’t take exactly the same time. For the first 14 hours or so the computer was using 16 cores at 100% usage. However, at some point in the last few hours the number of fully used CPUs dropped to 8. The other 8 processes are still alive; they’re just shown as sleeping.
When I check the logs, I find that the idle workers finished processing their last assigned file without error. However, they didn’t start on the next file.
Any ideas why this would happen? If no error occurred, why wouldn’t a worker load the next task?
Actually, now it’s down to 6 active workers.