Parallel computing; show progress

I have a large number of files to process, and I use pmap() to parallelize the work. It takes hours to finish. How can I see what percentage of the job is done?

# ...
function process_file(file_path)
    # process the file
end

file_paths = get_files()
rst = pmap(process_file, file_paths)   # blocks here until every file is done

# want to show the progress here 

Thanks!

Look at ProgressMeter.jl (timholy/ProgressMeter.jl on GitHub), a progress meter for long-running computations. Maybe it's possible with that.

The main problem is that the call to pmap() blocks until everything has finished.

I have used a counter to report the number of files finished, but that is still not exactly what I want.

# ...
using Distributed

# Each process gets its own copy of these counters, so the counts are per worker.
@everywhere g_n_error = 0
@everywhere g_n_complete = 0

@everywhere function process_file(file_path)
    # process the file

    global g_n_complete += 1
    (g_n_complete % 100 == 0) && println("worker $(myid()) finished: ", g_n_complete)
end

file_paths = get_files()
rst = pmap(process_file, file_paths)   # blocks here until every file is done

# want to show the progress here

Thanks for your reply. I have checked the package, but I am not sure how to use it in parallel computing. When I call pmap(), it waits for the return value.
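For what it's worth, ProgressMeter.jl does ship a pmap wrapper, so the blocking call itself can drive the progress bar. The following is only a minimal sketch, assuming ProgressMeter.jl is installed on every process; process_file and the file names are hypothetical stand-ins for the real work.

```julia
using Distributed
@everywhere using ProgressMeter   # assumption: the package is installed

# Hypothetical stand-in for the real per-file work.
@everywhere process_file(path) = (sleep(0.01); length(path))

file_paths = ["a.txt", "bb.txt", "ccc.txt"]   # placeholder input

# progress_pmap behaves like pmap but updates a progress bar on the
# calling process each time one item finishes.
rst = progress_pmap(process_file, file_paths)
```

The call still blocks until all files are done, but the bar is redrawn while it waits.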

You can try the following

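using Distributed

# Shared counter: a one-slot channel, so the take!/put! pair acts as a lock.
const job_counter = RemoteChannel(() -> Channel{Int}(1))
put!(job_counter, 0)

pmap(1:N) do i
    long_computation()
    c = take!(job_counter) + 1   # waits while another worker holds the counter
    put!(job_counter, c)
    println("Done with $c jobs")
end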
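A variation on the same idea, in case a percentage on the master process is wanted: workers push a token into a RemoteChannel after each job, and a task on the master counts the tokens while pmap is still blocking. This is only a sketch using the Distributed standard library; pmap_with_progress, report_every and process_file are hypothetical names introduced for illustration, and error handling is left out.

```julia
using Distributed

# Hypothetical stand-in for the real per-file work.
@everywhere process_file(path) = (sleep(0.01); length(path))

function pmap_with_progress(f, items; report_every = 1)
    n = length(items)
    # Buffered channel owned by the calling process; workers never block on put!.
    done = RemoteChannel(() -> Channel{Bool}(n))
    ndone = Ref(0)

    # Runs on the calling process concurrently with the blocking pmap below.
    reporter = @async for _ in 1:n
        take!(done)
        ndone[] += 1
        if ndone[] % report_every == 0 || ndone[] == n
            println("progress: ", round(100 * ndone[] / n; digits = 1), "%")
        end
    end

    results = pmap(items) do item
        r = f(item)
        put!(done, true)   # tell the reporter that one more item is finished
        r
    end

    wait(reporter)         # all n tokens have been counted at this point
    return results, ndone[]
end

results, n_finished = pmap_with_progress(process_file, ["a", "bb", "ccc"])
```

Because only the reporter task touches the counter, no lock is needed, and the output comes from one place instead of from every worker.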

Just wondering … Rather than thinking of this as a parallel computing issue, could it be approached by considering a new type of collection that pmap iterates through? To pmap the collection would be the same, but it would have extra features. For example, this new collection might have some feature that provides a window into how far into the collection it has iterated, etc. Maybe a collection with an iterator that can publish/subscribe or be queried?

It sounds like you are talking about a RemoteChannel. put! and take! serve as the publishing and subscribing mechanisms.