What's the point of `ntasks` in `asyncmap`

jling · May 27, 2023, 6:00pm

julia> function f(_...)
           s = 0.0
           for i = 1:10^9
               s = max(cos(i), sin(i))
           end
           return s
       end
f (generic function with 2 methods)

julia> asyncmap(f, 1:2)

I always assumed asyncmap never uses more than 100% of one CPU thread, and the example above indeed never exceeds 100% when I watch it in `top`.

But the documentation is, IMO, confusing:

`asyncmap(f, c...; ntasks=0, batch_size=nothing)`
ntasks specifies the number of tasks to run concurrently. Depending on the length of the collections, if ntasks is unspecified, up to 100 tasks will be used for concurrent mapping.
ntasks can also be specified as a zero-arg function. In this case, the number of tasks to run in parallel is checked before processing every element and a new task started if the value of ntasks_func is greater than the current number of tasks.
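To see what `ntasks` actually limits, here is a small sketch that counts how many calls to the mapping function are "in flight" at once. The helper `tracked` and the counters `active`/`peak` are hypothetical names for illustration, and `sleep` stands in for real waiting. Because all of asyncmap's tasks share one thread and only switch at yield points, plain `Ref` counters are enough here.

```julia
# Hypothetical helper that records how many mapping calls are in flight.
# asyncmap's tasks share one thread and only switch at yield points
# (here: sleep), so plain Ref counters need no locks.
const active = Ref(0)
const peak = Ref(0)

function tracked(i)
    active[] += 1
    peak[] = max(peak[], active[])
    sleep(0.01)            # yields, so another task can pick up an element
    active[] -= 1
    return 2i
end

# Integer form: at most 4 tasks process elements concurrently.
peak[] = 0
fixed = asyncmap(tracked, 1:20; ntasks = 4)
fixed_peak = peak[]
println("ntasks = 4: peak in flight = ", fixed_peak)

# Zero-arg function form: per the docstring, it is re-checked before each
# element and a new task is started only if it exceeds the current count.
peak[] = 0
dynamic = asyncmap(tracked, 1:20; ntasks = () -> 2)
println("ntasks = () -> 2: peak in flight = ", peak[])
```

So "concurrent" here means "interleaved while waiting", not "running on several cores at the same time".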

I suspect the problem is that the word “concurrent” is used here in a very specific, narrow sense, because the last part of the docstring actually describes the “will only ever use 100% of one CPU” behavior:

Currently, all tasks in Julia are executed in a single OS thread co-operatively. Consequently, asyncmap is beneficial only when the mapping function involves any I/O - disk, network, remote worker invocation, etc.
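That docstring note predates Julia's multithreading, but the behavior still holds for asyncmap as far as I know: its worker tasks are started with `@async`, which makes them sticky to the thread that called asyncmap. The sketch below checks this with `Threads.threadid()` and, for contrast, uses `Threads.@spawn`, which is the tool for spreading CPU-bound work over several threads (start Julia with e.g. `julia -t 4`; with a single thread it runs serially). `work` is a hypothetical CPU-bound stand-in for `f` from the first post.

```julia
using Base.Threads: threadid, nthreads, @spawn

# asyncmap's tasks are sticky: they all run on the calling thread,
# even if they yield (sleep) in between.
async_ids = asyncmap(_ -> (sleep(0.001); threadid()), 1:16; ntasks = 8)
println("asyncmap ran on thread(s): ", unique(async_ids))

# Hypothetical CPU-bound stand-in for `f` from the first post.
work(n) = sum(i -> max(cos(i), sin(i)), 1:n)

# Threads.@spawn lets the scheduler place each task on any available thread,
# so with more than one thread these calls can use more than 100% CPU.
tasks = [@spawn work(10^6) for _ in 1:4]
spawn_results = fetch.(tasks)
println("threads available: ", nthreads())
```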

My “question” is: can we make the docs better? And what's the point of having N tasks if they all run on a single OS thread anyway?

ericphanson · May 27, 2023, 6:08pm

It's useful for I/O, e.g. downloading a bunch of files, since Julia can switch tasks while waiting on the network. You do want some control over the number of tasks, though: if there are too many, there can be long delays between the moments the CPU attends to any given task, which can cause errors, e.g. AWSException: RequestTimeTooSkewed - s3.put_object · Issue #598 · JuliaCloud/AWS.jl · GitHub
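A rough way to see both effects without a network: in the sketch below `sleep(0.1)` stands in for waiting on a download (`fake_download` is a hypothetical name). With enough tasks all the waits overlap; with `ntasks = 4` the same 20 items take about five "rounds". Serially it would be roughly 2 s.

```julia
# Hypothetical stand-in for a network request: just waits 0.1 s.
fake_download(i) = (sleep(0.1); i)

# Warm-up call so compilation time is not included in the timings below.
asyncmap(fake_download, 1:2; ntasks = 2)

t_all  = @elapsed asyncmap(fake_download, 1:20; ntasks = 20)  # all waits overlap
t_four = @elapsed asyncmap(fake_download, 1:20; ntasks = 4)   # about 5 rounds of 4
println("ntasks = 20: ", round(t_all; digits = 2), " s")
println("ntasks = 4:  ", round(t_four; digits = 2), " s")
```

Limiting `ntasks` trades some throughput for bounded latency per request, which is what helps with timeouts like the one above.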

jling · May 27, 2023, 6:08pm

@mkitti

It runs the blocking operation on a different thread, it blocks that thread and then returns the result to the launching thread

ah right…this is for task-local libUV thread…ugh

aplavin · May 27, 2023, 9:19pm

For a specific example where ntasks is very important, let me quote myself from Slack:

Try asyncmap(_ -> run(`ls`), 1:10^4), or the same with something longer-running than ls: it starts ntasks processes at the same time.
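A minimal sketch of that effect, assuming a POSIX system where `sh`, `sleep` and `echo` are available. Each element spawns one child process, and the hypothetical counters `running`/`most` record how many are alive at once, which stays at or below `ntasks`; the results still come back in input order.

```julia
# Assumes a POSIX shell; `limit`, `running` and `most` are illustrative names.
const limit = 4
const running = Ref(0)
const most = Ref(0)

outputs = asyncmap(1:12; ntasks = limit) do i
    running[] += 1
    most[] = max(most[], running[])
    # The task yields while waiting for the child process to finish.
    out = read(`sh -c "sleep 0.05; echo $i"`, String)
    running[] -= 1
    out
end

println("most processes alive at once: ", most[])
```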
