What's the point of `ntasks` in `asyncmap`?

julia> function f(_...)
           s = 0.0
           for i = 1:10^9
               s = max(cos(i), sin(i))
           end
           return s
       end
f (generic function with 2 methods)

julia> asyncmap(f, 1:2)

I always assumed asyncmap never uses more than 100% * 1 CPU thread, and the above example indeed never exceeds 100% when looking at the top command.

But the documentation is IMO confusing:

asyncmap(f, c…; ntasks=0, batch_size=nothing)
ntasks specifies the number of tasks to run concurrently. Depending on the length of the collections, if ntasks is unspecified, up to 100 tasks will be used for concurrent mapping.
ntasks can also be specified as a zero-arg function. In this case, the number of tasks to run in parallel is checked before processing every element and a new task started if the value of ntasks_func is greater than the current number of tasks.

I suspect the problem is that the word “concurrent” here is used under a very specific / narrow definition, because this last bit of the doc actually describes the “will only ever use 100% * 1 CPU” behavior:

Currently, all tasks in Julia are executed in a single OS thread co-operatively. Consequently, asyncmap is beneficial only when the mapping function involves any I/O - disk, network, remote worker invocation, etc.

My “question” is: can we make the doc better? And what’s the point of having N tasks if they are all executed on a single OS thread anyway?

It's useful for I/O, e.g. downloading a bunch of files, since Julia can switch tasks while waiting on the network. You do want some control over the number of tasks, because if there are too many, there can be long delays before the CPU gets back to a given task, which can cause errors, e.g. AWSException: RequestTimeTooSkewed - s3.put_object · Issue #598 · JuliaCloud/AWS.jl · GitHub
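To make the I/O point concrete, here is a minimal sketch. It assumes `sleep` is a fair stand-in for waiting on the network, and `fake_download` is a made-up name, not a real API. It only shows that waits overlap when `ntasks` is greater than 1, even though everything runs on one thread:

```julia
# Hypothetical stand-in for an I/O-bound call; a real download would wait on
# the network instead of sleeping.
fake_download(i) = (sleep(0.2); i)

# One task at a time: the waits add up (roughly 8 * 0.2 s, plus compilation).
t0 = time()
asyncmap(fake_download, 1:8; ntasks=1)
t_serial = time() - t0

# Eight tasks: while one task sleeps, the scheduler switches to another,
# so the waits overlap.
t0 = time()
results = asyncmap(fake_download, 1:8; ntasks=8)
t_overlap = time() - t0

println("ntasks=1: ", round(t_serial; digits=2), " s")
println("ntasks=8: ", round(t_overlap; digits=2), " s")
```

Swapping `sleep` for a CPU-bound loop like the `f` above should make the two timings roughly equal, since nothing ever yields.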


@mkitti

It runs the blocking operation on a different thread, blocks that thread, and then returns the result to the launching thread.

ah right…this is for task-local libUV thread…ugh

For a specific example where ntasks is very important, I quote myself from Slack:

Try asyncmap(_ -> run(`ls`), 1:10^4) or with something longer-running than ls: it starts ntasks processes at the same time.
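If you'd rather not spawn thousands of processes to see this, a rough sketch along the following lines tracks how many calls are in flight at once. The `tracked` helper and the counters are made up for illustration, and `sleep` stands in for a running external process:

```julia
# Counters for calls currently in flight and the highest value seen.
# asyncmap's tasks run cooperatively on the calling thread, so plain Refs are
# fine here: no task switch can happen between a read and the following write.
running = Ref(0)
peak = Ref(0)

# Hypothetical helper: pretends to be a call that launches an external process.
function tracked(i)
    running[] += 1
    peak[] = max(peak[], running[])
    sleep(0.05)   # yields to other tasks, like waiting on a process would
    running[] -= 1
    return i
end

out = asyncmap(tracked, 1:50; ntasks=5)
println("peak concurrency: ", peak[])
```

With `ntasks=5` the reported peak should not exceed 5. Leaving `ntasks` unspecified lets it go as high as the documented default of up to 100 tasks allows, which is exactly the situation with `run` above.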