IO Parallelism Strategies

I have a use case with many small IOs (get/put S3 objects <300 MB), a heartbeat (keeping an SQS message from timing out), and a heavy compute task which consumes a constant stream of inputs. The machine I’m using has 2 cores.

Recognizing that the real answer is “profile it”, what is your intuition for the best parallelism strategy? There are so many knobs to turn, and so much randomness created by the networking that I don’t think approaching optimization with pure experimentation is going to be fruitful.

I am trying to achieve:

  1. The compute task is never waiting for more inputs or for the results to be put back to S3
  2. The heartbeat isn’t blocked for so long that it fails to update the timeout.

One thing that is clear is that the main loop should kick off tasks for IO and heartbeat so that they do not block execution of the heavy compute. However, there are several considerations for this.

  1. How do I make sure the heartbeat isn’t blocked for a long time by a long list of IO tasks?
  2. I’ve seen (maybe outdated) posts that libuv poses some fundamental challenges to concurrent IO because of its global lock. AWS.jl suggests switching to the Downloads.jl backend (instead of HTTP.jl) if you need to do concurrent requests. Unfortunately, Curl.jl is segfaulting for me periodically, so I’m still using HTTP.jl until the patch mentioned in that issue makes its way to me. I’m not sure how this should affect my strategy (maybe @async is better than @spawn for this case? Would the advice change once I can move to Downloads.jl?)
  3. I know that, under most circumstances, it is not advisable to have more Julia threads than logical cores. However, in this case, where I have an ongoing heartbeat that is very lightweight but also must not be blocked, maybe I should run Julia with one :interactive and two :default threads and only schedule the heartbeat on the interactive thread.

My current idea is to @spawn the heartbeat on an :interactive thread, spawn an IO manager task that schedules all IO operations with @async, and leave the heavy compute on the main thread. But I’m open to the idea that I’m overthinking this and I should just @spawn everything naively.
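To make that idea concrete, here is a minimal sketch of the layout, not a tested production design. `fetch_input`, `put_result`, `extend_visibility`, and `heavy_compute` are hypothetical stand-ins for the real S3/SQS calls and the compute step, and the `prefetch` and `heartbeat_period` values are made up. It assumes Julia 1.9 or later (for the `:interactive` thread pool), started with something like `julia --threads 2,1`.

```julia
using Base.Threads: @spawn

# Hypothetical stand-ins; replace with the real S3/SQS calls and compute step.
fetch_input(key) = (sleep(0.01); "data-$key")   # would be an S3 get
put_result(result) = (sleep(0.01); nothing)     # would be an S3 put
extend_visibility() = nothing                   # would extend the SQS message timeout
heavy_compute(x) = uppercase(x)

function run_pipeline(keys; prefetch = 4, heartbeat_period = 0.05)
    # Bounded channels cap how many objects are queued between stages.
    inputs  = Channel{String}(prefetch)
    outputs = Channel{String}(prefetch)
    stop  = Threads.Atomic{Bool}(false)
    beats = Threads.Atomic{Int}(0)

    # Heartbeat task, requested on the interactive thread pool.
    heartbeat = @spawn :interactive begin
        while !stop[]
            extend_visibility()
            Threads.atomic_add!(beats, 1)
            sleep(heartbeat_period)
        end
    end

    # Queue of keys shared by a fixed number of download tasks.
    key_queue = Channel{eltype(keys)}(Inf)
    foreach(k -> put!(key_queue, k), keys)
    close(key_queue)

    # IO manager: `prefetch` concurrent downloads feeding the compute loop.
    downloader = @spawn begin
        try
            @sync for _ in 1:prefetch
                @async for key in key_queue
                    put!(inputs, fetch_input(key))
                end
            end
        finally
            close(inputs)   # lets the compute loop terminate, even on error
        end
    end

    # IO manager: concurrent uploads draining the compute results.
    uploader = @spawn begin
        @sync for _ in 1:prefetch
            @async for result in outputs
                put_result(result)
            end
        end
    end

    # Heavy compute stays on the calling task.
    processed = 0
    for x in inputs
        put!(outputs, heavy_compute(x))
        processed += 1
    end
    close(outputs)
    wait(downloader)
    wait(uploader)
    stop[] = true
    wait(heartbeat)
    return (processed = processed, heartbeats = beats[])
end

println(run_pipeline(["a", "b", "c"]))
```

The bounded channels are the main design choice here: they let downloads run ahead of the compute loop without piling up an unbounded number of large objects in memory. Whether `@async` or `@spawn` is the better choice for the inner IO tasks is exactly the open question above, so treat that part as a placeholder.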


We tried this for exactly this use case, a heartbeat bumping an SQS timeout, and I think it helped but did not totally solve task starvation issues. Maybe @omus or @dave.f.kleinschmidt remembers more; I think maybe we just bumped the timeout amount so there’s more leeway.

I think this is stale. I just filed a pull request to update those docs (“rm stale notes about HTTP.jl concurrency issues” by ericphanson, Pull Request #694, JuliaCloud/AWS.jl); that was only true pre-HTTP.jl 1.0, I believe.


When you say you tried this, were you also using one more Julia thread than you had logical cores in your machine?

And, thanks for the update about HTTP.jl! That’s great news!

No, we were using n regular threads + 1 interactive thread on an n+1 core machine, so the total number of Julia threads matches the number of cores.
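For anyone wanting to check which of the two setups they are actually running, a small sketch along these lines should work on Julia 1.9 or later. `thread_budget` is a hypothetical helper name, and the launch flag in the comment is only an example.

```julia
# Start Julia as, for example: julia --threads 1,1
# (1 default thread + 1 interactive thread)
function thread_budget()
    default = Threads.nthreads(:default)
    interactive = Threads.nthreads(:interactive)
    cores = Sys.CPU_THREADS   # logical cores reported by the system
    return (; default, interactive, cores,
            oversubscribed = default + interactive > cores)
end

println(thread_budget())
```

If `oversubscribed` is `true`, you are in the "one more thread than cores" setup from the question rather than the setup described in this reply.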
