HTTP server with parallelism

This question relates a bit to the earlier topic "HTTP.jl doesn't seem to be good at handling over 1k concurrent requests, in comparison to an alternative in Python?":

I’m interested in creating an HTTP server that executes CPU-heavy tasks. Assume it’s a fractal service that takes a floating-point argument and computes a costly image.

The workload lends itself to parallelism with multiple threads, and it shouldn’t be necessary to use Distributed. On the other hand, it’s not something we can do with @async alone. How can we best approach this task?

I was thinking I might need to start servers on multiple threads, like the linked post suggests. But now I’m coming to understand that’s not really what I need. I’m not worried about parallelizing the request handling itself, only the underlying workload; a single server on a single thread should suffice to handle the requests. I just need to be able to fire off these tasks on separate threads (it’s not just an IO operation that can be handled concurrently; it’s CPU-heavy).

I’m thinking I could have all of my API endpoint functions do their work inside a Threads.@spawn begin ... end block, and then return the result with fetch. Does that sound like a good approach? Could this even be accomplished in a generic way, similar to the JSONHandler in the documentation?
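That approach can be sketched as follows. This is a minimal sketch, assuming HTTP.jl is installed; render_fractal is a hypothetical stand-in for the real workload, and Julia must be started with julia -t N so spare threads are available:

```julia
using HTTP

# Hypothetical stand-in for the costly image computation: count
# escape-time iterations for a single point (returns an Int).
function render_fractal(c::Float64)
    z = complex(c, c)
    n = 0
    while abs2(z) <= 4 && n < 10_000
        z = z * z + complex(c, c)
        n += 1
    end
    return n
end

# Endpoint: spawn the CPU-heavy work on another thread, then block
# this (cheap) request task until the result is ready.
function fractal_handler(req::HTTP.Request)
    c = parse(Float64, HTTP.queryparams(HTTP.URI(req.target))["c"])
    task = Threads.@spawn render_fractal(c)
    return HTTP.Response(200, string(fetch(task)))
end

# Generic wrapper in the spirit of the JSONHandler example: lift any
# handler onto a spawned thread without repeating the pattern.
threaded(handler) = req -> fetch(Threads.@spawn handler(req))

# HTTP.serve(threaded(fractal_handler), "127.0.0.1", 8080)
```

The wrapper keeps the request task itself cheap: it only parses the query, spawns, and waits, so the serving loop stays responsive while the heavy work runs on whichever thread the scheduler picks.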

This example package, which was discussed in this JuliaCon workshop, has a Workers module that keeps the main thread for the HTTP serving and uses the other threads for the tasks, which I think does something similar to what you’re asking for.

The Workers module was later moved into the WorkerUtilities.jl package. It works by introducing a new @async macro, as in this small example of a route:

pickAlbumToListen(req) = fetch(Workers.@async(Service.pickAlbumToListen()::Album))
HTTP.@register(ROUTER, "GET", "/", pickAlbumToListen)
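Note that HTTP.@register comes from older HTTP.jl releases; on current HTTP.jl (1.x) the equivalent registration uses HTTP.register! on a Router. A sketch with a stand-in handler (the real one would call into the service layer as above):

```julia
using HTTP

# Stand-in for Service.pickAlbumToListen(): returns a plain response.
pickAlbumToListen(req) = HTTP.Response(200, "album")

const ROUTER = HTTP.Router()
HTTP.register!(ROUTER, "GET", "/", pickAlbumToListen)
# HTTP.serve(ROUTER, "127.0.0.1", 8080)
```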

Another relevant package is ThreadPools.jl.
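A sketch of what ThreadPools.jl offers here, assuming the package is installed: its @tspawnat macro pins a task to a given thread id, so the CPU work can be kept off thread 1, leaving that thread free for serving (heavy is a hypothetical stand-in job):

```julia
using ThreadPools

# Stand-in for the CPU-heavy fractal job.
heavy(n) = sum(abs2, 1:n)

# Pin the job to thread 2 when available; fall back to thread 1
# if Julia was started single-threaded.
tid = min(2, Threads.nthreads())
task = ThreadPools.@tspawnat tid heavy(1_000)
fetch(task)  # 333833500
```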