Web server with separate thread pools for latency-sensitive and compute-heavy tasks

I need to set up a web server that mainly serves requests which are computationally heavy and multithreaded, but should remain responsive to health checks (from a Kubernetes cluster) which are latency sensitive. I want to make sure that there is no competition for resources between the two tasks. What is the easiest way to accomplish this with Genie or Oxygen? Or should I go lower level with HTTP.jl?

What I've tried so far

  • Use TaskPoolEx(background=true) for the parallel loops of the computationally heavy tasks. However, this does not prevent those tasks from grabbing all CPU cores while some of them are in their single-threaded stages.
  • Set up Julia 1.9 with one interactive thread and wrap the handler function for the latency-sensitive route in `Threads.@spawn :interactive` middleware. However, errors were thrown when starting the Genie server in this thread configuration.


This is what I modified and have running in k8s:
JuliaCon 2020 | Building Microservices and Applications in Julia - YouTube



In Julia 1.9 w/ HTTP.jl, we spawn the main server loop on the :interactive threadpool by default. If you then do a Threads.@spawn from a handler, then those threaded tasks should run elsewhere and not in the interactive threadpool, keeping it responsive.

Note, however, that just having a reserved :interactive thread doesn’t guarantee CPU cycles, so you still need to configure your running Julia session appropriately (e.g. make sure the number of non-interactive threads available to run tasks is one less than the number of cores, or something similar).
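
A quick way to sanity-check such a configuration, as a minimal sketch assuming Julia 1.9 or later started with something like `julia -t 3,1` (3 default threads, 1 interactive thread). The variable names are only for illustration, and no HTTP server is involved here:

```julia
# Assumes Julia >= 1.9; start with e.g. `julia -t 3,1`.
using Base.Threads

n_default     = Threads.nthreads(:default)      # threads available for heavy work
n_interactive = Threads.nthreads(:interactive)  # threads reserved for latency-sensitive work
@info "thread pools" n_default n_interactive cores = Sys.CPU_THREADS

# Rule of thumb from the post above: leave at least one core free of heavy work.
if n_default >= Sys.CPU_THREADS
    @warn "default pool uses every core; latency-sensitive tasks may still be starved"
end

# A task spawned without a pool argument is scheduled on the :default pool.
heavy = Threads.@spawn begin
    s = sum(randn(10^6))
    (Threads.threadpool(), s)   # report which pool the task actually ran on
end
pool, _ = fetch(heavy)
@info "heavy task ran on pool" pool
```

If the warning fires, lowering the first number passed to `-t` is probably the simplest fix.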


Thanks. This is reassuring.
Would this behavior be inherited by Genie and/or Oxygen, or do these still need some additional features or special config?

And is there a way of getting guaranteed similar behavior in Julia <1.9, short of using two separate Julia processes with Distributed?

Thanks, I haven't found the time yet to watch this but I will do so!

For this kind of behavior pre-1.9, I used to use WorkerUtilities.jl (GitHub - JuliaServices/WorkerUtilities.jl: Utilities for working with multithreaded "workers" in Julia services/applications), with WorkerUtilities.init and WorkerUtilities.@spawn, but that only really works if you control most of the task-spawning in your application. With the growing number of packages that use Threads.@spawn themselves, it can be difficult to reserve CPU use.


I see. I tried WorkerUtilities with Genie and my library and I think I sometimes got a nested task error, but I will double check.

I watched the workshop video and it was excellent, I've learned a lot, thanks a lot! I think that in my case it makes sense to work with HTTP.jl directly to have this additional control over threading.

I like the Workers module there with the explicit loop of threads 2:nthreads() listening to the heavy workload. In my case I have to try and see how that composes with something like TaskPoolEx, such that the parallel portion of the task will just stay on those threads as well, keeping thread 1 for the main server loop. Maybe it makes more sense to have only thread 2 pick tasks off the queue then (the serial portion is negligible and I want to avoid task switching where it is not needed).

So ideally something like this could be set up even in Julia 1.8:

  • Thread 1 runs the main server loop and answers the cheap requests
  • Thread 2 waits for heavy tasks to be put on the queue and then uses threads 2:nthreads() for the parallel portion
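
The queue part of this idea could look roughly like the following. This is only a sketch: JOBS, start_worker and submit are hypothetical names, not part of HTTP.jl or the workshop code, and a plain Threads.@spawn does not pin the worker to thread 2 (the scheduler picks the thread), so it serializes the heavy jobs but does not by itself keep thread 1 free.

```julia
# Hypothetical job queue: each entry is a zero-argument function plus a reply channel.
const JOBS = Channel{Tuple{Function,Channel{Any}}}(32)

# One long-lived task takes heavy jobs off the queue, one at a time.
function start_worker()
    return Threads.@spawn for (f, reply) in JOBS
        result = try
            f()
        catch err
            err   # hand the exception back instead of killing the worker loop
        end
        put!(reply, result)
    end
end

# Called from a request handler: enqueue the job and block until it is done.
function submit(f)
    reply = Channel{Any}(1)
    put!(JOBS, (f, reply))
    return take!(reply)
end

worker = start_worker()
heavy() = sum(abs2, 1:10^6)   # stand-in for the real compute-heavy work
result = submit(heavy)
@info "heavy job finished" result
```

Because only one job runs at a time, any parallel loop inside `f` has the remaining threads to itself, which seems to be the property wanted here.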

Ok, I tried to get a basic MWE of a server which handles low-latency tasks on the interactive thread (where the main server loop is run) and spawns a default thread for the high-latency task.

Here is the server script:

using Dates, HTTP

const ROUTER = HTTP.Router()

function low_latency(req)
    # Run on the interactive pool (also tried without :interactive)
    t = Threads.@spawn :interactive begin
        @info "$(now())> $(Threads.threadpool()) tid: $(Threads.threadid())"
        "Done with low latency task"
    end
    return fetch(t)  # return the String, not the Task
end
HTTP.register!(ROUTER, "GET", "/low", low_latency)

function high_latency(req)
    # Run on the default pool
    t = Threads.@spawn begin
        s = sum(randn(10^8))
        @info "$(now())> $(Threads.threadpool()) tid: $(Threads.threadid()), sum = $s"
        "Done with high latency task"
    end
    return fetch(t)  # return the String, not the Task
end
HTTP.register!(ROUTER, "GET", "/high", high_latency)

function requestHandler(req)
    resp = ROUTER(req)
    return HTTP.Response(200, resp)
end

HTTP.serve(requestHandler, "", 7777)

So I have a couple of questions:

  1. When I run the server with julia -t N,1 where N>1, I get an error when hitting it with a request: Error: handle_connection handler error, exception = AssertionError: 0 < tid <= v
Full stacktrace
┌ Error: handle_connection handler error
│   exception =
│    AssertionError: 0 < tid <= v
│    Stacktrace:
│      [1] _length_assert()
│        @ URIs ~/.julia/packages/URIs/6ecRe/src/URIs.jl:694
│      [2] access_threaded(f::typeof(URIs.uri_reference_regex_f), v::Vector{URIs.RegexAndMatchData})
│        @ URIs ~/.julia/packages/URIs/6ecRe/src/URIs.jl:685
│      [3] parse_uri_reference(str::String; strict::Bool)
│        @ URIs ~/.julia/packages/URIs/6ecRe/src/URIs.jl:125
│      [4] parse_uri_reference
│        @ ~/.julia/packages/URIs/6ecRe/src/URIs.jl:123 [inlined]
│      [5] URI
│        @ ~/.julia/packages/URIs/6ecRe/src/URIs.jl:146 [inlined]
│      [6] gethandler
│        @ ~/.julia/packages/HTTP/z8l0i/src/Handlers.jl:391 [inlined]
│      [7] (::HTTP.Handlers.Router{typeof(HTTP.Handlers.default404), typeof(HTTP.Handlers.default405), Nothing})(req::HTTP.Messages.Request)
│        @ HTTP.Handlers ~/.julia/packages/HTTP/z8l0i/src/Handlers.jl:427
│      [8] requestHandler(req::HTTP.Messages.Request)
│        @ Main ./REPL[7]:2
│      [9] (::HTTP.Handlers.var"#1#2"{typeof(requestHandler)})(stream::HTTP.Streams.Stream{HTTP.Messages.Request, HTTP.ConnectionPool.Connection{Sockets.TCPSocket}})
│        @ HTTP.Handlers ~/.julia/packages/HTTP/z8l0i/src/Handlers.jl:58
│     [10] #invokelatest#2
│        @ ./essentials.jl:816 [inlined]
│     [11] invokelatest
│        @ ./essentials.jl:813 [inlined]
│     [12] handle_connection(f::Function, c::HTTP.ConnectionPool.Connection{Sockets.TCPSocket}, listener::HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, readtimeout::Int64, access_log::Nothing)
│        @ HTTP.Servers ~/.julia/packages/HTTP/z8l0i/src/Servers.jl:447
│     [13] macro expansion
│        @ ~/.julia/packages/HTTP/z8l0i/src/Servers.jl:385 [inlined]
│     [14] (::HTTP.Servers.var"#16#17"{HTTP.Handlers.var"#1#2"{typeof(requestHandler)}, HTTP.Servers.Listener{Nothing, Sockets.TCPServer}, Set{HTTP.ConnectionPool.Connection}, Int64, Nothing, Base.Semaphore, HTTP.ConnectionPool.Connection{Sockets.TCPSocket}})()
│        @ HTTP.Servers ./task.jl:514
└ @ HTTP.Servers ~/.julia/packages/HTTP/z8l0i/src/Servers.jl:461
  2. When I run the server with julia -t 1,1 I get no error, but note that
julia> fetch(Threads.@spawn :interactive Threads.threadpool())

so both tasks end up running on the same thread anyways, which affects latency.

So I’m obviously doing something wrong. Could someone help me get back on track?

Have you looked into ThreadPinning.jl?

Yes, thanks! But it does not change the fact that when I start Julia with -t 1,1 there is only one thread available:

$ julia +beta -t 1,1 -q
julia> using ThreadPinning; threadinfo()

| **0**,1,2,3,4,5,6,7 | 

# = Julia thread, | = Socket seperator

Julia threads: 1
├ Occupied CPU-threads: 1
└ Mapping (Thread => CPUID): 1 => 0,

EDIT: maybe not surprising, I think ThreadPinning.jl does not support interactive threads yet. It is only tested on Julia 1.6 and 1.7 in the CI pipeline.


Created an issue for the thread behavior: Interactive threads in Julia v1.9.0-beta do not give expected results · Issue #48580 · JuliaLang/julia · GitHub

And for the server failing


Update: I opted for a dedicated worker process with Distributed. That seems to be a robust solution until the interactive thread issues get ironed out.
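
For anyone landing here later, the general shape of such a setup is roughly the following. This is a simplified sketch and not my actual service code: heavy_sum is a hypothetical stand-in for the real workload, and the worker is assumed to be started locally with its own threads.

```julia
using Distributed

# One extra local process dedicated to the heavy work, with its own threads.
worker_pid = only(addprocs(1; exeflags = "--threads=auto"))

# Define the workload on all processes so the worker can run it.
@everywhere heavy_sum(n) = sum(abs2, 1:n)

# remotecall returns a Future immediately, so the server process is not blocked
# until the result is actually fetched.
fut = remotecall(heavy_sum, worker_pid, 10^6)
result = fetch(fut)
@info "worker result" result

rmprocs(worker_pid)   # shut the worker down when the service stops
```

The server process then only does I/O and waits on Futures, so health checks do not compete with the compute threads inside the same Julia process.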