I’ve been trying to create a simple Prometheus client for Julia. For this, I need to run a small web server inside my application, which does all sorts of computations. The problem is that the web server provided by HTTP.jl is concurrent, but not parallel, i.e., no web requests (Prometheus scrapes in this case) are served while there is an active computation running in the foreground.
julia> using HTTP
julia> HTTP.serve!(Returns(HTTP.Response("ok")));
[ Info: Listening on: 127.0.0.1:8081, thread id: 1
# curl http://localhost:8081 -> ok
julia> fib(n) = return n <= 2 ? 1 : fib(n - 1) + fib(n - 2)
julia> fib(47)
# curl http://localhost:8081 -> does not respond until result is computed
2971215073
# ok returned now
Fine, that’s expected. So my naive approach was to start the server on a separate thread:
$ julia --threads=2
julia> using HTTP
julia> Threads.@spawn HTTP.serve!(Returns(HTTP.Response("ok")))
[ Info: Listening on: 127.0.0.1:8081, thread id: 2
# curl http://localhost:8081 -> ok
julia> fib(n) = return n <= 2 ? 1 : fib(n - 1) + fib(n - 2)
julia> fib(47)
# curl http://localhost:8081 -> still does not respond until result is computed
2971215073
# ok returned now
But even though the server is clearly started on the 2nd thread, it still doesn’t respond while a computation is running on the main thread.
I also tried using Distributed, but that makes the application so complicated and messy that it’s hardly worth the result.
Any ideas or hints on how to get around this? It’s hard to believe that I’m the only one with this problem…
using Base.Threads: @spawn, threadid
using HTTP

@sync begin
    # Task 1: start the server, keep it up for 5 seconds, then shut it down
    @spawn begin
        println("Starting the server on thread $(threadid())")
        server = HTTP.serve!(Returns(HTTP.Response("ok")))
        sleep(5)
        println("Closing the server...")
        HTTP.Servers.forceclose(server)
    end
    # Task 2: query the server from another task while it is up
    @spawn begin
        println("task on thread $(threadid())")
        sleep(1)
        println("done sleeping (1)")
        success(`curl http://localhost:8081`)
        println("curl ok")
        sleep(2)
        println("done sleeping (2)")
    end
end
it gives me
task on thread 1
Starting the server on thread 2
[ Info: Listening on: 127.0.0.1:8081, thread id: 2
done sleeping (1)
curl ok
done sleeping (2)
Closing the server...
[ Info: Server on 127.0.0.1:8081 closing
Task (done) @0x0000000170f255f0
(you won’t necessarily see the same thread id for the listening and the “starting server”, see comment below)
It’s been a year or more since tasks could migrate to other threads. You shouldn’t rely on a task staying on a thread unless you spawn it on a static thread. I think there’s a way to do that, but I’m not sure why you would want to.
I don’t see how your example demonstrates the server actually working on one thread while another thread is busy computing stuff. The HTTP server works fine from a second thread, except that it doesn’t respond while the main (interactive) thread is doing any computations.
I probably misunderstood your question but with 3 threads and adding one task that does busy work you get similar stuff:
using Base.Threads: @spawn
using HTTP

@sync begin
    # Task 1: run the server for 5 seconds, then shut it down
    @spawn begin
        server = HTTP.serve!(Returns(HTTP.Response("ok")))
        sleep(5)
        println("Closing the server...")
        HTTP.Servers.forceclose(server)
    end
    # Task 2: query the server while it is up
    @spawn begin
        sleep(1)
        println("done sleeping (1)")
        success(`curl http://localhost:8081`)
        println("curl ok")
        sleep(2)
        println("done sleeping (2)")
    end
    # Task 3: busy work running alongside the other two tasks
    @spawn begin
        println("start doing work")
        for _ in 1:100_000_000
            rand(10)
        end
        println("finished doing computations")
    end
end
[ Info: Listening on: 127.0.0.1:8081, thread id: 2
start doing work
done sleeping (1)
curl ok
done sleeping (2)
Closing the server...
[ Info: Server on 127.0.0.1:8081 closing
finished doing computations
Task (done) @0x000000010c93c1a0
if that’s not your point, I guess I’ll let someone else answer
After running some experiments and going over the HTTP.jl documentation (especially the parts on how a non-blocking server is spawned), I concluded this is a Julia issue, not related at all to the HTTP.jl package.
Consider the following experiment:
using Base.Threads
using Dates

interactive_channel = Channel{String}(Inf)

# Consumer: takes messages off the channel and logs them
function run_on_interactive()
    while true
        msg = take!(interactive_channel)
        t = now()
        id = threadid()
        @spawn @info "$t - fast work on interactive - received: $msg on thread $id"
    end
end

# Producer: puts one message per second on the channel
function feeder()
    counter = 0
    while true
        t = now()
        id = threadid()
        @spawn @info "$t - feeding counter: $counter on thread $id"
        put!(interactive_channel, "hello $counter")
        counter += 1
        sleep(1)
    end
end

@spawn :interactive run_on_interactive()
@spawn feeder()

fib(n) = return n <= 2 ? 1 : fib(n - 1) + fib(n - 2)

function longrunning()
    while true
        for i in 1:47
            @info "fib $i is: " fib(i)
        end
    end
end

# not spinning the main thread
# all works flawlessly
t = @spawn longrunning()
wait(t)

# comment the above and uncomment the below
# now we get into problems
#longrunning()
At first sight, you might ask why I used @spawn on the @info statements: I wanted to make sure that they do not block and that the counter progresses undisturbed. When the main thread is spinning, one could argue that @info/print/println output is merely delayed because it is tied to the main thread. I wanted to rule that out, and it is not the case: the experiment behaves the same whether or not @spawn is used on the print statements.
I ran the experiment with multiple combinations of :interactive/regular threads - the same result.
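To double-check which pool a spawned task actually lands in, something like the following could be used. This is only a minimal sketch, assuming Julia 1.9 or later (where `Threads.threadpool()` is available); `pool_and_thread` is a hypothetical helper name, not part of the experiment.

```julia
using Base.Threads: @spawn, threadid, threadpool

# Hypothetical helper: report the pool and thread id the current task runs on
pool_and_thread() = (pool = threadpool(), thread = threadid())

# Spawn explicitly into the default pool and ask the task where it ended up
default_info = fetch(@spawn :default pool_and_thread())
println("spawned on pool $(default_info.pool), thread $(default_info.thread)")
```

The same call with `:interactive` instead of `:default` only lands on an interactive thread if Julia was started with one (for example `julia -t 2,1`).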
Conclusions:
Spinning the main thread seems to block/delay communication between the other threads (which are doing only trivial computation). You can test this by calling longrunning() on the main thread (see the comments in the code snippet).
All works well if the longrunning() is executed as a task and main thread is free.
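The second conclusion suggests a possible workaround: hand the heavy computation to a spawned task and only wait for its result on the main task. Below is a minimal sketch, assuming Julia is started with a separate interactive thread (for example `julia -t 2,1`) so that the spawned task does not land on the main thread; `offload` is a hypothetical helper name.

```julia
using Base.Threads: @spawn

fib(n) = n <= 2 ? 1 : fib(n - 1) + fib(n - 2)

# Hypothetical helper: run `f(args...)` in a task on the default pool and
# wait for the result. `fetch` blocks only the calling task, so the thread
# it runs on stays free for the scheduler and I/O in the meantime.
offload(f, args...) = fetch(@spawn f(args...))

result = offload(fib, 30)
println("fib(30) = $result")
```

Whether this is practical depends on how much of the application's work can be moved off the main task this way.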
Before commenting more on how upsetting this can get in more intensive async programs (where the main thread needs to perform computations from time to time or even frequently), I want to make sure I did nothing wrong (I really hope I made a mistake and/or I am missing something).
I don’t know for sure, but my guess is that @spawn creates a task that can be scheduled on any thread, but the scheduler itself has to run at some point, and it runs on the main thread (or something like that). So the task never reaches the scheduler until the main thread pauses.
What happens if you call yield() in the main thread immediately after @spawn and then continue doing busy work in the main thread? I guess you’d get a different result?
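For a long loop, the yield() idea could look roughly like the sketch below: the busy loop hands control back to the scheduler every so often. `busy_sum` and `yield_every` are hypothetical names, and nothing in this thread confirms that yielding is enough to fix the behavior discussed here.

```julia
# Hypothetical busy loop that cooperatively yields every `yield_every` iterations
function busy_sum(n; yield_every = 10_000)
    total = 0
    for i in 1:n
        total += i
        # Give other tasks queued on this thread a chance to run
        i % yield_every == 0 && yield()
    end
    return total
end

total = busy_sum(100_000)
println("sum = $total")
```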
EDIT: another issue might be contention over output resources. If the thing you’re trying to spawn is supposed to spew some stuff to stdout it might just block waiting to get a lock on the output or something similar.
I already checked that - in different ways. For example - I can put the main to sleep for a few seconds until it is clear that the tasks are already running - however when starting the long-running computation on the main thread - the communication between the tasks is halted/delayed.
I also tried initiating the channel on a different thread (not the main one). The same result: the communication with the channel of tasks running on different threads than the main seems dependent on the free main thread.
Please tell me that I made a mistake somewhere. This is crazy.
This has significant implications for an independent contract I am working on (I convinced the client to go with Julia) - and I hoped to be wrong and tried all kinds of stuff to invalidate my conclusions. At this point, I hope something is wrong with my little experiment, and someone will point out my ignorance, stupidity, or lack of attention.
Yikes! I agree this is weird. Have you seen the behavior in other versions of Julia? I certainly have written some code in the past that appeared to have a background thread doing work successfully.
Have a look at ThreadPools.jl too
I’ll follow the other “thread” so to speak rather than clog up this one.
1.10 was smart enough to skip the long-running computation (I think it figured it out at compile time), so I had to add a rand(). A nice touch, but it is still the same issue: work on the main thread affects the running tasks.
Let’s continue on the other thread.
As you can see - after more digging, I arrived at a different conclusion (please check the other topic).
The problem seems to be the sleep function that I used in my tests - the documentation is as follows:
Block the current task for a specified number of seconds. The minimum sleep time is 1 millisecond or input of 0.001 .
The reality is “block the current task the specified number of seconds plus the time spent by the main thread doing work while the current task is running.”
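One way to check this claim is to time a sleep in a spawned task while the main thread is kept busy. The following is a minimal sketch; `measured_sleep` and `spin` are hypothetical helper names, and the measured value will depend on the Julia version and on the thread settings used to start it.

```julia
using Base.Threads: @spawn

# Hypothetical helper: sleep and report how long the sleep really took
function measured_sleep(seconds)
    t0 = time()
    sleep(seconds)
    return time() - t0
end

# Hypothetical helper: keep the calling thread busy without ever yielding
function spin(seconds)
    t0 = time()
    r = 0.0
    while time() - t0 < seconds
        r += rand()
    end
    return r
end

sleeper = @spawn measured_sleep(0.1)  # asks for about 0.1 s of sleep
spin(1.0)                             # busy work on the main thread meanwhile
elapsed = fetch(sleeper)
println("requested 0.1 s, measured $(round(elapsed, digits = 3)) s")
```

If the measured time comes out close to the spin duration rather than the requested 0.1 s, that would match the behavior described above.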
Now let’s dive into HTTP.jl source and find the sleep.
An easier-to-run MWE might be the following (julia -t 2,1 script.jl). No matter how much you increase the -t value, HTTP.jl responsiveness is impacted by work done on the main thread:
using Base.Threads, Dates
using HTTP

function mainspin(s, id; steps=1)
    @info "mainspin $id executing on thread $(threadid()) "
    ms = Millisecond(s * 1000)
    r = 0.0
    for _ in 1:steps
        t = now()
        @time while (now() - t) < ms
            r += rand()
        end
        sleep(1)
    end
    @info "mainspin $id finished "
end

HTTP.serve!(Returns(HTTP.Response("ok")))

# steps is not relevant, it only buys you time to
# play around with the server
# not counting the sleep in `mainspin`, here we will have about 100 seconds
# and 200 seconds if we do mainspin(10, 1, steps=20)
mainspin(5, 1, steps=20)

# now go in the terminal and play around
# curl 127.0.0.1:8081
Relevant additions:
I opened an issue on the Julia repo, which I think is related to the issue above (although HTTP.jl does not directly use the sleep function in the code paths reached while the above MWE runs).
@quinnj, you might be interested in watching the issue closely and especially investigating if there is a way to avoid libuv-related dependencies in non-blocking HTTP.jl parts (I think that is unlikely, but who knows). Please check the above MWE to see how non-blocking becomes blocking if you do some work on the main thread (or even if you call Libc.systemsleep on the main thread).
Wow. Just wow. @algunion, you have put an insane amount of work into this, especially considering that it was a weekend. It took me almost an hour just to read everything that was written in relation to this topic. I can’t appreciate your efforts enough! Huge thanks! I sure hope something good will come out of this. Let me know if I can help with anything.