HTTP.jl async is slow compared to python+aiohttp

Just tested this, and it runs as fast as Python does (if not slightly faster on average). I modified your example a little to return all the response bytes:

julia> function test(c::Config)
           host = Sockets.getaddrinfo("ipv4.download.thinkbroadband.com")
           data = asyncmap(1:c.ntask) do i
               map(1:c.nbatch) do j
                   s = TCPSocket()
                   Sockets.connect!(s, host, 80)
                   write(s, req)
                   bytes = read(s)
                   bytes
               end
           end
           reduce(vcat, data)
       end
test (generic function with 1 method)

julia> @time test(Config(100#=ntask=#, 1#=nbatch=#));
  7.527014 seconds (326.37 k allocations: 4.853 GiB, 0.65% compilation time)
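
The test above depends on asyncmap overlapping the time each task spends waiting on the network. Here is a minimal sketch of that effect using only Base, where `sleep` stands in for network latency. The names `fake_request` and `timed_batch` are made up for illustration, and the latency value is an assumption, not a measurement from this thread.

```julia
# Simulated I/O-bound "request": sleep stands in for network latency.
fake_request(i; latency=0.2) = (sleep(latency); i)

# Run n simulated requests with at most `ntasks` in flight at once
# and return the results together with the elapsed wall time.
function timed_batch(n; ntasks)
    t0 = time()
    results = asyncmap(fake_request, 1:n; ntasks=ntasks)
    return results, time() - t0
end

results_all, t_all = timed_batch(20; ntasks=20)  # all 20 waits overlap
results_few, t_few = timed_batch(20; ntasks=5)   # at most 5 waits overlap

println("ntasks=20: ", round(t_all; digits=2), " s")
println("ntasks=5:  ", round(t_few; digits=2), " s")
```

With 20 tasks the waits all overlap, while with 5 tasks the batch has to proceed in roughly four waves. When `ntasks` is not given, asyncmap uses up to 100 tasks, which is why 100 requests in the test above can all be in flight together.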

I should add that there is a Python library called uvloop, which provides an asyncio-compatible event loop built on libuv. I think I mentioned it earlier, but I tried it again when running these benchmarks and got the same results. This is probably another piece of evidence that the issues lie in HTTP.jl (and perhaps Downloads.jl too, since I am finding it to be just as slow, if not slower) and not in libuv:

In [59]: %time cdatas = asyncio.run(get_items([i for i in range(100)]))
CPU times: user 2.55 s, sys: 1.34 s, total: 3.89 s
Wall time: 7.33 s

In [60]: import uvloop

In [61]: asyncio.set_event_loop_policy(uvloop.EventLoopPolicy())

In [62]: %time cdatas = asyncio.run(get_items([i for i in range(100)]))
CPU times: user 1.98 s, sys: 1.35 s, total: 3.34 s
Wall time: 7.35 s

For comparison with HTTP.jl:

julia> @time asyncmap(url->HTTP.request("GET", "http://ipv4.download.thinkbroadband.com/20MB.zip", status_exception=false).body, 1:100);
 13.986817 seconds (420.91 k allocations: 3.919 GiB, 0.19% compilation time)

I think the next step would be to try this with an example that downloads different files over HTTPS, such as the one I have been looking at so far.

4 Likes

Hey @agoodm – I am really enjoying this thread and wanted to commend your tenacity, curiosity, and patience in working through these issues with the Julia community. And in your first Julia Discourse post, no less! Welcome to the community; I am excited about the outcomes of this thorough investigation!

P.S. Are you a Destiny 2 fan? Because your profile picture looks like a Guardian! :smile:

11 Likes

It seems that the slowdown is caused by HTTP.jl's unoptimized SSL implementation. Another person in this thread also quotes a similar post. The current workaround is to replace MbedTLS with OpenSSL.jl.

Maybe you can do some benchmark with https://github.com/JuliaWeb/OpenSSL.jl ? The API to me looks much like the standard socket interface.

2 Likes

Thanks, I appreciate the warm welcome! The profile picture is the character “C” from Trails of Cold Steel, though I recommend that anyone who may end up wanting to play the game avoid a Google search of the character name, since it leads to major spoilers.

Interestingly enough I just realized that the dataset I have been using in my testing so far can be accessed from either HTTP or HTTPS, so we can in fact directly see what the impact of using secure connections is. Here is the standard barebones HTTP.jl example again except this time I access the URLs through HTTP instead of HTTPS:

julia> urls = map(i->"http://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);
julia> @time asyncmap(url->HTTP.request("GET", url, status_exception=false, connection_limit=25).body, urls)
  6.482805 seconds (292.75 k allocations: 4.881 GiB, 0.57% compilation time)

With Python this essentially doesn’t change the result, so I think we have now shown that, at least for this workload, HTTP.jl is on par with Python when we don’t need to deal with the additional overhead that comes with HTTPS!

I tried doing just that. I am not too familiar with making HTTP requests at as low a level as raw sockets, especially with HTTPS/SSL, so I am not sure if I did this right, but here are the results:

julia> using OpenSSL

julia> using Sockets

julia> function get_cdata_open_ssl(req_t, ip, hostname)
           tcp = Sockets.TCPSocket()
           Sockets.connect!(tcp, ip, 443)
           ssl = SSLStream(tcp)
           OpenSSL.hostname!(ssl, hostname)
           OpenSSL.connect(ssl)
           write(ssl, req_t)
           data = read(ssl)  # read through the TLS stream; read(tcp) would return undecrypted bytes
           close(ssl)
           data
       end
get_cdata_open_ssl (generic function with 2 methods)

julia> hostname = "mur-sst.s3.us-west-2.amazonaws.com"
"mur-sst.s3.us-west-2.amazonaws.com"

julia> ip = Sockets.getaddrinfo(hostname)
ip"52.218.225.17"

julia> req_t = "GET /zarr-v1/analysed_sst/<i>.1.0 HTTP/1.1\r\nHost: mur-sst.s3.us-west-2.amazonaws.com\r\nAccept: */*\r\nUser-Agent: HTTP.jl/1.8.5\r\nContent-Length: 0\r\nAccept-Encoding: gzip\r\n\r\n"
"GET /zarr-v1/analysed_sst/<i>.1.0 HTTP/1.1\r\nHost: mur-sst.s3.us-west-2.amazonaws.com\r\nAccept: */*\r\nUser-Agent: HTTP.jl/1.8.5\r\nContent-Length: 0\r\nAccept-Encoding: gzip\r\n\r\n"

julia> @time asyncmap(req->get_cdata_open_ssl(req, ip, hostname), [replace(req_t, "<i>" => i) for i in 0:99])
 15.624101 seconds (312.64 k allocations: 5.460 GiB, 0.41% compilation time)

So we are now twice as fast as base HTTP.jl over HTTPS but still 2x slower than Python. Note that changing the HTTP version from 1.1 to 1.0 makes a noticeable difference. I think HTTP.jl, as used in my last post, defaults to HTTP 1.1 (as does aiohttp), while the request headers you were using specified HTTP 1.0, so I think that explains the differences seen there.
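
To make the HTTP version easy to vary in experiments like this, here is a minimal sketch of a request builder. The helper name `build_get_request` and its keyword arguments are hypothetical, not part of HTTP.jl or any other package, and the headers are only a subset of those in `req_t` above. One general fact worth keeping in mind: HTTP/1.0 connections close after the response by default, while HTTP/1.1 connections are persistent unless `Connection: close` is sent, so a `read` that waits for end-of-stream can behave differently between the two. Whether that accounts for the timing difference seen here is not confirmed.

```julia
# Hypothetical helper: build a raw GET request string similar to `req_t`.
# `version` selects the HTTP version; `close_after` adds "Connection: close"
# so the server closes the connection once the response has been sent.
function build_get_request(host::AbstractString, path::AbstractString;
                           version::AbstractString="1.1",
                           close_after::Bool=false)
    lines = ["GET $path HTTP/$version",
             "Host: $host",
             "Accept: */*"]
    close_after && push!(lines, "Connection: close")
    # Header lines end with CRLF; a blank line terminates the header block.
    return join(lines, "\r\n") * "\r\n\r\n"
end

# One request per chunk, using the same path pattern as the URLs in this thread.
host = "mur-sst.s3.us-west-2.amazonaws.com"
reqs = [build_get_request(host, "/zarr-v1/analysed_sst/$i.1.0";
                          version="1.0", close_after=true) for i in 0:99]
```

The resulting `reqs` vector could be passed to asyncmap in place of the `replace`-based comprehension, which makes it straightforward to rerun the same benchmark with `version="1.1"`.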

In summary, one way or another, I think we can now say with near certainty that the main inefficiencies of handling many HTTP requests in Julia compared to Python (at least for this example) come primarily from how SSL/TLS is handled.

5 Likes

Eureka! I finally solved this after checking the HTTP.jl documentation and finding that it can be configured to use OpenSSL instead of MbedTLS by passing socket_type_tls=OpenSSL.SSLStream to the request. With that change, it now seems to run as fast as Python over HTTPS:

julia> urls = map(i->"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);
julia> @time asyncmap(url->HTTP.request("GET", url, status_exception=false, connection_limit=25, socket_type_tls=OpenSSL.SSLStream).body, urls)
  7.531837 seconds (779.31 k allocations: 5.198 GiB, 0.49% compilation time)

A big thank you to everyone for helping me work this out!

22 Likes

Can OpenSSL be the default?

1 Like

I guess that’s another vote for the issue “Drop mbedTLS and migrate to OpenSSL” (Issue #48799 in JuliaLang/julia on GitHub).

5 Likes

Interesting. After reading that, I have confirmed that mbedTLS is also a dependency of Julia’s libcurl on Linux, which would explain why I was having similar async performance issues with Downloads.jl.

3 Likes

I’ve been hesitant to push OpenSSL.jl too hard because of this issue that was affecting heavy workloads (and, as it turns out, workloads where some percentage of requests returned status errors). Now that those problems are resolved, I’m feeling much more confident in the OpenSSL implementation. I’m planning to do some pretty heavy benchmarking across a bunch of dimensions soon, and I’ll try to put together a post on the results and maybe make a case for making OpenSSL the default. I’m also working on merging Pull Request #1034 in JuliaWeb/HTTP.jl (by quinnj), which will also positively impact HTTP client-side benchmarking performance.

13 Likes