HTTP.jl async is slow compared to python+aiohttp

Introduction

I went through several optimization steps for the Julia code. Essentially, we need to make sure we are running compiled code and that memory usage is optimized as much as possible.

I used bmon to check my network usage. The first thing I noticed below was that the peak receive bandwidth was higher with asyncio. This made me wonder whether the connection limit was holding HTTP.jl back.

After optimization, the effective wall times are now comparable for me. With a higher-bandwidth connection, your situation may require further tuning.

Python / asyncio

Here’s what I see with Python asyncio.run:

In [11]: %time cdatas = asyncio.run(get_items(keys))
CPU times: user 18.7 s, sys: 3.81 s, total: 22.5 s
Wall time: 37.7 s

[bmon bandwidth screenshot]
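The `get_items` implementation isn't shown above. Here is a minimal sketch of the pattern, using plain asyncio with a stubbed `fetch` in place of a real `aiohttp.ClientSession.get` call; the sleep and the returned bytes are stand-ins for illustration, not the actual benchmark code:

```python
import asyncio

async def fetch(key):
    # Stand-in for an aiohttp request, e.g.:
    #   async with session.get(url_for(key)) as resp:
    #       return await resp.read()
    # Here we just sleep to simulate network latency.
    await asyncio.sleep(0.01)
    return b"chunk-" + str(key).encode()

async def get_items(keys):
    # Launch one coroutine per key and collect the bodies concurrently.
    return await asyncio.gather(*(fetch(k) for k in keys))

cdatas = asyncio.run(get_items(range(100)))
print(len(cdatas))
```

Because `asyncio.gather` schedules all fetches at once, the total wall time is bounded by the slowest response rather than the sum of all of them.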

Initial Julia code

Here is what I see with your Julia code:

julia> const urls = map(i->"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);

julia> @time asyncmap(url->HTTP.request("GET", url, status_exception=false).body, urls);
 73.768142 seconds (12.42 M allocations: 5.409 GiB, 0.81% gc time, 15.31% compilation time)

[bmon bandwidth screenshot]

Compiled function

Putting this into a function so that it gets compiled, I then get the following results:

julia> function f()
           asyncmap(url->HTTP.request("GET", url, status_exception=false).body, urls);
       end

julia> @time f()
 51.969691 seconds (3.01 M allocations: 4.816 GiB, 0.42% gc time)

[bmon bandwidth screenshot]

Julia optimization with preallocated buffers

Optimizations:

  1. Use a function to compile the code
  2. Make all the globals const
  3. Increase the connection_limit
  4. Preallocate the buffers
  5. Use Threads.@spawn to allow tasks to run on multiple threads.

julia> const urls = map(i->"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);

julia> const buffers = [IOBuffer(; sizehint = 64*1024*1024, maxsize=64*1024*1024) for x in 1:100]

julia> function f()
           seekstart.(buffers)
           @sync map(urls, buffers) do url, buffer
               Threads.@spawn HTTP.request("GET", url, status_exception=false, connection_limit=25, response_stream=buffer)
           end
       end

julia> @time f()
 35.779649 seconds (5.80 M allocations: 176.242 MiB, 0.21% compilation time)

julia> seekstart.(buffers); read.(buffers)
100-element Vector{Vector{UInt8}}:
 [0x02, 0x01, 0x21, 0x02, 0x60, 0x38, 0xdc, 0x03, 0x00, 0x00  …  0x0f, 0x02, 0x00, 0x08, 0x50, 0xb5, 0xb5, 0xb5, 0xb4, 0xb4]
...

[bmon bandwidth screenshot]

Discussion

Part of the optimization above is general Julia advice. Making your globals const, or at least binding them to a type, helps the compiler. Putting the code in a function is also helpful for compilation. Managing memory matters as well, and I suspect this accounts for some of the difference.

Above we preallocated a lot of memory, partly based on prior knowledge. This prior knowledge could be obtained by making a single HTTP request to the following URL and then parsing the returned XML:

https://mur-sst.s3.us-west-2.amazonaws.com/?prefix=zarr-v1/analysed_sst

This uses the Amazon S3 ListObjectsV2 API.
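For example, the object sizes could be pulled out of a ListObjectsV2 response with a few lines of Python. The XML below is a hand-written stand-in for a real response body, not actual output from that bucket:

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for a ListObjectsV2 response body (not real data).
xml_body = """<?xml version="1.0" encoding="UTF-8"?>
<ListBucketResult xmlns="http://s3.amazonaws.com/doc/2006-03-01/">
  <Contents><Key>zarr-v1/analysed_sst/0.1.0</Key><Size>67108864</Size></Contents>
  <Contents><Key>zarr-v1/analysed_sst/1.1.0</Key><Size>66060288</Size></Contents>
</ListBucketResult>"""

ns = {"s3": "http://s3.amazonaws.com/doc/2006-03-01/"}
root = ET.fromstring(xml_body)
# Map each key to its size in bytes; the maximum tells us how large
# the preallocated buffers need to be.
sizes = {c.find("s3:Key", ns).text: int(c.find("s3:Size", ns).text)
         for c in root.findall("s3:Contents", ns)}
max_size = max(sizes.values())
```

The same parse could of course be done in Julia; the point is that one listing request gives exact sizes up front instead of guessing a buffer size.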

We may not have to preallocate all the memory; we only need enough buffers to cover the number of concurrent connections. We could then copy the memory out and reuse the IOBuffers.
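A sketch of that buffer-recycling idea, here in Python with an asyncio.Queue acting as the pool. The pool size, buffer size, and the synchronous "download" that just writes the key into the buffer are all illustrative stand-ins; a real version would stream an HTTP response into the borrowed buffer:

```python
import asyncio

POOL_SIZE = 25            # match the number of concurrent connections
BUF_SIZE = 1024 * 1024    # illustrative; e.g. 64 MiB in the real case

async def download_all(keys):
    # Preallocate only POOL_SIZE buffers and recycle them via a queue.
    pool = asyncio.Queue()
    for _ in range(POOL_SIZE):
        pool.put_nowait(bytearray(BUF_SIZE))

    async def worker(key):
        buf = await pool.get()        # borrow a buffer (waits if all are in use)
        try:
            # Stand-in for streaming an HTTP response into buf.
            await asyncio.sleep(0.001)
            data = str(key).encode()
            buf[:len(data)] = data
            return bytes(buf[:len(data)])   # copy out before recycling
        finally:
            pool.put_nowait(buf)      # return the buffer to the pool

    return await asyncio.gather(*(worker(k) for k in keys))

results = asyncio.run(download_all(range(100)))
```

The queue naturally throttles the workers: at most POOL_SIZE downloads hold a buffer at a time, so memory stays bounded regardless of how many keys there are.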
