HTTP.jl async is slow compared to python+aiohttp

Alright, I think I have managed to make some pretty important observations:

  1. HTTP.jl seems to ignore connection_limit after the first time it is set.
  2. Regardless of how high connection_limit is set, I only ever see around 20-30 concurrent connections (a sketch of how to sample this with lsof follows the timings below).
  3. Increasing connection_limit didn’t improve the total running time for all 100 requests (if anything, I am seeing slightly worse results), but it did make more of them finish at the same time (e.g. more tasks start concurrently but each takes proportionally longer to finish). Example:
julia> asyncmap(urls) do url
           @time HTTP.request("GET", url, status_exception=false, connection_limit=100).body;
       end
  0.948380 seconds (33.16 k allocations: 139.280 MiB)
  2.961647 seconds (103.75 k allocations: 457.984 MiB)
  4.462301 seconds (160.76 k allocations: 725.730 MiB)
  4.818988 seconds (176.62 k allocations: 804.385 MiB)
  8.271751 seconds (292.88 k allocations: 1.305 GiB)
  9.714883 seconds (345.51 k allocations: 1.544 GiB)
 11.498342 seconds (410.16 k allocations: 1.869 GiB)
 14.208980 seconds (509.78 k allocations: 2.293 GiB)
 14.333130 seconds (514.10 k allocations: 2.321 GiB)
 14.497057 seconds (520.65 k allocations: 2.350 GiB)
 21.893111 seconds (784.32 k allocations: 3.599 GiB)
 24.284558 seconds (870.61 k allocations: 3.985 GiB)
 24.352982 seconds (872.85 k allocations: 4.010 GiB)
 25.268930 seconds (913.36 k allocations: 4.128 GiB)
 25.305595 seconds (914.71 k allocations: 4.153 GiB)
 25.339386 seconds (915.59 k allocations: 4.177 GiB)
 33.214921 seconds (1.19 M allocations: 5.444 GiB)
 33.968569 seconds (1.22 M allocations: 5.570 GiB)
 34.240627 seconds (1.24 M allocations: 5.621 GiB)
 34.240651 seconds (1.24 M allocations: 5.621 GiB)
 34.685784 seconds (1.25 M allocations: 5.706 GiB)
 34.240617 seconds (1.24 M allocations: 5.621 GiB)
 34.240648 seconds (1.24 M allocations: 5.621 GiB)
 34.240607 seconds (1.24 M allocations: 5.621 GiB)
 34.240508 seconds (1.24 M allocations: 5.621 GiB)
 34.240496 seconds (1.24 M allocations: 5.621 GiB)
 34.240484 seconds (1.24 M allocations: 5.621 GiB)
 34.240468 seconds (1.24 M allocations: 5.621 GiB)
 34.240479 seconds (1.24 M allocations: 5.621 GiB)
 34.240379 seconds (1.24 M allocations: 5.621 GiB)
 34.240421 seconds (1.24 M allocations: 5.621 GiB)
 34.240342 seconds (1.24 M allocations: 5.621 GiB)
 34.240375 seconds (1.24 M allocations: 5.621 GiB)
 34.240338 seconds (1.24 M allocations: 5.621 GiB)
 34.240291 seconds (1.24 M allocations: 5.621 GiB)
 34.240328 seconds (1.24 M allocations: 5.621 GiB)
 34.240290 seconds (1.24 M allocations: 5.621 GiB)
 34.240277 seconds (1.24 M allocations: 5.621 GiB)
 34.240270 seconds (1.24 M allocations: 5.621 GiB)
 34.240261 seconds (1.24 M allocations: 5.621 GiB)
 34.240231 seconds (1.24 M allocations: 5.621 GiB)
 34.240199 seconds (1.24 M allocations: 5.621 GiB)
 34.240186 seconds (1.24 M allocations: 5.621 GiB)
 34.240178 seconds (1.24 M allocations: 5.621 GiB)
 34.240127 seconds (1.24 M allocations: 5.621 GiB)
 34.240153 seconds (1.24 M allocations: 5.621 GiB)
 34.240123 seconds (1.24 M allocations: 5.621 GiB)
 34.240095 seconds (1.24 M allocations: 5.621 GiB)
 34.240074 seconds (1.24 M allocations: 5.621 GiB)
 34.240042 seconds (1.24 M allocations: 5.621 GiB)
 34.240032 seconds (1.24 M allocations: 5.621 GiB)
 34.239964 seconds (1.24 M allocations: 5.621 GiB)
 34.239936 seconds (1.24 M allocations: 5.621 GiB)
 34.239922 seconds (1.24 M allocations: 5.621 GiB)
 34.239971 seconds (1.24 M allocations: 5.621 GiB)
 34.239922 seconds (1.24 M allocations: 5.621 GiB)
 34.240345 seconds (1.24 M allocations: 5.621 GiB)
 34.239917 seconds (1.24 M allocations: 5.621 GiB)
 34.239910 seconds (1.24 M allocations: 5.621 GiB)
 34.239897 seconds (1.24 M allocations: 5.621 GiB)
 34.239884 seconds (1.24 M allocations: 5.621 GiB)
 34.239874 seconds (1.24 M allocations: 5.621 GiB)
 34.239862 seconds (1.24 M allocations: 5.621 GiB)
 34.239845 seconds (1.24 M allocations: 5.621 GiB)
 34.239830 seconds (1.24 M allocations: 5.621 GiB)
 34.239819 seconds (1.24 M allocations: 5.621 GiB)
 34.239808 seconds (1.24 M allocations: 5.621 GiB)
 34.239777 seconds (1.24 M allocations: 5.621 GiB)
 34.239765 seconds (1.24 M allocations: 5.621 GiB)
 34.239754 seconds (1.24 M allocations: 5.621 GiB)
 34.239742 seconds (1.24 M allocations: 5.621 GiB)
 34.239729 seconds (1.24 M allocations: 5.621 GiB)
 34.239557 seconds (1.24 M allocations: 5.621 GiB)
 34.239586 seconds (1.24 M allocations: 5.621 GiB)
 34.239635 seconds (1.24 M allocations: 5.621 GiB)
 34.239624 seconds (1.24 M allocations: 5.621 GiB)
 34.239571 seconds (1.24 M allocations: 5.621 GiB)
 34.239558 seconds (1.24 M allocations: 5.621 GiB)
 34.239546 seconds (1.24 M allocations: 5.621 GiB)
 34.239534 seconds (1.24 M allocations: 5.621 GiB)
 34.239525 seconds (1.24 M allocations: 5.621 GiB)
 34.239516 seconds (1.24 M allocations: 5.621 GiB)
 34.239504 seconds (1.24 M allocations: 5.621 GiB)
 34.239495 seconds (1.24 M allocations: 5.621 GiB)
 34.239483 seconds (1.24 M allocations: 5.621 GiB)
 34.239472 seconds (1.24 M allocations: 5.621 GiB)
 34.239459 seconds (1.24 M allocations: 5.621 GiB)
 34.239450 seconds (1.24 M allocations: 5.621 GiB)
 34.239442 seconds (1.24 M allocations: 5.621 GiB)
 34.239424 seconds (1.24 M allocations: 5.621 GiB)
 34.239410 seconds (1.24 M allocations: 5.621 GiB)
 34.239390 seconds (1.24 M allocations: 5.621 GiB)
 34.239376 seconds (1.24 M allocations: 5.621 GiB)
 34.239362 seconds (1.24 M allocations: 5.621 GiB)
 34.239335 seconds (1.24 M allocations: 5.621 GiB)
 34.239305 seconds (1.24 M allocations: 5.621 GiB)
 34.239266 seconds (1.24 M allocations: 5.621 GiB)
 34.417358 seconds (1.24 M allocations: 5.648 GiB)
 34.957085 seconds (1.27 M allocations: 5.730 GiB)
 39.520428 seconds (1.30 M allocations: 5.754 GiB)

All of this suggests that HTTP.jl isn’t behaving the way you would expect given the connection_limit that is set.
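
For reference, here is one way to sample the concurrent-connection count from the Julia side while the requests run. It is only a rough sketch, not exactly what I ran: it assumes lsof is on the PATH, that all the requests go out over TCP port 443, and that urls is the same vector as in the benchmark above; the half-second sampling can also miss short-lived connections.

using HTTP

# Poll lsof every `interval` seconds until `done[]` flips to true, and return
# the peak number of ESTABLISHED connections on port 443 seen during the run.
# (This counts every process on the machine; something like `-a -p $(getpid())`
# can be added to the lsof call to restrict it to the current Julia session.)
function watch_connections(done::Ref{Bool}; interval=0.5)
    peak = 0
    while !done[]
        out = read(ignorestatus(`lsof -n -P -i TCP:443`), String)
        peak = max(peak, count(contains("ESTABLISHED"), split(out, '\n')))
        sleep(interval)
    end
    return peak
end

done = Ref(false)
watcher = @async watch_connections(done)
asyncmap(urls) do url
    HTTP.request("GET", url, status_exception=false, connection_limit=100).body
end
done[] = true
@show fetch(watcher)   # peak simultaneous connections observed during the run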

On the other hand, I have confirmed that my Python code is indeed opening 100 connections by default, so I decided to see whether reducing the number of allowed open connections would make a difference. It turns out it does not! (And of course I verified the connection counts with lsof.)

Modified Python code with a configurable connection limit:

In [16]: import asyncio
    ...: import aiohttp
    ...: 
    ...: keys = [f'analysed_sst/{i}.1.0' for i in range(100)]
    ...: 
    ...: async def get_items(keys, limit=100):
    ...:     connector = aiohttp.TCPConnector(limit=limit)
    ...:     async with aiohttp.ClientSession(connector=connector) as session:
    ...:         tasks = []
    ...:         for ckey in keys:
    ...:             url = f"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/{ckey}"
    ...:             tasks.append(asyncio.create_task(get_cdata(session, url)))
    ...:         cdatas = await asyncio.gather(*tasks)
    ...:         return cdatas
    ...: 

In [17]: async def get_cdata(session, url):
    ...:     async with session.get(url) as resp:
    ...:         cdata = await resp.read()
    ...:         return cdata
    ...: 

In [18]: %time cdatas = asyncio.run(get_items(keys, limit=50))
CPU times: user 3.41 s, sys: 2.25 s, total: 5.65 s
Wall time: 6.29 s

In [19]: %time cdatas = asyncio.run(get_items(keys, limit=25))
CPU times: user 3.12 s, sys: 1.57 s, total: 4.69 s
Wall time: 6.26 s

In [20]: %time cdatas = asyncio.run(get_items(keys, limit=10))
CPU times: user 3.3 s, sys: 1.18 s, total: 4.47 s
Wall time: 5.8 s

In [21]: %time cdatas = asyncio.run(get_items(keys, limit=10))
CPU times: user 3.37 s, sys: 1.22 s, total: 4.59 s
Wall time: 5.43 s
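
For completeness, the reverse experiment can also be approximated on the Julia side by capping how many request tasks run at once, independently of connection_limit, via asyncmap's ntasks keyword. This is only a sketch (not something I benchmarked above): it bounds in-flight tasks rather than open sockets, so it is just a loose analogue of aiohttp's TCPConnector(limit=...), and it assumes the same urls vector as before.

using HTTP

# Rough Julia counterpart of the aiohttp `limit` experiment: `ntasks` bounds the
# number of concurrently running request tasks (not the number of sockets).
function get_items(urls; limit=100)
    asyncmap(urls; ntasks=limit) do url
        HTTP.request("GET", url, status_exception=false).body
    end
end

@time cdatas = get_items(urls; limit=25);

The point of the ntasks cap is just to make the two clients easier to compare when sweeping the limit, not to change HTTP.jl's pooling behavior.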