Alright, I think I have managed to make some pretty important observations:
- HTTP.jl seems to ignore `connection_limit` after the first time it is set.
- Regardless of how high `connection_limit` gets set, I only see around 20-30 concurrent connections at most.
- Increasing `connection_limit` didn’t improve the running time to process all 100 requests (if anything, I am seeing slightly worse results), but it did make more of them finish at the same time (e.g. more tasks starting concurrently but each taking proportionally longer to finish). Example:
```julia
julia> asyncmap(urls) do url
           @time HTTP.request("GET", url, status_exception=false, connection_limit=100).body;
       end
0.948380 seconds (33.16 k allocations: 139.280 MiB)
2.961647 seconds (103.75 k allocations: 457.984 MiB)
4.462301 seconds (160.76 k allocations: 725.730 MiB)
4.818988 seconds (176.62 k allocations: 804.385 MiB)
8.271751 seconds (292.88 k allocations: 1.305 GiB)
9.714883 seconds (345.51 k allocations: 1.544 GiB)
11.498342 seconds (410.16 k allocations: 1.869 GiB)
14.208980 seconds (509.78 k allocations: 2.293 GiB)
14.333130 seconds (514.10 k allocations: 2.321 GiB)
14.497057 seconds (520.65 k allocations: 2.350 GiB)
21.893111 seconds (784.32 k allocations: 3.599 GiB)
24.284558 seconds (870.61 k allocations: 3.985 GiB)
24.352982 seconds (872.85 k allocations: 4.010 GiB)
25.268930 seconds (913.36 k allocations: 4.128 GiB)
25.305595 seconds (914.71 k allocations: 4.153 GiB)
25.339386 seconds (915.59 k allocations: 4.177 GiB)
33.214921 seconds (1.19 M allocations: 5.444 GiB)
33.968569 seconds (1.22 M allocations: 5.570 GiB)
34.240627 seconds (1.24 M allocations: 5.621 GiB)
34.240651 seconds (1.24 M allocations: 5.621 GiB)
34.685784 seconds (1.25 M allocations: 5.706 GiB)
34.240617 seconds (1.24 M allocations: 5.621 GiB)
34.240648 seconds (1.24 M allocations: 5.621 GiB)
34.240607 seconds (1.24 M allocations: 5.621 GiB)
34.240508 seconds (1.24 M allocations: 5.621 GiB)
34.240496 seconds (1.24 M allocations: 5.621 GiB)
34.240484 seconds (1.24 M allocations: 5.621 GiB)
34.240468 seconds (1.24 M allocations: 5.621 GiB)
34.240479 seconds (1.24 M allocations: 5.621 GiB)
34.240379 seconds (1.24 M allocations: 5.621 GiB)
34.240421 seconds (1.24 M allocations: 5.621 GiB)
34.240342 seconds (1.24 M allocations: 5.621 GiB)
34.240375 seconds (1.24 M allocations: 5.621 GiB)
34.240338 seconds (1.24 M allocations: 5.621 GiB)
34.240291 seconds (1.24 M allocations: 5.621 GiB)
34.240328 seconds (1.24 M allocations: 5.621 GiB)
34.240290 seconds (1.24 M allocations: 5.621 GiB)
34.240277 seconds (1.24 M allocations: 5.621 GiB)
34.240270 seconds (1.24 M allocations: 5.621 GiB)
34.240261 seconds (1.24 M allocations: 5.621 GiB)
34.240231 seconds (1.24 M allocations: 5.621 GiB)
34.240199 seconds (1.24 M allocations: 5.621 GiB)
34.240186 seconds (1.24 M allocations: 5.621 GiB)
34.240178 seconds (1.24 M allocations: 5.621 GiB)
34.240127 seconds (1.24 M allocations: 5.621 GiB)
34.240153 seconds (1.24 M allocations: 5.621 GiB)
34.240123 seconds (1.24 M allocations: 5.621 GiB)
34.240095 seconds (1.24 M allocations: 5.621 GiB)
34.240074 seconds (1.24 M allocations: 5.621 GiB)
34.240042 seconds (1.24 M allocations: 5.621 GiB)
34.240032 seconds (1.24 M allocations: 5.621 GiB)
34.239964 seconds (1.24 M allocations: 5.621 GiB)
34.239936 seconds (1.24 M allocations: 5.621 GiB)
34.239922 seconds (1.24 M allocations: 5.621 GiB)
34.239971 seconds (1.24 M allocations: 5.621 GiB)
34.239922 seconds (1.24 M allocations: 5.621 GiB)
34.240345 seconds (1.24 M allocations: 5.621 GiB)
34.239917 seconds (1.24 M allocations: 5.621 GiB)
34.239910 seconds (1.24 M allocations: 5.621 GiB)
34.239897 seconds (1.24 M allocations: 5.621 GiB)
34.239884 seconds (1.24 M allocations: 5.621 GiB)
34.239874 seconds (1.24 M allocations: 5.621 GiB)
34.239862 seconds (1.24 M allocations: 5.621 GiB)
34.239845 seconds (1.24 M allocations: 5.621 GiB)
34.239830 seconds (1.24 M allocations: 5.621 GiB)
34.239819 seconds (1.24 M allocations: 5.621 GiB)
34.239808 seconds (1.24 M allocations: 5.621 GiB)
34.239777 seconds (1.24 M allocations: 5.621 GiB)
34.239765 seconds (1.24 M allocations: 5.621 GiB)
34.239754 seconds (1.24 M allocations: 5.621 GiB)
34.239742 seconds (1.24 M allocations: 5.621 GiB)
34.239729 seconds (1.24 M allocations: 5.621 GiB)
34.239557 seconds (1.24 M allocations: 5.621 GiB)
34.239586 seconds (1.24 M allocations: 5.621 GiB)
34.239635 seconds (1.24 M allocations: 5.621 GiB)
34.239624 seconds (1.24 M allocations: 5.621 GiB)
34.239571 seconds (1.24 M allocations: 5.621 GiB)
34.239558 seconds (1.24 M allocations: 5.621 GiB)
34.239546 seconds (1.24 M allocations: 5.621 GiB)
34.239534 seconds (1.24 M allocations: 5.621 GiB)
34.239525 seconds (1.24 M allocations: 5.621 GiB)
34.239516 seconds (1.24 M allocations: 5.621 GiB)
34.239504 seconds (1.24 M allocations: 5.621 GiB)
34.239495 seconds (1.24 M allocations: 5.621 GiB)
34.239483 seconds (1.24 M allocations: 5.621 GiB)
34.239472 seconds (1.24 M allocations: 5.621 GiB)
34.239459 seconds (1.24 M allocations: 5.621 GiB)
34.239450 seconds (1.24 M allocations: 5.621 GiB)
34.239442 seconds (1.24 M allocations: 5.621 GiB)
34.239424 seconds (1.24 M allocations: 5.621 GiB)
34.239410 seconds (1.24 M allocations: 5.621 GiB)
34.239390 seconds (1.24 M allocations: 5.621 GiB)
34.239376 seconds (1.24 M allocations: 5.621 GiB)
34.239362 seconds (1.24 M allocations: 5.621 GiB)
34.239335 seconds (1.24 M allocations: 5.621 GiB)
34.239305 seconds (1.24 M allocations: 5.621 GiB)
34.239266 seconds (1.24 M allocations: 5.621 GiB)
34.417358 seconds (1.24 M allocations: 5.648 GiB)
34.957085 seconds (1.27 M allocations: 5.730 GiB)
39.520428 seconds (1.30 M allocations: 5.754 GiB)
```
This all suggests that HTTP.jl isn’t behaving as you would expect based on the `connection_limit` that gets set.
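For anyone who wants to reproduce the concurrency observation, here is a minimal sketch (not the exact measurement behind the numbers above): it polls `lsof` from a background task and records the peak number of ESTABLISHED TCP connections held by the Julia process. It assumes `lsof` is available and that `urls` is the same 100-element vector used above.

```julia
using HTTP

# Minimal sketch: poll `lsof` while a workload runs and record the peak number of
# ESTABLISHED TCP connections held by this Julia process.
function peak_connections(f; interval = 0.1)
    peak = 0
    done = Ref(false)
    monitor = @async while !done[]
        out = read(ignorestatus(`lsof -a -p $(getpid()) -iTCP -sTCP:ESTABLISHED -n -P`), String)
        nlines = count(==('\n'), out)
        peak = max(peak, max(nlines - 1, 0))  # subtract lsof's header line
        sleep(interval)
    end
    result = f()
    done[] = true
    wait(monitor)
    return result, peak
end

# Wrap the same asyncmap call as above and inspect the peak afterwards:
_, peak = peak_connections() do
    asyncmap(url -> HTTP.request("GET", url, status_exception=false, connection_limit=100), urls)
end
@show peak
```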
On the other hand, I have confirmed that my Python code is indeed opening 100 connections by default, so I decided to see if reducing the number of allowed open connections would make a difference. Turns out it does not! (And I verified this with `lsof`, of course.)

Modified code to change the connection limit:
```python
import asyncio
import aiohttp

keys = [f'analysed_sst/{i}.1.0' for i in range(100)]

async def get_items(keys, limit=100):
    # Cap the number of simultaneous connections via the connector.
    connector = aiohttp.TCPConnector(limit=limit)
    async with aiohttp.ClientSession(connector=connector) as session:
        tasks = []
        for ckey in keys:
            url = f"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/{ckey}"
            tasks.append(asyncio.create_task(get_cdata(session, url)))
        cdatas = await asyncio.gather(*tasks)
        return cdatas

async def get_cdata(session, url):
    async with session.get(url) as resp:
        cdata = await resp.read()
        return cdata
```

Timings with different connection limits:

```
In [18]: %time cdatas = asyncio.run(get_items(keys, limit=50))
CPU times: user 3.41 s, sys: 2.25 s, total: 5.65 s
Wall time: 6.29 s

In [19]: %time cdatas = asyncio.run(get_items(keys, limit=25))
CPU times: user 3.12 s, sys: 1.57 s, total: 4.69 s
Wall time: 6.26 s

In [20]: %time cdatas = asyncio.run(get_items(keys, limit=10))
CPU times: user 3.3 s, sys: 1.18 s, total: 4.47 s
Wall time: 5.8 s

In [21]: %time cdatas = asyncio.run(get_items(keys, limit=10))
CPU times: user 3.37 s, sys: 1.22 s, total: 4.59 s
Wall time: 5.43 s
```
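For a more apples-to-apples sweep on the Julia side, one option is to cap the number of in-flight requests at the task level with a semaphore, independently of HTTP.jl's connection pool. This is only a rough sketch (not the code behind any of the timings above), and it assumes the same `urls` vector:

```julia
using HTTP

# Rough sketch: limit in-flight requests with Base.Semaphore, independently of
# HTTP.jl's connection_limit, and time the whole batch for different limits.
function fetch_all(urls; limit = 10)
    sem = Base.Semaphore(limit)
    asyncmap(urls) do url
        Base.acquire(sem)
        try
            HTTP.request("GET", url, status_exception=false).body
        finally
            Base.release(sem)
        end
    end
end

@time fetch_all(urls; limit = 10);
@time fetch_all(urls; limit = 25);
@time fetch_all(urls; limit = 50);
```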