Introduction
I went through several steps of optimizing the Julia code. Essentially, we need to make sure we are running compiled code and that memory usage is optimized as much as possible.
I used bmon to check my network usage. The first thing I noticed was that the peak receive bandwidth was higher with asyncio, which made me wonder whether the connection limit was holding HTTP.jl back.
After optimization, the effective wall times are now comparable for me. With a higher-bandwidth connection, your situation may require further tuning.
Python / asyncio
Here’s what I see with Python’s asyncio.run:
In [11]: %time cdatas = asyncio.run(get_items(keys))
CPU times: user 18.7 s, sys: 3.81 s, total: 22.5 s
Wall time: 37.7 s
Initial Julia code
Here is what I see with your Julia code:
julia> const urls = map(i->"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);
julia> @time asyncmap(url->HTTP.request("GET", url, status_exception=false).body, urls);
73.768142 seconds (12.42 M allocations: 5.409 GiB, 0.81% gc time, 15.31% compilation time)
Compiled function
Putting this into a function and making sure it gets compiled, I then get the following results via Julia:
julia> function f()
asyncmap(url->HTTP.request("GET", url, status_exception=false).body, urls);
end
julia> @time f()
51.969691 seconds (3.01 M allocations: 4.816 GiB, 0.42% gc time)
Julia optimization with preallocated buffers
Optimizations:
- Use a function so the code gets compiled
- Make all the globals const
- Increase the connection_limit
- Preallocate the buffers
- Use Threads.@spawn to allow tasks to run on multiple threads
julia> const urls = map(i->"https://mur-sst.s3.us-west-2.amazonaws.com/zarr-v1/analysed_sst/$i.1.0", 0:99);
julia> const buffers = [IOBuffer(; sizehint = 64*1024*1024, maxsize=64*1024*1024) for x in 1:100]
julia> function f()
seekstart.(buffers)
@sync map(urls, buffers) do url, buffer
Threads.@spawn HTTP.request("GET", url, status_exception=false, connection_limit=25, response_stream=buffer)
end
end
julia> @time f()
35.779649 seconds (5.80 M allocations: 176.242 MiB, 0.21% compilation time)
julia> seekstart.(buffers); read.(buffers)
100-element Vector{Vector{UInt8}}:
[0x02, 0x01, 0x21, 0x02, 0x60, 0x38, 0xdc, 0x03, 0x00, 0x00 … 0x0f, 0x02, 0x00, 0x08, 0x50, 0xb5, 0xb5, 0xb5, 0xb4, 0xb4]
...
Discussion
Part of the optimization above is general to Julia. Making your globals const, or at least binding them to a type, helps the compiler generate efficient code, as does putting the work into a function. Managing memory is also important, and I suspect this accounts for some of the difference.
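As a quick illustration (these variables are hypothetical, not part of the benchmark above), either form gives the compiler a concrete type to work with; the typed-global syntax requires Julia 1.8 or later:

julia> const endpoint = "https://mur-sst.s3.us-west-2.amazonaws.com"   # const global

julia> endpoint2::String = "https://mur-sst.s3.us-west-2.amazonaws.com"   # typed (non-const) global, Julia 1.8+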
Above we preallocated a lot of memory partially based on prior knowledge. This prior knowledge could be obtained via a single HTTP request to the following URL and then parsing the returned XML:
https://mur-sst.s3.us-west-2.amazonaws.com/?prefix=zarr-v1/analysed_sst
This request uses the Amazon S3 ListObjectsV2 API.
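As a rough sketch of that idea (the object_sizes helper is hypothetical, not the author's code), we could fetch the listing with HTTP.jl and pull the object sizes out of the returned XML with a regular expression. A real XML parser such as EzXML.jl would be more robust, the list-type=2 query parameter selects ListObjectsV2, and listings are paginated at 1,000 keys per response:

julia> using HTTP

julia> function object_sizes(prefix)
           resp = HTTP.request("GET",
               "https://mur-sst.s3.us-west-2.amazonaws.com/?list-type=2&prefix=$prefix")
           # Each object in the listing carries a <Size> element, in bytes.
           [parse(Int, m.captures[1]) for m in eachmatch(r"<Size>(\d+)</Size>", String(resp.body))]
       end

julia> object_sizes("zarr-v1/analysed_sst")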
We may not have to preallocate all of the memory; we just need enough buffers to handle the number of concurrent connections. We could then copy the memory out and reuse the IOBuffers.
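A minimal sketch of that approach (the fetch_all helper is hypothetical, reusing the same HTTP.jl keywords as above): keep a small pool of IOBuffers in a Channel, copy each response out with take!, and return the buffer to the pool for the next request:

julia> using HTTP

julia> function fetch_all(urls; nbuffers = 25)
           # Pool of reusable buffers, one per concurrent connection.
           pool = Channel{IOBuffer}(nbuffers)
           foreach(_ -> put!(pool, IOBuffer(; sizehint = 64*1024*1024)), 1:nbuffers)
           tasks = map(urls) do url
               Threads.@spawn begin
                   buffer = take!(pool)
                   try
                       HTTP.request("GET", url, status_exception=false,
                           connection_limit=nbuffers, response_stream=buffer)
                       take!(buffer)      # copy the bytes out and reset the buffer
                   finally
                       take!(buffer)      # ensure the buffer is empty before reuse
                       put!(pool, buffer)
                   end
               end
           end
           fetch.(tasks)   # Vector{Vector{UInt8}}, one per URL
       end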