Hello Julia Community,
I have been working on a new gRPC client with an emphasis on production-grade performance and reliability. The client is already thread- and async-safe and uses the fast, up-to-date 1.0 version of ProtoBuf.jl. There is no extra memory copying or buffering between the client and libCURL, and there are optimizations to reduce the overhead of running many small streams over a multiplexed HTTP/2 connection.
Repo: GitHub - csvance/gRPCClient2.jl: Production Grade gRPC Client for Julia
Docs: gRPCClient2.jl · gRPCClient2
The name of the package is just a placeholder while it is under rapid development. The client borrows some code from gRPCClient.jl and Downloads.jl, so thanks to the maintainers/contributors of those packages for helping bootstrap this effort.
Looking for collaborators in general, but right now I need the following:
- general usage testing / feedback on interfaces and API
- more test coverage for streaming / testing against more gRPC servers than just Python
Of course I am working through most of these myself, but I would appreciate any help if you are interested in having a production-grade gRPC client in Julia.
Latency, overhead, and resource usage are currently quite minimal. Some benchmarks below (API not final):
// Benchmark a unary RPC with small protobufs to demonstrate overhead per request
// subset of grpc_predict_v2.proto for testing
syntax = "proto3";
package inference;
message ServerReadyRequest {}
message ServerReadyResponse
{
  // True if the inference server is ready, false if not ready.
  bool ready = 1;
}

service GRPCInferenceService
{
  // The ServerReady API indicates if the server is ready for inferencing.
  rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse) {}
}
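For context, the message bindings included below as grpc_predict_v2_pb.jl can be generated from this .proto with ProtoBuf.jl's protojl. This is just a sketch; the paths are placeholders, and the exact name of the generated file may differ from what is included below:

using ProtoBuf

# Generate Julia message bindings from the proto file above.
# Arguments: relative path(s) to the .proto, search directory for imports,
# and the output directory for the generated *_pb.jl files (all placeholder paths here).
protojl("grpc_predict_v2.proto", ".", ".")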
using ProtoBuf
using BenchmarkTools
using gRPCClient2
using Base.Threads
include("grpc_predict_v2_pb.jl")
const grpc = gRPCCURL()
function bench_ready(n)
    @sync begin
        requests = Vector{gRPCRequest}()
        for i in 1:n
            request = ServerReadyRequest()
            # once we generate bindings from the service definition this will be much cleaner
            req = grpc_unary_async_request(grpc, "grpc://rpctest.local:8001/inference.GRPCInferenceService/ServerReady", request)
            push!(requests, req)
        end
        for req in requests
            response = grpc_unary_async_await(grpc, req, ServerReadyResponse)
        end
    end
end
# Sync usage (must wait for response before sending next request)
@benchmark bench_ready(1)
BenchmarkTools.Trial: 6821 samples with 1 evaluation per sample.
 Range (min … max):  370.152 μs …   6.193 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     520.608 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   722.812 μs ± 671.093 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]

  370 μs          Histogram: log(frequency) by time          3.9 ms <

 Memory estimate: 5.36 KiB, allocs estimate: 109.
# Async usage (send all requests as fast as possible and then wait for all responses)
@benchmark bench_ready(1000)
BenchmarkTools.Trial: 32 samples with 1 evaluation per sample.
 Range (min … max):  149.132 ms … 181.694 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.729 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.711 ms ±   7.917 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]

  149 ms            Histogram: frequency by time             182 ms <

 Memory estimate: 2.37 MiB, allocs estimate: 54179.
Dividing by 1000 gives a mean of 160.711 μs per request, down from 722.812 μs in the sync case, around a 4.5x speedup from not having to wait for a response before sending the next request. The ICMP RTT to this server is ~300 μs from my computer on the LAN.
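Spelling out that arithmetic (numbers copied from the benchmark output above):

# Amortized per-request latency when pipelining 1000 requests vs. one blocking request
mean_async_total = 160.711e-3               # s, mean time for 1000 async requests
mean_sync        = 722.812e-6               # s, mean time for a single sync request
per_request      = mean_async_total / 1000  # ≈ 160.7 μs amortized per request
speedup          = mean_sync / per_request  # ≈ 4.5x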