Hello Julia Community,
I have been working on a new gRPC client with an emphasis on production-grade performance and reliability. The client is already thread- and async-safe and uses the fast, up-to-date 1.0 release of ProtoBuf.jl. There is no extra memory copying or buffering between the client and libCURL, and there are optimizations to reduce the overhead of running many small streams over a multiplexed HTTP/2 connection.
Repo: GitHub - csvance/gRPCClient2.jl: Production Grade gRPC Client for Julia
Docs: gRPCClient2.jl · gRPCClient2
The name of the package is just a placeholder while it is under rapid development. The client borrows some code from gRPCClient.jl and Downloads.jl, so thanks to the maintainers/contributors of those packages for helping bootstrap this effort.
Looking for collaborators in general but right now I need the following:
- general usage testing / feedback on interfaces and API
- more test coverage / test against more gRPC servers than just Python
- streaming RPC support: needs to be done in a way that does not negatively impact performance for the unary RPC case (see the hypothetical sketch below)
Of course I am working through most of these myself, but I would appreciate any help if you are interested in having a production-grade gRPC client in Julia.
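To make the streaming item concrete, here is a purely hypothetical sketch of what a server-streaming interface could look like. None of these function names exist in the package; the sketch simply mirrors the request/await split of the unary API used in the benchmark below, so the unary fast path would stay untouched.

# Hypothetical server-streaming interface, for discussion only (not implemented)
stream = grpc_server_stream_request(grpc, "grpc://host:8001/pkg.Service/Method", request)
# pull messages one at a time; `nothing` marks end of stream
while (msg = grpc_server_stream_next(grpc, stream, SomeResponse)) !== nothing
    process(msg)  # user-supplied handler
end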
Latency, overhead, and resource usage are currently quite low. Some benchmarks below (API not final):
// Benchmark a unary RPC with small protobufs to demonstrate overhead per request
// subset of grpc_predict_v2.proto for testing
syntax = "proto3";

package inference;

message ServerReadyRequest {}

message ServerReadyResponse
{
    // True if the inference server is ready, false if not ready.
    bool ready = 1;
}

service GRPCInferenceService
{
    // The ServerReady API indicates if the server is ready for inferencing.
    rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse) {}
}
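For anyone reproducing this: the Julia message types used below can be generated from the .proto above with ProtoBuf.jl's protojl. The paths here are illustrative, and the name of the generated file may differ from the one included in the benchmark script.

using ProtoBuf
# generate Julia bindings for the messages defined above
# arguments: proto file (relative to the search directory), search directory, output directory
protojl("grpc_predict_v2.proto", ".", ".")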
using ProtoBuf
using BenchmarkTools
using gRPCClient2
using Base.Threads

include("grpc_predict_v2_pb.jl")

const grpc = gRPCCURL()

function bench_ready(n)
    @sync begin
        requests = Vector{gRPCRequest}()
        for i in 1:n
            request = ServerReadyRequest()
            # once we generate bindings from the service definition this will be much cleaner
            req = grpc_unary_async_request(grpc, "grpc://rpctest.local:8001/inference.GRPCInferenceService/ServerReady", request)
            push!(requests, req)
        end
        for req in requests
            response = grpc_unary_async_await(grpc, req, ServerReadyResponse)
        end
    end
end
# Sync usage (must wait for response before sending next request)
@benchmark bench_ready(1)
BenchmarkTools.Trial: 6821 samples with 1 evaluation per sample.
 Range (min … max):  370.152 μs … 6.193 ms    GC (min … max): 0.00% … 0.00%
 Time  (median):     520.608 μs               GC (median):    0.00%
 Time  (mean ± σ):   722.812 μs ± 671.093 μs  GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
 370 μs            Histogram: log(frequency) by time            3.9 ms <

 Memory estimate: 5.36 KiB, allocs estimate: 109.
# Async usage (send all requests as fast as possible and then wait for all responses)
@benchmark bench_ready(1000)
BenchmarkTools.Trial: 32 samples with 1 evaluation per sample.
 Range (min … max):  149.132 ms … 181.694 ms   GC (min … max): 0.00% … 0.00%
 Time  (median):     157.729 ms                GC (median):    0.00%
 Time  (mean ± σ):   160.711 ms ±   7.917 ms   GC (mean ± σ):  0.00% ± 0.00%

 [histogram omitted]
 149 ms            Histogram: frequency by time            182 ms <

 Memory estimate: 2.37 MiB, allocs estimate: 54179.
Dividing by 1000, we get a mean of 160.711 μs per request, down from 722.812 μs in the sync case: roughly a 4.5× speedup from not having to wait for a response before sending the next request. The ICMP RTT to this server is ~300 μs from my computer on the LAN.
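For reference, the per-request figure is just the async trial's mean divided by the number of requests:

# async trial mean: 160.711 ms for 1000 requests => 160.711 μs per request
# speedup vs. the sync per-request mean of 722.812 μs:
722.812 / 160.711   # ≈ 4.5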