[Help Wanted] gRPCClient2.jl: Production Grade gRPC Client

Hello Julia Community,

I have been working on a new gRPC client with an emphasis on production-grade performance and reliability. The client is already thread/async safe and uses the fast and up-to-date 1.0 version of ProtoBuf.jl. There is zero extra memory copying or buffering between the client and libCURL, and there are optimizations to reduce the overhead of many small streams over multiplexed HTTP/2.

Repo: https://github.com/csvance/gRPCClient2.jl
Docs: gRPCClient2.jl · gRPCClient2

The name of the package is just a placeholder while it is under rapid development. The client borrows some code from gRPCClient.jl and Downloads.jl, so thanks to the maintainers/contributors of those packages for helping bootstrap this effort.

Looking for collaborators in general, but right now I need the following:

  • general usage testing / feedback on interfaces and API
  • more test coverage for streaming / test against more gRPC servers than just Python

Of course I am working through most of these myself, but I would appreciate any help if you are interested in having a production-grade gRPC client in Julia.

The latency / overhead / resource usage is currently quite minimal. Some benchmarks below (API not final):

// Benchmark a unary RPC with small protobufs to demonstrate overhead per request
// subset of grpc_predict_v2.proto for testing

syntax = "proto3";
package inference;

message ServerReadyRequest {}

message ServerReadyResponse
{
  // True if the inference server is ready, false if not ready.
  bool ready = 1;
}

service GRPCInferenceService
{
  // The ServerReady API indicates if the server is ready for inferencing.
  rpc ServerReady(ServerReadyRequest) returns (ServerReadyResponse) {}
}

using ProtoBuf
using BenchmarkTools
using gRPCClient2
using Base.Threads

include("grpc_predict_v2_pb.jl")

const grpc = gRPCCURL()

function bench_ready(n)
    @sync begin
        # Issue all n requests without waiting for any responses
        requests = Vector{gRPCRequest}()
        for i in 1:n
            request = ServerReadyRequest()
            # once we generate bindings from the service definition this will be much cleaner
            req = grpc_unary_async_request(grpc, "grpc://rpctest.local:8001/inference.GRPCInferenceService/ServerReady", request)
            push!(requests, req)
        end

        # Await and decode each response in order
        for req in requests
            response = grpc_unary_async_await(grpc, req, ServerReadyResponse)
        end
    end
end

# Sync usage (must wait for response before sending next request)
@benchmark bench_ready(1)

BenchmarkTools.Trial: 6821 samples with 1 evaluation per sample.
 Range (min … max):  370.152 μs …   6.193 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     520.608 μs               ┊ GC (median):    0.00%
 Time  (mean ± σ):   722.812 μs ± 671.093 μs  ┊ GC (mean ± σ):  0.00% ± 0.00%

  ▂▆█▆▅▄▃▂▂▁                                                    ▁
  ███████████▇▆▅▆▄▄▅▅▄▆▄▄▅▆▆▆▅▂▄▅▅▅▃▅▅▄▅▅▅▅▄▅▄▄▆▆▆▆▆▆▇▇▇▆▅▄▄▄▄▂ █
  370 μs        Histogram: log(frequency) by time        3.9 ms <

 Memory estimate: 5.36 KiB, allocs estimate: 109.

# Async usage (send all requests as fast as possible and then wait for all responses)
@benchmark bench_ready(1000)

BenchmarkTools.Trial: 32 samples with 1 evaluation per sample.
 Range (min … max):  149.132 ms … 181.694 ms  ┊ GC (min … max): 0.00% … 0.00%
 Time  (median):     157.729 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   160.711 ms ±   7.917 ms  ┊ GC (mean ± σ):  0.00% ± 0.00%

       ▁        ▄█▁    ▁                 ▁
  ▆▁▁▆▁█▆▁▆▆▁▁▁▆███▆▁▁▆█▆▆▁▁▁▆▁▁▆▁▁▁▁▁▆▁▁█▁▁▆▁▆▁▁▁▁▁▁▁▁▁▆▁▁▁▁▁▆ ▁
  149 ms           Histogram: frequency by time          182 ms <

 Memory estimate: 2.37 MiB, allocs estimate: 54179.

Dividing by 1000 gives a mean of 160.711 μs per request, down from 722.812 μs in the sync case: around a 4.5x speedup from not having to wait for a response before sending the next request. The ICMP RTT to this server is ~300 μs from my computer on the LAN.
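
To make the arithmetic explicit, here is a quick sanity check using the numbers from the two trials above:

sync_mean_us  = 722.812                  # mean per request in the sync case (μs)
async_mean_us = 160.711e3 / 1000         # async: 160.711 ms per 1000 requests, i.e. 160.711 μs each
speedup = sync_mean_us / async_mean_us   # ≈ 4.5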

9 Likes

Opened a pull request to add support for RPC client stub code generation with ProtoBuf.jl. Depending on how fast I'm able to get services support fully worked out and merged, we could have the v0.1 release in the next few weeks. Tests and CI/precompile infrastructure have been set up, and considerable work has been done to smooth out rough edges: useful exception messages, fixing memory/handle leaks, etc.

I also cleaned up the public interface / API:

using gRPCClient2

# Include the protobuf definitions and RPC client stubs
include("gen/proto/test_pb.jl")

# Initialize the gRPC package - grpc_shutdown() does the opposite for use with Revise.
grpc_init()

# Create a client from the generated client stub
client = TestService_TestRPC_Client("localhost", 8001; secure=false)

# Sync API
test_response = grpc_sync_request(client, TestRequest(1))

# Async API
requests = Vector{gRPCRequest}()
for i in 1:10
    push!(
        requests, 
        grpc_async_request(client, TestRequest(1))
    )
end

for request in requests
    response = grpc_async_await(client, request)
end

Now that the API is relatively stable I'm going to start writing documentation, and I'll continue to stress test the client and fix any remaining undiscovered bugs.

1 Like

Looks like some neat work!

As someone who is specifically not a fan of xyz2, xyz3 packages, are you interested in talking to the gRPCClient.jl folks about potentially replacing it with your package?

2 Likes

Nice to see some interest and movement in gRPC support for Julia. This has long been a stumbling block when integrating Julia services into larger projects, and there aren't that many alternatives really; e.g. there is also no AMQP v1 support in the ecosystem (although I did make a start there, maybe I need to request help there too). So it is nice to see that someone is willing to push this domain further. Maybe someday we'll also have a gRPC server in Julia.

Nice to see some basic integration testing set up as well. Maybe I could help (if I get some spare time, lol) with setting up test servers in a few more languages to test against. At least JS/TS and Go would be nice to catch any inconsistencies in implementations (I heard that there can be some).

2 Likes

Good idea. Once we are a little farther along with documentation and testing we can open the discussion. I just didn't want to bother them until it was clear how serious this effort is :sweat_smile:

Indeed, when I was first trying to adopt Julia for a project at work, this ended up being such a large roadblock that I almost gave up on the language. So it will be good if no one else ever has to go through that again :sweat_smile:

A gRPC server in Julia is on my radar; it may be possible to build with nghttp2, which already has a JLL package. Once the client initiative is complete I will look into it more.

That would be much appreciated! Get me some spare time too :grinning_face_with_smiling_eyes:

1 Like

I just merged streaming RPC support. Request, response, and bidirectional streaming are all supported. Test coverage for streaming RPC isn't nearly as comprehensive as it is for unary RPC yet, but the basics are working.
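
For a rough picture of usage, here is a purely hypothetical sketch of a bidirectional stream. Every grpc_stream_* name below is a placeholder for illustration, not the package's actual interface; see the repo for the real API.

# Hypothetical placeholder names, not the package's actual API
stream = grpc_stream_open(client)             # open a bidirectional stream

for i in 1:10
    grpc_stream_send(stream, TestRequest(i))  # push requests as they are produced
end
grpc_stream_close_send(stream)                # signal the end of the request stream

# consume responses as the server produces them
while (response = grpc_stream_recv(stream)) !== nothing
    # handle each response here
end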

This should bring us to feature parity with gRPCClient.jl now. I also reached out to the maintainers to ask about doing a 1.X version release with the new codebase.

3 Likes

Streaming RPC should be stable now after a decent amount of stress testing and bugfixing. We are also now outperforming Python's gRPC client, which tends to have pretty solid performance in my experience.

I ran some benchmarks using the Python gRPC client to compare the overhead between Python's grpcio package and gRPCClient2.jl. I gave it 24 threads, the same as julia -t auto on my system when I run benchmark_workload_smol() from workloads.jl in the gRPCClient2.jl repo. I'm aware the GIL is a thing, but calling into grpcio releases the GIL, so it shouldn't be a significant bottleneck. Pretty much none of grpcio is written in Python, which is why it's fast :sweat_smile:

 % uv run grpc_test_client.py 30
average: 7019.47 RPS
std: 202.31 RPS
min: 6153.18 RPS
max: 7278.98 RPS

In Julia, running the same benchmark with all the recent changes I get the following (keep in mind this is doing 1000 requests per trial, so we will have to divide the results by 1000).

julia> benchmark_workload_smol()
BenchmarkTools.Trial: 41 samples with 1 evaluation per sample.
 Range (min … max):  108.345 ms … 135.084 ms  ┊ GC (min … max): 0.00% … 7.75%
 Time  (median):     123.482 ms               ┊ GC (median):    0.00%
 Time  (mean ± σ):   122.444 ms ±   6.091 ms  ┊ GC (mean ± σ):  0.33% ± 1.41%

                               █   ▃▃▃█ █  █ ▃
  ▇▁▇▁▁▁▁▇▇▁▁▁▇▁▁▁▁▇▇▇▇▇▁▇▁▇▇▁▁█▁▇▁████▁█▇▁█▇█▁▁▁▁▁▇▇▇▁▁▁▁▇▁▁▁▇ ▁
  108 ms           Histogram: frequency by time          135 ms <

 Memory estimate: 4.27 MiB, allocs estimate: 93559.

There are 1000 requests per trial, so divide the mean by 1000 and convert to RPS:

julia> 1 / (0.122444 / 1000)
8166.998791284178

We are beating Python by over 1000 RPS on average.

8 Likes

Does anyone care if we start support at Julia 1.12? It looks like there are some issues specifically with supporting streaming on 1.10/1.11 that were resolved in 1.12. I don't really have time to dig deeper into it, but as things stand now I plan on supporting all Julia versions >= 1.12. CI has been updated to test against 1.12 and nightly. I'm also testing as part of a production system that uses gRPC, so far so good.

Sometime in the next few weeks the package will be submitted for registration. I'm hoping that as part of the registration process I can get in contact with the people I need to about the package name and upstreaming gRPC codegen into ProtoBuf.jl.

2 Likes

Thank you very much for taking the time to work on this. The lack of proper gRPC support in Julia was severely limiting, IMO.

2 Likes

@csvance

I'd be super interested in a gRPC server for Julia.

When you get this to a decent point I'd love to experiment with targeting this as an alternative backend for Oxygen.jl. The package already performs introspection on all inputs and outputs of its handlers, so I could potentially just generate the proto schema & files and hook them into your gRPC server.

I could even find some way to enable both gRPC and HTTP servers to run in the same application, to give people multiple ways to connect to their app.

In theory the handlers could look something like:

# Hypothetical macro for gRPC handlers (auto-generates proto and hooks into some grpc server)
@grpc function add(request::MathRequest)
    result = request.a + request.b
    return MathResponse(result)
end

@grpc function create_person()
    person = Person("joe", 20)
    return PersonResponse(person)
end

# ...existing code (e.g., @get "/add/{a}/{b}" remains for HTTP)...

serve(port=8080)  # HTTP server; gRPC would run separately or integrated

Below is the theoretical schema that could be generated from those handlers:

syntax = "proto3";

package oxygen_example;

import "google/protobuf/empty.proto";

message MathRequest {
  double a = 1;
  double b = 2;
}

message MathResponse {
  double result = 1;
}

message Person {
  string name = 1;
  int32 age = 2;
}

message PersonResponse {
  Person person = 1;
}

service OxygenService {
  rpc Add(MathRequest) returns (MathResponse);
  rpc GetPerson(google.protobuf.Empty) returns (PersonResponse);
}

1 Like

The gRPC server is in development, but I'm not making any promises on the timeline yet. The approach I'm currently in favor of is to use nghttp2 together with Julia TCP sockets, though I have no idea how much worse that would perform compared to working directly with Julia's libuv interface. The reason I like this approach is that it could produce something that's actually useful to many people even before it's written in a completely optimal way, while not requiring a complete rewrite to take it the rest of the way.
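
As a rough illustration of that layering (a minimal sketch under my own assumptions; handle_h2_session! is just a placeholder, and none of this is real package code):

using Sockets

# Accept TCP connections with the Sockets stdlib and hand each one off to a
# task; the task would drive an nghttp2 session over the raw bytes.
function serve_grpc(port::Integer)
    server = listen(IPv4(0), port)
    while true
        sock = accept(server)
        errormonitor(@async handle_h2_session!(sock))
    end
end

# Placeholder: a real implementation would feed these bytes into nghttp2's
# frame parser and dispatch decoded gRPC calls to user handlers.
function handle_h2_session!(sock::TCPSocket)
    try
        while !eof(sock)
            bytes = readavailable(sock)  # raw HTTP/2 frames for nghttp2 to parse
        end
    finally
        close(sock)
    end
end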

Update on gRPCClient2.jl

There is now an active pull request / code review in progress for using gRPCClient2.jl as the 1.0.0 release of gRPCClient.jl. Not sure exactly when it will be done, but things are looking good so far!

4 Likes

gRPCClient2.jl is now gRPCClient.jl with a new home in JuliaIO :tada:

As for actually registering the 1.0.0 release, all that remains is to finish upstreaming code generation support into ProtoBuf.jl.

The test server was rewritten in Go, as it turns out some of the benchmarks were bottlenecked by Python's GIL; throughput doubled in cases with very small messages. @atthom reworked the benchmark scripts to use PrettyTables.jl with proper units, presenting everything together nicely.

julia -t auto

╭──────────────────────────────────┬─────────┬────────┬─────────────┬──────────┬────────────┬──────────────┬─────────┬──────┬──────╮
│                        Benchmark │       N │ Memory │ Allocations │ Duration │ Throughput │ Avg duration │ Std-dev │  Min │  Max │
│                                  │   calls │    MiB │             │        s │    calls/s │           μs │      μs │   μs │   μs │
├──────────────────────────────────┼─────────┼────────┼─────────────┼──────────┼────────────┼──────────────┼─────────┼──────┼──────┤
│                    workload_smol │   91000 │   3.75 │       85123 │     5.03 │      18079 │           55 │    3.96 │   48 │   67 │
│        workload_32_224_224_uint8 │    2900 │  63.78 │        9188 │     5.01 │        579 │         1728 │   97.86 │ 1614 │ 1899 │
│       workload_streaming_request │ 1841000 │   0.89 │        6482 │     4.99 │     368669 │            3 │    1.35 │    2 │   21 │
│      workload_streaming_response │  330000 │   13.0 │       27838 │     5.02 │      65771 │           15 │     5.2 │    6 │   37 │
│ workload_streaming_bidirectional │  405000 │   1.48 │       25672 │      5.0 │      80948 │           12 │    8.52 │    3 │   62 │
╰──────────────────────────────────┴─────────┴────────┴─────────────┴──────────┴────────────┴──────────────┴─────────┴──────┴──────╯