Hi everyone, I'm new to Julia and looking to integrate it into a SaMD (Software as a Medical Device) as an alternative to Numba for performance-critical sections of the code.
As part of those performance-critical sections, I'm trying to use gRPC to call NVIDIA Triton Inference Server after image preprocessing finishes. The Triton client part of the project is something I would like to open source and maintain as a library once I feel it is production-ready. However, any time I use a connection concurrently or re-use one (HTTP/2 multiplexing), I get deadlocks and this error message (often repeated hundreds of times):
┌ Error: curl_multi_socket_action: 8
└ @ Downloads.Curl ~/.virtualenvs/full-solution/julia_env/pyjuliapkg/install/share/julia/stdlib/v1.11/Downloads/src/Curl/utils.jl:57
This happens when I use @async or threads, and sometimes even when I just re-use a connection. (If I'm reading the libcurl headers correctly, multi-interface error code 8 is CURLM_RECURSIVE_API_CALL, i.e. a curl_multi_* function being called from inside a libcurl callback, which would fit a concurrency bug.) I'm using the gRPCClient.jl library, which uses Downloads.Curl under the hood, and Julia is being called from Python via juliacall.
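For what it's worth, my baseline mental model for concurrent use of Downloads.jl itself (outside gRPC) is sketched below: share a single Downloader across tasks, which is, as far as I understand the docs, the intended way to pool connections concurrently. The URL is just a placeholder.

using Downloads

# One Downloader shared by all tasks; Downloads.jl pools connections,
# name lookups, and other resources inside it.
downloader = Downloads.Downloader()

@sync for i in 1:8
    # Download into a throwaway buffer, reusing the shared Downloader.
    @async Downloads.download("https://julialang.org", IOBuffer(); downloader=downloader)
end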
I worked around the problem in a way that is "good enough" for my use case by creating a connection pool and never re-using the same connection (sketched below). But I was wondering: outside of this specific library, are there any pitfalls I should watch out for as a new Julia user when using Downloads.Curl for concurrent/async work?
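A minimal sketch of that workaround, using the GRPCInferenceServiceBlockingClient type generated further down. The pool size, ENDPOINT, and with_client are my own names/choices, simplified from what I actually run:

const ENDPOINT = "http://127.0.0.1:8000"
const POOL = Channel{Any}(16)
for _ in 1:16
    put!(POOL, GRPCInferenceServiceBlockingClient(ENDPOINT))
end

# Check out a client, run f on it, then put a *fresh* client back,
# so no connection is ever used twice and none is shared between tasks.
# The used client is simply dropped and garbage-collected.
function with_client(f)
    client = take!(POOL)   # blocks until a client is available
    try
        return f(client)
    finally
        put!(POOL, GRPCInferenceServiceBlockingClient(ENDPOINT))
    end
end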
And here is an example snippet that triggers the bug; removing the @async is sufficient to make it work reliably.
# Download gRPC proto
using Downloads
Downloads.download("https://raw.githubusercontent.com/kserve/kserve/refs/heads/master/docs/predict-api/v2/grpc_predict_v2.proto", "grpc_predict_v2.proto")
# Generate gRPC client/service code
using gRPCClient
gRPCClient.generate("grpc_predict_v2.proto")
# Load the generated code
include("./InferenceClients.jl");
using .InferenceClients
import .InferenceClients: ModelInfer, ModelInferRequest, ModelInferResponse, InferParameter, InferTensorContents
import .InferenceClients: ModelInferRequest_InferInputTensor, ModelInferRequest_InferRequestedOutputTensor
# If anyone wants to try and reproduce the issue I can provide a real endpoint
client = GRPCInferenceServiceBlockingClient("http://127.0.0.1:8000")
@sync begin
    for i in 1:256
        @async begin
            inp = zeros(Float32, 1, 224, 224)
            input__0 = ModelInferRequest_InferInputTensor(
                name="INPUT__0",
                datatype="FP32",
                shape=collect(size(inp)),
                contents=InferTensorContents(fp32_contents=vec(inp))
            )
            request = ModelInferRequest(
                model_name="my_model_name",
                inputs=[input__0]
            )
            response, status = ModelInfer(client, request)
            gRPCCheck(status)
        end
    end
end
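For completeness, here is the same loop routed through the with_client helper from the pool sketch above; this is the "good enough" version I mentioned:

@sync for i in 1:256
    @async with_client() do client
        inp = zeros(Float32, 1, 224, 224)
        input__0 = ModelInferRequest_InferInputTensor(
            name="INPUT__0",
            datatype="FP32",
            shape=collect(size(inp)),
            contents=InferTensorContents(fp32_contents=vec(inp))
        )
        request = ModelInferRequest(
            model_name="my_model_name",
            inputs=[input__0]
        )
        response, status = ModelInfer(client, request)
        gRPCCheck(status)
    end
end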
I've reported the issue upstream: "Constant deadlocks with any async usage" (JuliaComputing/gRPCClient.jl#42).
julia> versioninfo()
Julia Version 1.11.5
Commit 760b2e5b739 (2025-04-14 06:53 UTC)
Build Info:
Official https://julialang.org/ release
Platform Info:
OS: Linux (x86_64-linux-gnu)
CPU: 128 × AMD EPYC 7773X 64-Core Processor
WORD_SIZE: 64
LLVM: libLLVM-16.0.6 (ORCJIT, znver3)
Threads: 1 default, 0 interactive, 1 GC (on 128 virtual cores)