Julia's deployment in a production environment (100k~200k QPS)

Two years ago we rewrote our original Java service in Julia. Performance is at least as good as before, but we have run into some problems.

Our system uses Nginx as a load balancer, which forwards requests to Julia (HTTP.jl) for processing. Each request queries the Redis cluster and computes whether to return a result or not. Some requests also need to call external services, so a package like HTTP.jl that includes both a server and a client is convenient.

We run 160 Julia processes on 40 machines (8 cores, 32 GB each), 4 processes per machine, which averages about 650 QPS per process and 2600 QPS per machine. Here is our startup bash script:

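# Launch one Julia process per index; each process receives its own -pi value.
for i in $(seq $OFFSET_PROC $((CPU_CORE_NUM-1+OFFSET_PROC)))
do
    IMG_ARG=""
    # Quote the path: with an unset or empty SYS_IMAGE, an unquoted `[ -e ]` evaluates to true.
    if [ -e "$SYS_IMAGE" ]; then
        echo "Using $SYS_IMAGE..."
        IMG_ARG="-J$SYS_IMAGE"
    fi
    # IMG_ARG is left unquoted on purpose so that it disappears when empty.
    julia --threads 1 --color=yes --project=@. -q $IMG_ARG -- "$(dirname "$0")/../bootstrap.jl" "$@" -pi "$i" &
done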
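
The script above starts every process with "--threads 1". To confirm what a given --threads value actually gives a process, a small check like the following could be run at startup. This is only a diagnostic sketch: `describe_threadpools` is a hypothetical helper name, not part of our service, and `Threads.nthreads(:interactive)` requires Julia 1.9 or later.

```julia
# Hypothetical helper (an assumption for illustration): report the size of
# each threadpool. Requires Julia 1.9+, where the :default and :interactive
# pools are counted separately.
function describe_threadpools()
    return (default = Threads.nthreads(:default),
            interactive = Threads.nthreads(:interactive))
end

pools = describe_threadpools()
println("default threads: ", pools.default,
        ", interactive threads: ", pools.interactive)
```

Julia 1.9+ also accepts "--threads N,M" to size the default and interactive pools separately. Whether a different split changes the CPU usage seen with "--threads auto" is not something this sketch establishes.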

And the main function:

using HTTP

# init_logging and setup_router are defined elsewhere in our service.
function main()
    port = 8000
    proc_index = 0

    # The startup script appends "-pi <index>" as the last two arguments.
    if length(ARGS) > 1 && ARGS[end-1] == "-pi"
        proc_index = parse(Int, ARGS[end])
    end

    function start_server(i)
        access_log_formatter = init_logging(i)
        routers = setup_router()

        println("Running on http://127.0.0.1:$(port + i)/")
        # Warm up in the background: the script requests every route so the handlers get compiled.
        schedule(@task run(`python3 scripts/compile_routes.py -i $i`))
        HTTP.serve(routers, "0.0.0.0", port + i; access_log=access_log_formatter)
    end

    start_server(proc_index)
end

The problems we encountered were:

  1. Beginning in Julia 1.9, the main server loop is spawned on the interactive threadpool by default. We restricted each process to "--threads 1" because with "--threads auto" CPU usage becomes high, QPS does not improve significantly, and the process is more likely to crash.
  2. Our main function launches a Python script in a background task that requests all the routes so that they get compiled. However, some functions are still left uncompiled, so the first requests are processed more slowly. Load balancing lets us work around this by forwarding a small number of requests at first and adding more once everything is compiled, but it would be better if the service were ready to go right at startup.
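
One option for item 2 is to ask Julia to compile handler methods in-process before the server starts listening. Below is a minimal sketch using Base's `precompile`. `FakeRequest`, `handle_ping`, `handle_score`, and `warm_up` are hypothetical names standing in for the real request type and handlers, and `precompile` only covers the signatures listed, so it may not reach everything a real request would trigger.

```julia
# Hypothetical stand-in for the request type the handlers receive.
struct FakeRequest
    target::String
    body::Vector{UInt8}
end

# Hypothetical handlers; real ones would query Redis and build a response.
handle_ping(req::FakeRequest) = "pong"
handle_score(req::FakeRequest) =
    isempty(req.body) ? 0.0 : sum(req.body) / length(req.body)

# Compile each handler for its concrete argument types without running it.
# `precompile` returns true when compilation for that signature succeeded.
function warm_up(handlers)
    results = Dict{Symbol,Bool}()
    for h in handlers
        results[nameof(h)] = precompile(h, (FakeRequest,))
    end
    return results
end

warmed = warm_up([handle_ping, handle_score])
for (name, ok) in warmed
    println(name, " precompiled: ", ok)
end
```

Since the startup script already supports a system image via -J, another route is to bake a representative workload into that image (for example with PackageCompiler.jl or PrecompileTools.jl), so that less compilation is left for the first requests.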

Looking forward to getting your advice if anyone else is experiencing similar issues.

Other than that, Julia is performing well. We are using 1.10-beta in our production environment and it's working fine. Thanks!

Welcome! You may find this topic from a few weeks ago relevant, particularly with regards to multithreading:
