Julia can be better at doing web: A benchmark

Won’t this start nproc servers all listening on the same port? I wonder if that is the best way to set things up, as that probably won’t have any load balancing or similar between the different processes.

I’m also interested in finding potential problems in this domain and solving them. I applaud your effort and willingness to work on this.
I just want to make sure that the benchmark is using the best setup according to the current design of HTTP.jl. Maybe you can raise even more awareness and involve HTTP.jl maintainers as well, by opening an issue over on GitHub?

Yes. EDIT: actually, the Linux kernel does provide “free” load balancing across processes listening on a REUSEPORT socket (so “no” to the second part :sweat_smile:)
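To illustrate the kernel behaviour mentioned above, here is a minimal sketch in Python (not Julia, and not a description of what HTTP.jl does internally) of two listening sockets sharing one port via SO_REUSEPORT. It assumes Linux; the helper name reuseport_listener is made up for this example.

```python
import socket


def reuseport_listener(port: int, host: str = "127.0.0.1") -> socket.socket:
    """Create a listening TCP socket with SO_REUSEPORT set before bind()."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option must be set on every socket that wants to share the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    sock.listen()
    return sock


# Port 0 asks the kernel to pick a free port for the first listener.
first = reuseport_listener(0)
port = first.getsockname()[1]

# Without SO_REUSEPORT on both sockets this second bind would fail
# with "Address already in use".
second = reuseport_listener(port)

print(f"two listeners share port {port}")
```

In a multi-process server each worker would hold one such socket, and the kernel would then distribute incoming connections among them.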

Me too, I’m trying to raise awareness and get eyes on this.

So, I’ve been busy with trying various stuff, like

but the performance improvements are not that great (around 5% more throughput in some scenarios). Also, the non-blocking version of LibPQ is 3 ms slower than the blocking one, which is consistent with what I measured with Python’s asyncio.

So, my current curiosity frontier is:

  1. How much time should I expect between starting to listen on a socket (FDWatcher) and actually being notified about its state? (I notice a 3 ms overhead in LibPQ’s async_execute, which relies on watching the connection socket.) Are there lower/upper bounds?
  2. Is this load-specific?
  3. Can I make it faster/tweak it for shorter tasks?
  4. When does the (julia) scheduler become overwhelmed?
  5. When can we say “I can’t accept any more connections”?
  6. Are cross-thread Channels lighter than Threads.@spawn tasks that migrate (oxygen streamUntil approach)?
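For the first question, a rough baseline for the operating-system part of the wait can be measured outside Julia. The sketch below is in Python and uses a socketpair plus a selector (epoll on Linux) to time how long a blocked waiter takes to wake up after a byte is written. It says nothing about Julia's scheduler or FDWatcher itself; the function name and sample count are assumptions made for the example.

```python
import queue
import selectors
import socket
import statistics
import threading
import time
from typing import List


def measure_wakeup_latencies(samples: int = 50) -> List[float]:
    """Return the seconds between send() and the waiter waking up, per sample."""
    writer, reader = socket.socketpair()
    selector = selectors.DefaultSelector()
    selector.register(reader, selectors.EVENT_READ)
    woke_at: "queue.Queue[float]" = queue.Queue()

    def waiter() -> None:
        for _ in range(samples):
            selector.select()                 # block until the socket is readable
            woke_at.put(time.perf_counter())  # timestamp the wake-up
            reader.recv(1)                    # drain so the next select() blocks

    thread = threading.Thread(target=waiter)
    thread.start()

    latencies = []
    for _ in range(samples):
        time.sleep(0.001)                     # give the waiter time to block again
        sent_at = time.perf_counter()
        writer.send(b"x")
        latencies.append(woke_at.get() - sent_at)

    thread.join()
    selector.close()
    writer.close()
    reader.close()
    return latencies


if __name__ == "__main__":
    results = measure_wakeup_latencies()
    print(f"median wake-up latency: {statistics.median(results) * 1e6:.0f} µs")
```

Comparing such a baseline with the same measurement done through FDWatcher would show how much of the 3 ms comes from the runtime rather than from the kernel.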

Any pointers are very, very welcome!

That’s interesting. How can we be sure that a REUSEPORT socket is being used? Sorry if this should be obvious.

HTTP.listen(; reuseaddr=true), HTTP.serve with the same keyword, or Sockets.bind(...; reuseaddr=true) will request it. To be really sure, run Julia under strace: strace -e trace=setsockopt julia server.jl (on Linux; on Windows I have no idea). Let me know if that helps!
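As a complement to strace, the listening sockets on a port can also be counted from outside the server process. This is a minimal Python sketch, assuming Linux and IPv4: it parses the text of /proc/net/tcp (IPv6 sockets are listed in /proc/net/tcp6 instead). The function names and port 8080 are placeholders. More than one LISTEN entry for the same address and port suggests the port is being shared.

```python
from collections import Counter

LISTEN = "0A"  # state code used for LISTEN in /proc/net/tcp


def listeners_by_address(proc_net_tcp: str, port: int) -> Counter:
    """Count LISTEN sockets per local address (hex "IP:PORT") on the given port."""
    counts: Counter = Counter()
    for line in proc_net_tcp.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 4:
            continue
        local_address, state = fields[1], fields[3]
        local_port = int(local_address.rsplit(":", 1)[1], 16)
        if state == LISTEN and local_port == port:
            counts[local_address] += 1
    return counts


def listeners_on_this_machine(port: int) -> Counter:
    """Read the live table; only works where /proc/net/tcp exists (Linux)."""
    with open("/proc/net/tcp") as table:
        return listeners_by_address(table.read(), port)
```

Calling listeners_on_this_machine(8080) while several server processes are running would then return one entry whose count equals the number of listeners.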

While the “web benchmark” under discussion here is partly DB-related, the connection step can possibly be improved.

I didn’t look at the pool functionality, but is it redundant with a new feature in the just-released PostgreSQL 16, or rather in its libpq? (Note: the latest libpq usually works with older PostgreSQL servers; whether that also holds for this feature, I don’t know. Pg_pool is also available, and seems not to be even a partially redundant project.) See PostgreSQL: Documentation: 16: E.2. Release 16

  • Allow multiple libpq-specified hosts to be randomly selected (Jelte Fennema)
    This is enabled with load_balance_hosts=random and can be used for load balancing.

FYI: It also has a number of good new features, such as more complete support for the SQL/JSON standard (I recently proposed in another thread using that rather than serializing Julia data/arrays to BLOBs, i.e. its proprietary bytea type). Also off-topic:

PostgreSQL 16 improves general support for text collations, which provide rules for how text is sorted. PostgreSQL 16 builds with ICU support by default, determines the default ICU locale from the environment, and allows users to define custom ICU collation rules.

Maybe ICU is new, or new by default, but it reminded me that at least the latest ICU has e.g. “significant changes for GB18030-2022 compliance support”, i.e. for Chinese. That latest Chinese standard is slightly incompatible; I’m not sure whether it affects Julia users, i.e. Unicode/UTF-8, in some way too.

Version 23.1 of Ora2Pg, a free and reliable tool used to migrate an Oracle database to PostgreSQL, has been officially released and is publicly available for download.
[…]
New command line option --lo_import. By default Ora2Pg imports Oracle BLOB as bytea, the destination column is created using the bytea data type.

Even more off-topic (unless it helps some actual real-world users, if not benchmarks): maybe some Julia code should be off-loaded to the database. I think there’s a PL/Julia out there:

It’s not one of the official procedural languages, but with precompiled code it could become increasingly relevant to use (and maybe get into PostgreSQL as an official one?):

If I understand the feature correctly, it works with a comma separated list of hosts (PostgreSQL: Documentation: 16: 34.1. Database Connection Control Functions - last paragraph) to connect to one of them randomly. To use that you need a cluster of databases that are all replicas of each other or something. You still get one connection back, and you’ll have to put it in a pool locally, for that to count. So it doesn’t make a connection pool on the client side redundant.
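The distinction above can be sketched as follows, in Python for illustration. build_conninfo produces a libpq keyword/value connection string with a comma-separated host list and load_balance_hosts=random, while SimplePool is the part the client still has to provide. The host names, the database name, and both helper names are hypothetical, and the connect callable is a stand-in for whatever a real driver offers.

```python
import queue
from typing import Callable, List


def build_conninfo(hosts: List[str], dbname: str, port: int = 5432) -> str:
    """Build a libpq connection string that lists several candidate hosts."""
    return (
        f"host={','.join(hosts)} port={port} dbname={dbname} "
        "load_balance_hosts=random"
    )


class SimplePool:
    """A tiny fixed-size pool; each connect() call yields ONE connection."""

    def __init__(self, connect: Callable[[], object], size: int) -> None:
        self._idle: "queue.Queue[object]" = queue.Queue()
        for _ in range(size):
            self._idle.put(connect())

    def acquire(self) -> object:
        return self._idle.get()      # blocks while all connections are in use

    def release(self, conn: object) -> None:
        self._idle.put(conn)


# Hypothetical replicas; a real driver would open a connection from conninfo.
conninfo = build_conninfo(["replica1.example", "replica2.example"], "benchmark")
pool = SimplePool(lambda: {"conninfo": conninfo}, size=4)

conn = pool.acquire()
pool.release(conn)
print(conninfo)
```

With a real driver, each of the four pooled connections could land on a different replica, but reusing them across requests is still the pool's job.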

Note that this is not about changing the environment of the benchmark; if we change the environment (i.e. add a database cluster) we run a different experiment (and then our competitors will also get the opportunity of 10x db operations, for example).

Updating the LibPQ version, on the other hand, isn’t something I would object to :sweat_smile:. LibPQ_jll is at version 14 IIRC

It’s actually at version 16, “since last week”. That is the JLL, and I assume also the underlying wrapped C library (its version number is in sync with the database version).

I would also consider wrapping: libpqxx: the official C++ language binding for PostgreSQL

2023-01-12: At last! Faster than C.

Then in July:

Welcome to libpqxx 7.8.0. Lots of goodies for you. Probably enough that I could have called it 8.0 — except libpqxx 8.0 is going to require C++20. For now you’re still fine with C++17.

In 7.8 you get, among other things:

  • Streaming large data sets now benchmarks faster than similar C/libpq code! […]

since it claims to be faster, and it’s “official”: GitHub - jtv/libpqxx: The official C++ client API for PostgreSQL.

The 7.x versions require at least C++17. Make sure your compiler is up to date. For libpqxx 8.x you will need at least C++20.

Also, 7.0 makes some breaking changes in rarely used APIs:

There is another web benchmark, the-benchmarker, which covers almost all languages in a simpler test, in case there is interest in adding HTTP.jl in addition to Merly.jl.

Results