Reject requests in HTTP.jl server based on memory usage

The idea is to prevent a web service from suffering memory exhaustion by rejecting requests, or to react to other similar resource-allocation issues based on the machine state. I suppose it can be done using the HTTP.Middleware approach discussed in the documentation. I'm curious to hear if anyone has implemented something like that. Might there be an example somewhere, with other similar useful features implemented?

I'm not sure this is needed. I mean, the garbage collector should take care of all garbage (it should be aggressive enough by default; if it isn't, that seems like a bug). Also, since Julia 1.9 you can tune how aggressive it is, i.e. set a memory limit per process via --heap-size-hint; that may help, but doesn't seem enough for you.
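For reference, a minimal sketch of that flag and the related introspection functions (assuming Julia 1.9+; the 2G value is arbitrary):

```julia
# Launch with a soft limit on the GC-managed heap (a hint, not a hard cap):
#   $ julia --heap-size-hint=2G server.jl

# From within Julia you can also inspect machine-wide memory:
Sys.total_memory() / 2^30   # installed physical RAM, in GiB
Sys.free_memory() / 2^30    # currently free RAM, in GiB
```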

So if you are thinking about live memory, then for each request it's not clear how high you want to go, and memory use is a global issue for the process, or rather for the machine as a whole, all processes combined (a possible OOM issue). Julia's GC can only control its own process, not other non-Julia (and/or non-web-related) processes, so GC is tricky business. The GC could in theory monitor the RAM use of other processes, and it IS aware of the amount of physical RAM installed, but I believe that's the only machine-wide info it uses; it doesn't monitor changes in other processes' memory use, as that would slow down the GC (though maybe not the allocator?).

PHP is interesting in this regard: for each web request there's some max allowed memory use, and you CAN change that max. I don't believe Julia has a max per thread, only per process. Something like that might be helpful. Also, when you go over a max, what happens, or what should happen? You get an exception, but does the end user on the web client get a useful error message?

[D]DoS is tricky business, usually handled elsewhere: in a firewall or even further upstream.

Yeah, maybe I'm using so much memory due to some bug, but I'm willing to believe everything is fine and the GC is doing its job, and I'm simply trying to use more memory than I have.

Your last question is indeed the crux: what to do exactly? I'm thinking of ignoring the request, or maybe returning some error code, either 500 or something like 503 or 429... or 418! These are pretty much the options I see.

You want to limit access at the database[s] and web server[s], or in the application (unless you have (almost) infinite capacity; see the relevant paragraph below on clouds/elastic scaling: web servers can be scaled out, which is more difficult for relational databases).

I worked as a database admin, and sometimes processes piled up in PostgreSQL (it's not as bad as it sounds: processes are cheap in Linux, unlike on Windows, which is why the DB uses threads there; a process in Linux is about as cheap as a thread on Windows, plus you get COW). Almost without fail, though, the cause was a problem on the web server side (DDoS was occasionally suspected, but most likely it was a change in the application software running on the web servers).

The DB has a max number of connections, a setting you can tune (and others too, see above), but if you use it (and we did), then processes instead pile up at the web server[s], at least if you set it too low. Each web server may have its own max (I don't recall the config option in Apache), and/or possibly a max in front of it in the load balancer.

My own idea, which I had after leaving that job: it seems rather simple to limit the number of active requests in your web application. You could keep a global count of active requests. That's not simple across web servers, but we used memcached for such things (which you need anyway, e.g. for cookies). That may be tricky to implement and would be a single point of failure, so you could do it per web server (then no memcached needed for it), which is likely better. With a limit you can report a helpful error message, "The server is under heavy load, please try again later", and log it. A sketch of the per-server variant follows below.
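A minimal sketch of the per-server variant as HTTP.jl request middleware (the limit, the names, and the 503 response are just placeholders; adapt to your setup):

```julia
using HTTP

const MAX_ACTIVE = 100                 # hypothetical limit on in-flight requests
const ACTIVE = Threads.Atomic{Int}(0)  # active-request count for this process

# Request middleware: reject with 503 once too many requests are in flight.
function limit_requests(handler)
    return function (req::HTTP.Request)
        if Threads.atomic_add!(ACTIVE, 1) >= MAX_ACTIVE  # atomic_add! returns the old value
            Threads.atomic_sub!(ACTIVE, 1)
            return HTTP.Response(503, ["Retry-After" => "5"];
                                 body = "The server is under heavy load, please try again later")
        end
        try
            return handler(req)
        finally
            Threads.atomic_sub!(ACTIVE, 1)
        end
    end
end

# Usage: HTTP.serve(limit_requests(my_handler), "0.0.0.0", 8080)
```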

Such capability is also there in the web servers themselves (it just wasn't my job to configure it, so I don't know if doing it there is better for a [dynamic] website), at least in Apache and Nginx, so also look into that; then you don't need to implement anything in Julia. This assumes you use a web server in front; Julia can also function as its own web server without any other software. @essenciary might know, or whether such logic could be added to Genie. It supported running alone or behind Apache, last I knew, but I only now see Nginx mentioned prominently in its docs. I do see: "In Genie v5 the recommended approach is to set up SSL at the proxy server level, for example by using Caddy, Nginx, or Apache as a reverse proxy."

Apache is completely free; Nginx is more popular and has a free version, but that version doesn't include what you want. CDNs like Cloudflare [Server] might have what you need, and I understand LiteSpeed is fast... as they claim with the name. Cloud providers might also have something for you, e.g. AWS and Heroku, which Genie supports. Julia supports more, and maybe Genie does too, it might just be unstated. I'm not sure, but I think e.g. Google Cloud and others just emulate AWS, so instructions for Genie would work there too. Setting limits is outside of Genie, I think, so the AWS instructions may or may not carry over.

Nginx is by now the most popular web server:

  • Nginx 34.4%
  • Apache 31.9%
  • Cloudflare Server 20.5%
  • LiteSpeed 11.9%

As far as I can see, max_conns would be exactly what I’m looking for, but unfortunately it’s not available in the free version:

Additionally, the following parameters are available as part of our commercial subscription

Limiting worker_connections is not an option, as the minimum it accepts is 4, and it affects more than just the incoming requests.

https://community.cloudflare.com/t/question-about-simultaneous-open-connections-limit/128350

There's a max in Caddy too, but that part of it seems to be years-old software, so is it just mature, or outdated?

Module ngx_stream_upstream_module

  • max_conns is the maximum number of concurrent requests to each backend. The default value of 0 indicates no limit. When the limit is reached, additional requests will fail with Bad Gateway (502).

As I thought, a rather unfriendly error like that is to be expected, so is it better to implement this internally, in the app or in Genie?

Note, the number of requests is only a proxy for real load. The real limit is memory use or CPU load. CPU load gets catastrophic at some point, but usually indirectly, because of running out of memory. Both can be mitigated by elastic use of VMs with cloud services such as AWS; I just haven't used it, so don't ask me how to set it up.


Just chiming in to say that you certainly could use the "middleware" approach to handle a scenario where a client tries to send a request with, like, a 10GB request body. The general approach here would be (a rough sketch follows the list):

  • Define a streaming middleware (which has the form f(::HTTP.Stream) -> Nothing, instead of the higher-level request middleware, which has the form f(::HTTP.Request) -> HTTP.Response)
  • In the streaming middleware, you would call startread(::Stream), which returns once the headers have been received on the request
  • Then check the Content-Length header to see if the request body is going to be over some max limit you’ve set for your server
  • If the request body is too large, you can reject and close the stream immediately (returning a 413 is traditional), to ensure the actual large body isn’t read into memory or allowed to affect the server.
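Something like this untested sketch (the 10 MiB limit is arbitrary, and the exact calls may need adjusting to your HTTP.jl version):

```julia
using HTTP

const MAX_BODY_BYTES = 10 * 1024 * 1024  # hypothetical 10 MiB limit

# Streaming middleware: reject oversized request bodies before reading them.
function body_limit(handler)
    return function (stream::HTTP.Stream)
        HTTP.startread(stream)  # returns once the request headers are in
        len = HTTP.header(stream.message, "Content-Length")
        if !isempty(len) && parse(Int, len) > MAX_BODY_BYTES
            HTTP.setstatus(stream, 413)  # Payload Too Large
            HTTP.setheader(stream, "Connection" => "close")
            HTTP.startwrite(stream)      # send the response; the body is never read
            return nothing
        end
        return handler(stream)
    end
end

# Usage (note stream=true so the handler receives HTTP.Streams):
# HTTP.serve(body_limit(my_stream_handler), "0.0.0.0", 8080; stream=true)
```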

I’ve wanted to include something like this in HTTP as a sort of “default middleware” people could use, but haven’t gotten around to it. If someone wants to pick it up and make a PR, I’d be happy to review/merge.


You have a very good understanding of my situation. :slight_smile: I am indeed using a cloud service, and I can probably lower the maximum number of concurrent requests to prevent the memory exhaustion. But first of all, I don't really want to have to think about it; better to rely on error feedback. And second, I would like to prevent memory exhaustion from ever happening, since it causes my instance to be terminated, and that's not good: better to ignore or reject the request.

The scaling algorithm from my service looks at CPU load, but not memory. And it would be tricky anyway, because with GC it's just natural that memory will tend to be close to fully "allocated"; a common situation with Java too, as far as I know. One open question is how the scaling algorithm will behave once I start ignoring or rejecting requests; hopefully it might pick up a specific error (503) and actually use that as a signal to increase the number of instances, but I suspect this won't be the case… Anyway, my question is really just about how to implement this in HTTP.jl. For concreteness, something like the sketch below is what I have in mind.
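(Untested, and the threshold is arbitrary; also note Sys.free_memory() reports free physical RAM for the whole machine, which on Linux may not count reclaimable page cache.)

```julia
using HTTP

const MIN_FREE_FRACTION = 0.10  # reject when less than 10% of machine RAM is free

# Request middleware: reject requests while the machine is low on memory.
function memory_guard(handler)
    return function (req::HTTP.Request)
        if Sys.free_memory() / Sys.total_memory() < MIN_FREE_FRACTION
            return HTTP.Response(503, ["Retry-After" => "10"];
                                 body = "Server low on memory, please try again later")
        end
        return handler(req)
    end
end

# Usage: HTTP.serve(memory_guard(my_handler), "0.0.0.0", 8080)
```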

Thanks, it's great you've come up with an even nicer scenario! I've been using middleware for authentication and for spawning new threads for each request (that's maybe another question). I really just wanted to validate that this is the way to go. Maybe I'll create a PR if I feel I really have something interesting. Thanks!