Topic: GracefulShutdownTimeout setting is doing nothing
Author
deepwell



Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Sun 20 Jul '25 12:12    Post subject: GracefulShutdownTimeout setting is doing nothing

Hi all!

We have an issue with Apache involving long-running WebSocket connections and graceful reloads initiated by logrotate.

Whenever logrotate initiates the graceful reload (every morning), all Apache workers currently handling open WebSocket connections go into 'G' state and - as the WebSocket connections stay open (which is their primary purpose!) - stay that way. This puts the whole Apache process that worker belongs to (we are using event MPM) into a state of having some workers in 'G' state and the rest not accepting new connections.

Usually that means that at least one Apache process ends up in that state every day - and after a few days we reach ServerLimit, the scoreboard is full and we get an "AH00485: scoreboard is full, not at MaxRequestWorkers" error.

Easy fix: Just set `GracefulShutdownTimeout` to a non-zero value, and Apache should wait that many seconds for the 'G' state workers to finish and, if they don't, simply kill them and continue the reload.

At least that is what
https://httpd.apache.org/docs/2.4/mod/mpm_common.html#GracefulShutdownTimeout
suggests IMHO.

But in fact it seems that `GracefulShutdownTimeout` does not do anything!

Even when I set this to just a few seconds, the 'G' state workers stay in that state forever and the whole Apache process is useless (and soon the whole Apache server)... :(


Did I misinterpret the documentation?

Is there any other way to make Apache not wait forever for WebSocket connections to be closed before finishing a graceful reload? Without also killing "normal" connections!

Does this make Apache totally unusable as a reverse proxy for WebSocket connections?

tangent
Moderator

Joined: 16 Aug 2020
Posts: 395
Location: UK

Posted: Thu 31 Jul '25 15:09

Looking into your problem with GracefulShutdownTimeout, I initially thought you might be running on Windows, with mpm_winnt.

I then noticed you mention event mpm.

Could you try switching to worker mpm to see if that makes a difference?

deepwell

Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Thu 31 Jul '25 16:17

Many thx for looking into this! I will try to set up a second machine using worker mpm and get back to you.

deepwell

Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Mon 04 Aug '25 12:17

OK, so I tried with the worker MPM now.

It behaves differently, but IMHO also not correctly.

My worker MPM config for testing is:

Code:
StartServers            1
MinSpareThreads         25
MaxSpareThreads         25
ThreadLimit             25
ThreadsPerChild         25
MaxRequestWorkers       25
MaxConnectionsPerChild  0
ServerLimit             1


If I understand correctly that should mean that no more than 1 Apache process should be active at one time.

When I do an `apache2ctl graceful` while WebSocket connections are active, a new Apache process is started (despite the ServerLimit of 1) and the old one keeps hanging around forever.


Code:
Srv   PID   Acc   M   CPU   SS   Req   Dur   Conn   Child   Slot   Client   Protocol   VHost   Request
0-0   496107   0/0/0   W   0.00   190   0   0   0.0   0.00   0.00   xx.xx.xx.21   http/1.1   xxxxxxxxxx.com:443   GET /ws/ HTTP/1.1
0-1   496152   0/0/1   _   0.00   149   25   49   0.0   0.00   0.01   yy.yy.yy.254   http/1.1      
0-1   496152   0/15/16   _   0.08   162   4   112   0.0   0.05   0.06   yy.yy.yy.254   http/1.1   yyyyyyy.eu:443   GET /server-status HTTP/1.1
0-1   496152   0/0/5   R   0.00   173   4   34   0.0   0.00   0.02   yy.yy.yy.254   http/1.1      
0-0   496107   0/0/0   W   0.00   178   0   0   0.0   0.00   0.00   xx.xx.xx.21   http/1.1   xxxxxxxx.com:443   GET /ws/ HTTP/1.1
0-1   496152   11/11/12   W   0.21   0   0   138   39.4   0.04   0.04   10.70.2.254   http/1.1   yyyyyyy.eu:443   GET /server-status HTTP/1.1


The one with PID 496107 is the old one - and it keeps hanging around indefinitely, or until all its WebSocket connections are closed.

My /etc/apache2/apache2.conf has the line

Code:
GracefulShutdownTimeout 10


And nowhere else is the GracefulShutdownTimeout set.
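
A quick way to double-check the "set nowhere else" claim is to search the whole configuration tree for active (non-commented) occurrences of the directive. This is only a minimal sketch: the function name `find_directive` is made up for illustration, and the `/etc/apache2` path in the usage comment is the Debian/Ubuntu layout, assumed here. Note that it also reports files that exist under the tree but are not actually included by the running configuration.

```bash
#!/bin/bash
# List every active (non-commented) occurrence of a directive below a directory.
# find_directive is a hypothetical helper name, not part of Apache.
find_directive() {
    local directive="$1" conf_dir="$2"
    # -r recurse, -n show line numbers, -i directive names are case-insensitive,
    # -E extended regex. Leading whitespace is allowed; lines starting with '#'
    # do not match because the directive must be the first token on the line.
    grep -rniE "^[[:space:]]*${directive}([[:space:]]|\$)" "$conf_dir"
}

# Example (path is an assumption):
#   find_directive GracefulShutdownTimeout /etc/apache2
```

Each hit is printed as `file:line:content`, so a second definition in an included file would show up immediately.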


So, the main difference to the event MPM is that Apache does not get locked up, because it simply ignores the ServerLimit and just launches another process, while the old one keeps hanging around indefinitely the same way as with the event MPM (just not labeled "G"), despite the GracefulShutdownTimeout setting.

tangent
Moderator

Joined: 16 Aug 2020
Posts: 395
Location: UK

Posted: Mon 04 Aug '25 18:20

Ok, so in relation to your problem, the worker and event MPMs differ somewhat, but neither seems to honour the GracefulShutdownTimeout if there are open WebSocket connections.

You've not detailed your configuration for supporting WebSocket services, but I assume you're using mod_proxy_http together with appropriate proxy options, including a defined timeout (ProxyTimeout), and possibly keepalive. On that note, is there also a firewall between your Apache and the backend WebSocket service that might consider itself responsible for honouring keepalives? I've been bitten by this scenario before, where the firewall didn't close out inactive backend connections.

One other thing that might be worth exploring is trying mod_proxy_wstunnel instead of mod_proxy_http. The documentation suggests that since Apache 2.4.47, mod_proxy_http is favoured for handling WebSocket tunnelling, so you'd need to set ProxyWebsocketFallbackToProxyHttp to Off if you want to use mod_proxy_wstunnel.

The released code for mod_proxy_wstunnel.c hasn't been updated for some 2 years, but notably the current GitHub code for this module includes a ProxyWebsocketIdleTimeout option, which would hopefully resolve your problem. However, there's presumably a good reason why this code hasn't made it into the current Apache release.

Alternatively, as a last resort, could you script something to parse your server-status after a graceful restart, and kill off the stale/errant process?
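
One way that last resort could be scripted without parsing server-status at all is to compare process ages: after a graceful reload, any child of the Apache parent that was started before the reload belongs to the old generation. The sketch below only does the selection step. The function name `old_generation_pids`, the marker file, the 30 second grace period and the PID file path are all assumptions for illustration, and it relies on the `etimes` and `--ppid` options of the Linux procps `ps`. It also assumes that a SIGTERM sent to a child makes it exit without waiting for its connections.

```bash
#!/bin/bash
# Print the PIDs of children that were started before the last graceful reload.
# stdin: lines of "PID ELAPSED_SECONDS"; $1: seconds elapsed since the reload.
# old_generation_pids is a hypothetical helper name.
old_generation_pids() {
    # A child that has been running longer than the time since the reload
    # must have been started before it, so it is old generation.
    awk -v age="$1" '$2 + 0 > age + 0 { print $1 }'
}

# Hypothetical usage (marker file, grace period and PID file are assumptions):
#   touch /run/apache2/last-reload          # just BEFORE triggering the reload
#   apache2ctl graceful
#   sleep 30                                # grace period for old workers
#   age=$(( $(date +%s) - $(stat -c %Y /run/apache2/last-reload) ))
#   ps -o pid=,etimes= --ppid "$(cat /var/run/apache2/apache2.pid)" \
#       | old_generation_pids "$age" | xargs -r kill -TERM
```

Touching the marker before the reload rather than after it keeps freshly started children out of the kill list, since their elapsed time can never exceed the marker's age.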

deepwell

Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Mon 04 Aug '25 21:30

Thx again for looking into this. As to your last resort: That's what I already did. I built a watchdog into the WebSocket server that checks if the web server (the website) is responsive and if not (= Apache has locked up) kills all WebSocket connections. This works for now, but I only consider this a workaround.

For completeness, here's the Apache config regarding the WebSocket connections (yes, I am using mod_proxy_http):

Code:

    ProxyPass /ws ws://127.0.0.1:8001 upgrade=websocket
    ProxyPassReverse /ws ws://127.0.0.1:8001


To be honest, I'd rather not resort to using the outdated mod_proxy_wstunnel - that also seems like a workaround, and I would prefer the watchdog workaround.

The thing is: All open WebSocket connections are always active, so the point is not how to handle the timeout of inactive connections. The clients send pings to the WebSocket server, so the connections are never stale. And if the pings fail, the connections are re-established.

I therefore need the server to "brute-force kill" those connections (which is fine for those connections) during a graceful reload. And I was under the impression that this is exactly what GracefulShutdownTimeout is meant to do:

From https://httpd.apache.org/docs/current/mod/mpm_common.html
Quote:

The GracefulShutdownTimeout specifies how many seconds after receiving a "graceful-stop" signal, a server should continue to run, handling the existing connections. Setting this value to zero means that the server will wait indefinitely until all remaining requests have been fully served.


So, once the server ceases to "continue to run", it - so I thought - would simply terminate, no matter the still active connections. But exactly that is not happening.


So my main concern is: How is the GracefulShutdownTimeout option actually supposed to work?

James Blond
Moderator

Joined: 19 Jan 2006
Posts: 7431
Location: EU, Germany, Next to Hamburg

Posted: Wed 06 Aug '25 9:11

deepwell wrote:


So my main concern is: How is the GracefulShutdownTimeout option actually supposed to work?


It should work like you think, by ending the graceful connection after x seconds. But sometimes it doesn't.

An idea might be to have a watchdog script.

Code:

#!/bin/bash

# Configuration
MAX_GRACEFUL_WORKERS=10  # Threshold for stuck 'G' workers
APACHECTL="/usr/sbin/apachectl"
PS_CMD="/bin/ps"
GREP_CMD="/bin/grep"

# Count Apache processes whose ps state column (3rd field) is exactly 'G'
GRACEFUL_COUNT=$($PS_CMD -eo pid,ppid,state,cmd | $GREP_CMD -E 'apache2|httpd' | awk '$3 == "G"' | wc -l)

echo "[$(date)] Graceful workers: $GRACEFUL_COUNT"

if [ "$GRACEFUL_COUNT" -ge "$MAX_GRACEFUL_WORKERS" ]; then
    echo "[$(date)] Too many graceful workers. Restarting Apache..."
    $APACHECTL -k restart
fi


Run it every 5 minutes via crontab:

Code:

*/5 * * * * /usr/local/bin/apache-watchdog.sh >> /var/log/apache-watchdog.log 2>&1

deepwell

Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Wed 06 Aug '25 11:29

James Blond wrote:

It should work like you think, by ending the graceful connection after x seconds. But sometimes it doesn't.


I see. Actually, in my experience it never does. In my (quite vanilla) setup it is 100% reproducible :(


James Blond wrote:

An idea might be to have a watchdog script.


Thx! This way we could detect it even earlier than I am currently with the watchdog inside the WebSocket server.

Only:
Code:
ps -eo pid,ppid,state,cmd

for me does not show if the process is in "graceful shutdown" mode.

When I do a graceful reload while a WebSocket connection is active, I get this server status:

Code:
                              Connections        Threads                Async connections
Slot  PID     Stopping        total  accepting   busy  graceful  idle   writing  keep-alive  closing
0     772891  no              0      yes         1     0         127    0        0           0
1     706017  yes (old gen)   1      no          0     0         0      0        0           0
2     773075  no              0      yes         1     0         127    0        0           0
Sum   3       1               1                  2     0         254    0        0           0
_________R______________________________________________________
________________________________________________________________
................................................................
.............................................................G..
__W_____________________________________________________________
________________________________________________________________
................................................................
...


So one worker is in "G" state and PID 706017 is in "stopping (old gen)" (meaning that none of its workers will accept new connections)

But ps just gives me
Code:
ps -eo pid,ppid,state,cmd | grep -E 'apache2|httpd'
 499256       1 S /usr/sbin/apache2 -k start
 706017  499256 S /usr/sbin/apache2 -k start
 772891  499256 S /usr/sbin/apache2 -k start
 773075  499256 S /usr/sbin/apache2 -k start


I am on Ubuntu 24.04

PS: Note that I am using event MPM...
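
Since `ps` cannot see the state, the PID of the stuck process has to come from the server status itself. A minimal sketch, assuming the per-process table has already been obtained as plain text with whitespace-separated columns like the rows pasted above (a real text dump of the page may be laid out differently); the function name `stopping_pids` is made up for illustration:

```bash
#!/bin/bash
# Print the PIDs of processes whose "Stopping" column starts with "yes".
# stdin: rows of the per-process table (Slot, PID, Stopping, ...).
# stopping_pids is a hypothetical helper name.
stopping_pids() {
    # Only rows with a numeric slot and a numeric PID are data rows; this
    # skips the header, the "Sum" row and the scoreboard lines.
    awk '$1 ~ /^[0-9]+$/ && $2 ~ /^[0-9]+$/ && $3 == "yes" { print $2 }'
}

# Demo with the rows from the status output above; prints 706017.
stopping_pids <<'EOF'
0   772891   no   0   yes   1   0   127   0   0   0
1   706017   yes (old gen)   1   no   0   0   0   0   0   0
2   773075   no   0   yes   1   0   127   0   0   0
Sum   3   1   1       2   0   254   0   0   0
EOF
```

The resulting PIDs could then be logged or handed to whatever cleanup step is considered acceptable.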

James Blond
Moderator

Joined: 19 Jan 2006
Posts: 7431
Location: EU, Germany, Next to Hamburg

Posted: Wed 06 Aug '25 12:57

Yes, the old_gen is the one with the G state.

Parsing the scoreboard is a good idea.

Untested code, but it should work.

Code:

#!/bin/bash

APACHECTL="/usr/sbin/apachectl"
GREP_CMD="/bin/grep"

# URL of the Apache server-status page (mod_status must be enabled)
STATUS_URL="http://localhost/server-status?auto"

# Optional: Add authentication if required
# STATUS_URL="http://user:pass@localhost/server-status?auto"

# Fetch the server status
STATUS=$(curl -s "$STATUS_URL")

# Extract the Scoreboard line
SCOREBOARD=$(echo "$STATUS" | $GREP_CMD "^Scoreboard:" | cut -d':' -f2)

# Count the number of 'G' (Graceful shutdown) processes
G_COUNT=$(echo "$SCOREBOARD" | $GREP_CMD -o "G" | wc -l)


# Output the result (optional)
# echo "Number of processes in graceful shutdown (G): $G_COUNT"

# Optional: Trigger an action if the number exceeds a threshold
THRESHOLD=5
if [ "$G_COUNT" -ge "$THRESHOLD" ]; then
    # optional warning for a log file
    # echo "Warning: $G_COUNT graceful shutdown processes exceed threshold of $THRESHOLD!"
    $APACHECTL -k restart
fi


deepwell

Joined: 19 Jul 2025
Posts: 7
Location: AT, Vienna

Posted: Wed 06 Aug '25 14:10

Thx! Yes, I guess using the Apache scoreboard is the best way. We already have a system health check that looks at the scoreboard. I guess we'll add checking for 'G' states there - and switch to Nginx in the long run... :(
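
For a health check that only reports and never restarts anything, the 'G' count could be turned into an exit status. This is a rough sketch, not the check actually in use here: the function name `check_graceful` and both thresholds are assumptions, and it expects the `?auto` report (with its `Scoreboard:` line) on stdin.

```bash
#!/bin/bash
# Classify the number of 'G' slots in a "?auto" status report read from stdin.
# $1: warning threshold, $2: critical threshold (both are values to tune).
# Exit status uses the common monitoring convention: 0 OK, 1 warning, 2 critical.
# check_graceful is a hypothetical helper name.
check_graceful() {
    local warn="$1" crit="$2" count
    # Keep only the Scoreboard line, delete every character except 'G',
    # and count what is left.
    count=$(sed -n 's/^Scoreboard: *//p' | tr -cd 'G' | wc -c)
    count=$((count))   # normalise any padding that wc may add
    echo "graceful_workers=$count"
    if [ "$count" -ge "$crit" ]; then return 2; fi
    if [ "$count" -ge "$warn" ]; then return 1; fi
    return 0
}

# Example:
#   curl -s "http://localhost/server-status?auto" | check_graceful 1 5
```

Keeping detection separate from the restart lets the monitoring system decide what to do with a warning.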