Apache Lounge
Topic: Websockets: Server restarts eventually freeze whole server
patchy



Joined: 10 Dec 2020
Posts: 2
Location: Germany

Posted: Fri 18 Dec '20 18:35    Post subject: Websockets: Server restarts eventually freeze whole server

Hi,

Short version:

I use httpd on Windows as a reverse proxy for a microservice system. Some services communicate over websockets (more precisely: SignalR). From time to time I have to restart the server so that it reads a new configuration. I observe an increasing number of threads blocked by the SignalR connections. It's only a matter of time until the server freezes completely because no threads are available for other requests.

Details:

I reduced my system as much as possible and ended up with two microservices, A and B. A has a SignalR hub. Both A and B subscribe to the events of this hub, so there should be two connections.

Now the experiment:

1. Start the two microservices: They repeatedly try to connect, but fail. This is expected, because they are configured to connect via the reverse proxy and httpd is not running yet.
2. Start httpd (Windows Service): As expected, both services establish their connection, confirmed by the service logs and mod_status showing 2 connections.
3. Restart httpd: In the real world, I call
httpd.exe -n "ServiceName" -k restart
programmatically. For this experiment, I call it from PowerShell. What happens?
3a. The parent starts a new child and hands over 2 sockets, see error.log on Pastebin (link below)
3b. The parent needs to stop the old child. The old child cannot stop because of the open connections. It waits for a grace period of 30 s, then terminates the 2 threads. My services log that their connection was dropped and attempt to reconnect. At this moment, 2 more connections appear in mod_status. However, I don't see any socket handover in error.log.
4. Repeat httpd restart.
4a. The parent starts a new child and hands over 2 sockets, see error.log. It's still 2 sockets, although I saw 4 connections in mod_status in the previous step.
4b. The parent shuts down the old child. This time there is no grace period, but 18(!) threads that failed to exit are terminated, see error.log. Both services log a disconnect and reconnect. However, no additional connections appear in mod_status; it remains 4.

When I repeat restarting httpd, most of the time the same happens as described in step 4; the only difference is a changing number of "threads that failed to exit". But sometimes additional connections appear in mod_status. I can't reproduce this on purpose. I suspect a race condition between how fast the old child is shut down, how fast the new one is started, and when my services try to reconnect, but I don't know the httpd source code.
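
To track the socket handovers and the "threads that failed to exit" counts across many restarts without reading error.log by hand, a small script can summarize them per restart. This is a minimal Python sketch: the exact message wording in the two regular expressions is an assumption about what the WinNT MPM logs (the termination wording matches the one quoted above, but check both against your own error.log), and summarize_restarts is a hypothetical helper name.

```python
import re

# Assumed WinNT MPM message wording; verify against your own error.log.
HANDOVER_RE = re.compile(
    r"Parent: Duplicating socket (\d+) and sending it to child process (\d+)")
TERMINATED_RE = re.compile(r"Child: Terminating (\d+) threads that failed to exit")


def summarize_restarts(lines):
    """Group error.log lines by the child process that received the sockets.

    Returns {child_pid: {"handovers": n, "terminated": m}} in log order, where
    "terminated" counts threads force-killed in the old child after that restart.
    """
    restarts = {}
    latest = None
    for line in lines:
        m = HANDOVER_RE.search(line)
        if m:
            latest = m.group(2)  # pid of the new child receiving the socket
            entry = restarts.setdefault(latest, {"handovers": 0, "terminated": 0})
            entry["handovers"] += 1
            continue
        m = TERMINATED_RE.search(line)
        if m and latest is not None:
            # Attribute the killed threads to the most recent restart.
            restarts[latest]["terminated"] += int(m.group(1))
    return restarts
```

Passing an open error.log file object to summarize_restarts after each restart gives one entry per new child, which makes a growing "terminated" count easy to spot.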


To get my job done, I need to know: What can I do to avoid eventually blocking the server?
Out of curiosity, I would also like to know what exactly happens: how the SignalR connections are handed over to the next child, why the first restart behaves differently from the others, ...

I'd appreciate any hint, even if it only points to further investigation!


Additional information

Version: 2.4.41
Some config snippets:

Code:
# handy for debugging, not in production
ThreadsPerChild 20

Code:
RewriteEngine On
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule "^/my/microservice" "wss://hostname:53728%{REQUEST_URI}" [P]
ProxyPass /my/microservice https://hostname:53728/my/microservice
ProxyPassReverse /my/microservice https://hostname:53728/my/microservice
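
The routing decision in this configuration can be modelled in a few lines, which makes it easier to see which requests go through the [P] rewrite and which through ProxyPass. This is a hypothetical Python sketch, not httpd code: it assumes, as the mod_proxy_wstunnel documentation example implies, that the [P] rewrite takes precedence over ProxyPass for upgrade requests, and it simplifies ProxyPass to a plain prefix match.

```python
import re

# Mirrors the two RewriteCond lines: [NC] makes the match case-insensitive and
# the patterns are unanchored, so "Connection: keep-alive, Upgrade" matches too.
UPGRADE_RE = re.compile(r"websocket", re.IGNORECASE)
CONNECTION_RE = re.compile(r"upgrade", re.IGNORECASE)
RULE_RE = re.compile(r"^/my/microservice")  # RewriteRule pattern


def backend_url(path, headers):
    """Return the backend URL the config would proxy `path` to, or None."""
    if not RULE_RE.search(path):
        return None  # neither the rewrite nor ProxyPass applies
    h = {k.lower(): v for k, v in headers.items()}  # header names are case-insensitive
    if UPGRADE_RE.search(h.get("upgrade", "")) and CONNECTION_RE.search(h.get("connection", "")):
        return "wss://hostname:53728" + path  # RewriteRule ... [P]
    return "https://hostname:53728" + path  # ProxyPass fallback (simplified prefix match)
```

For example, a SignalR client sending "Upgrade: WebSocket" and "Connection: keep-alive, Upgrade" is routed to the wss:// backend, while its plain HTTP requests fall through to the https:// ProxyPass.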


Link to error.log on Pastebin: https://pastebin.com/7a7B0bLb

Disclaimer: I posted this on the Apache users mailing list. Since there has been no answer for a week, I am cross-posting it here.
tangent
Moderator


Joined: 16 Aug 2020
Posts: 343
Location: UK

Posted: Sun 20 Dec '20 21:59

A knotty problem indeed, no doubt caused by the fact that websockets are stateful.

As I see it, if you perform a graceful restart of Apache to reload the configuration for future connections, those updates are not going to be applied to any threads owned by existing child processes. Existing connections might be carried over, if their sockets can be duplicated and handed off to threads in a successor child.

So even though a child has been instructed to exit, any free child threads will block until the timeout period has elapsed on any existing child connections.

Your log file initially shows this process, but your subsequent analysis of the socket handover problem does rather suggest a race condition, as you describe, with the unused child threads ending up orphaned.

However, I note that your configuration snippet mixes mod_rewrite and explicit mod_proxy directives, and the RewriteRule documentation on the proxy [P] flag, https://httpd.apache.org/docs/current/rewrite/flags.html#flag_p, says:
    Performance warning
    Using this flag triggers the use of mod_proxy, without handling of persistent connections. This means the performance of your proxy will be better if you set it up with ProxyPass or ProxyPassMatch

    This is because this flag triggers the use of the default worker, which does not handle connection pooling/reuse.

    Avoid using this flag and prefer those directives, whenever you can.

This states that persistent connections are not handled. So, even though the default documentation and other web posts show mod_rewrite being used to proxy, I'd try removing the mod_rewrite proxy logic and sticking with mod_proxy/mod_proxy_wstunnel directives for websockets, viz:

Code:
ProxyRequests Off
ProxyPreserveHost on
ProxyPass /my/microservice wss://hostname:53728/my/microservice
ProxyPassReverse /my/microservice wss://hostname:53728/my/microservice

Not sure if this helps.
patchy



Joined: 10 Dec 2020
Posts: 2
Location: Germany

Posted: Mon 21 Dec '20 20:02

The hint about the [P] flag is interesting indeed.

I tried your config snippet, but I had to modify it a bit to connect at all. I guess this is because of connection negotiation: when establishing the connection, the client does not know whether the server speaks websockets, so it starts with plain HTTP. Using the Upgrade header, server and client agree on switching protocols to websockets (at least this is the case for SignalR). So I must proxy both HTTP and websockets.
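
The switch from HTTP to websockets described here is the standard WebSocket opening handshake from RFC 6455: the client sends a GET with Upgrade/Connection headers and a random key, and the server answers 101 Switching Protocols with a Sec-WebSocket-Accept value derived from that key. A minimal Python sketch of both sides of that exchange; the function names are hypothetical and the host/path arguments are placeholders, not what SignalR sends verbatim.

```python
import base64
import hashlib
import os

# Fixed GUID defined in RFC 6455, section 1.3.
WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11"


def upgrade_request(host, path, key=None):
    """Build the HTTP/1.1 request a client sends to switch to websockets."""
    key = key or base64.b64encode(os.urandom(16)).decode("ascii")  # 16 random bytes
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",   # what RewriteCond %{HTTP:Upgrade} tests
        "Connection: Upgrade",  # what RewriteCond %{HTTP:Connection} tests
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    return "\r\n".join(lines) + "\r\n\r\n", key


def expected_accept(key):
    """Value the server must return in Sec-WebSocket-Accept with its 101 response."""
    digest = hashlib.sha1((key + WS_GUID).encode("ascii")).digest()
    return base64.b64encode(digest).decode("ascii")
```

Once the 101 response has passed through the proxy, the connection is no longer request/response HTTP, which is why it stays tied to one worker thread for its whole lifetime.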

I separated proxying for negotiation and actual websocket connection:

Code:
ProxyRequests Off
ProxyPreserveHost on

ProxyPass /ChatHub/negotiate http://localhost:53353/ChatHub/negotiate
ProxyPassReverse /ChatHub/negotiate http://localhost:53353/ChatHub/negotiate

ProxyPass /ChatHub ws://localhost:53353/ChatHub
ProxyPassReverse /ChatHub ws://localhost:53353/ChatHub

Still, the behaviour does not change, even without the [P] flag.
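
The split configuration relies on rule order: the mod_proxy documentation states that ProxyPass rules are checked in configuration order and the first match wins, which is why the more specific /ChatHub/negotiate rule has to come before /ChatHub. A simplified Python model of that lookup (plain prefix matching; map_to_backend is a hypothetical name):

```python
# ProxyPass rules in configuration order, copied from the snippet above.
PROXY_PASS = [
    ("/ChatHub/negotiate", "http://localhost:53353/ChatHub/negotiate"),
    ("/ChatHub", "ws://localhost:53353/ChatHub"),
]


def map_to_backend(path, rules=PROXY_PASS):
    """Return the backend URL for `path`; the first matching prefix wins."""
    for prefix, target in rules:
        if path.startswith(prefix):
            # The remainder of the path after the prefix is appended to the target.
            return target + path[len(prefix):]
    return None
```

With the order reversed, /ChatHub/negotiate would be caught by the /ChatHub rule and sent to the ws:// backend instead.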

Note: In the meantime I tried it with the newest server version and switched from my microservices to the Microsoft chat example ( https://github.com/dotnet/AspNetCore.Docs/tree/master/aspnetcore/signalr/dotnet-client/sample), so that anyone can reproduce it.
tangent
Moderator


Joined: 16 Aug 2020
Posts: 343
Location: UK

Posted: Tue 22 Dec '20 23:25

Since you're using the WinNT MPM, have you tried disabling accept filters to see if that has any effect on the socket handling?

Code:
AcceptFilter http none
AcceptFilter https none
AcceptFilter ws none
AcceptFilter wss none
James Blond
Moderator


Joined: 19 Jan 2006
Posts: 7348
Location: Germany, Next to Hamburg

Posted: Mon 28 Dec '20 23:59

From the docs of I see that you don't have to use web sockets.

Also, did you read https://httpd.apache.org/docs/2.4/mod/mod_proxy_wstunnel.html ?

Example from that page:
Code:

ProxyPass / http://example.com:9080/
RewriteEngine on
RewriteCond %{HTTP:Upgrade} websocket [NC]
RewriteCond %{HTTP:Connection} upgrade [NC]
RewriteRule ^/?(.*) "ws://example.com:9080/$1" [P,L]



Back to your problem 3b: restarts of Apache on Windows are always graceful, so the server waits until the connections are finished.

There are some bug reports about that:

https://bz.apache.org/bugzilla/buglist.cgi?bug_status=__open__&component=mod_proxy_wstunnel&list_id=144532&order=changeddate%20DESC%2Cpriority%2Cbug_severity&product=Apache%20httpd-2&query_format=specific

Maybe it is a better idea to stop the service and start it instead of doing a graceful restart.
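
Stopping and starting the service instead of calling -k restart could be scripted along these lines. A minimal Python sketch, assuming httpd.exe is on the PATH and that -k stop returns only once the service has stopped (if it does not in your setup, poll the service state before starting again); cold_restart is a hypothetical helper name, and the run parameter exists only so the commands can be checked without touching a real service.

```python
import subprocess


def cold_restart(service_name, httpd="httpd.exe", run=subprocess.run):
    """Stop, then start, the httpd Windows service instead of a graceful restart.

    Clients will be disconnected and have to reconnect, as with -k restart.
    """
    commands = [
        [httpd, "-n", service_name, "-k", "stop"],
        [httpd, "-n", service_name, "-k", "start"],
    ]
    for cmd in commands:
        # check=True raises CalledProcessError if httpd exits with an error,
        # so "start" is not attempted after a failed "stop".
        run(cmd, check=True)
    return commands
```

Usage: cold_restart("ServiceName") in place of the programmatic httpd.exe -n "ServiceName" -k restart call.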
patch2



Joined: 25 Jul 2023
Posts: 1

Posted: Tue 25 Jul '23 10:36

tangent wrote:
Since you're using the WinNT MPM, have you tried disabling accept filters to see if that has any affect on the socket handling?

Code:
AcceptFilter http none
AcceptFilter https none
AcceptFilter ws none
AcceptFilter wss none


Hello! I'm a colleague of patchy. Since then we have tried several things to mitigate this issue and have now come to a conclusion. I wanted to report back so that the findings are not lost.

We tested the approach with the AcceptFilter settings, using a newer version of Apache:

Server Version: Apache/2.4.57 (Win64) OpenSSL/3.1.1
Server MPM: WinNT
Apache Lounge VS17 Server built: May 31 2023 10:48:22

With these settings active, it at first seems to solve the issue when restarting Apache several times using 'httpd.exe -n "ServiceName" -k restart'.

But a repeated test showed that the issue still arises, just after around 50-70 calls of 'httpd.exe -n "ServiceName" -k restart'. We waited 10 seconds between calls.

After around 50-70 restarts, threads started getting stuck again and were no longer released afterwards.
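
A repeated-restart test like this one can be automated so that the restart at which threads start getting stuck is recorded rather than watched by hand. A minimal Python sketch; restart_soak_test and its check callback (for example, something that reads the idle worker count from mod_status) are hypothetical, and the 10-second default matches the wait described above.

```python
import subprocess
import time


def restart_soak_test(service_name, iterations, wait_s=10.0, check=None,
                      httpd="httpd.exe", run=subprocess.run, sleep=time.sleep):
    """Issue graceful restarts repeatedly and record a health value after each.

    Returns a list of (restart_number, check_result) tuples; check_result is
    None when no `check` callable is given.
    """
    results = []
    for i in range(iterations):
        run([httpd, "-n", service_name, "-k", "restart"], check=True)
        sleep(wait_s)  # give the clients time to reconnect before measuring
        results.append((i + 1, check() if check else None))
    return results
```

The run and sleep parameters are injectable only so the loop can be exercised without a live server.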

James Blond wrote:
Maybe it is a better idea to stop the service and start it instead of doing a graceful restart.


We are considering this right now to solve the issue. We're introducing a check to detect idle threads; if they run low at some point, we will kill all Apache processes and simply restart it.
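
Such an idle-thread check can be based on mod_status's machine-readable output (/server-status?auto), which includes an IdleWorkers field. A minimal Python sketch, assuming mod_status is enabled and reachable at the URL shown (a placeholder); the threshold and the restart callback (for example a stop/start of the service) are up to you, and all function names are hypothetical.

```python
import urllib.request


def parse_server_status(text):
    """Parse mod_status machine-readable output (?auto) into a dict of strings."""
    status = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a "Key: value" shape
            status[key.strip()] = value.strip()
    return status


def needs_cold_restart(status, min_idle):
    """True when idle worker threads dropped below `min_idle`.

    A missing IdleWorkers field is treated as 0, i.e. unhealthy.
    """
    return int(status.get("IdleWorkers", "0")) < min_idle


def fetch_status(url="http://localhost/server-status?auto", timeout=5):
    # Placeholder URL; assumes mod_status is enabled and allowed from this host.
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_server_status(resp.read().decode("ascii", "replace"))


def watchdog_once(min_idle, fetch=fetch_status, restart=None):
    """Check once; call `restart` and return True if the server looks starved."""
    if needs_cold_restart(fetch(), min_idle):
        if restart is not None:
            restart()
        return True
    return False
```

Run watchdog_once periodically, for example from a scheduled task, with a threshold somewhat above the number of long-lived websocket connections you expect.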

This results in a connection hiccup, but the soft restart always results in a connection hiccup as well, so the situation does not get worse.

We are also considering replacing Apache altogether, because so far we only use its reverse-proxy functionality and have no plans to use more of its features.