Pritunl Wireguard constant reconnects (urgent)

Joly0 · January 20, 2025, 9:06am

Hey, I have Pritunl Enterprise deployed in my company for currently about 40 people. For about 5 months, some users have been experiencing constant reconnects when using Pritunl with WireGuard. The issue got worse after upgrading to the latest version, v1.3.4099.99, last week and now affects several more users in my company. Connecting with WireGuard usually works, but after some time (usually 5-10 minutes) the connection is lost and then re-established. We are unable to pin this down to any specific network or hardware problem, as it affects a variety of users with different laptops, networks and hardware configurations. Some are connected over Wi-Fi and some over Ethernet.

Here are some service logs from various users:

Also if helpful the client logs:

zach · January 20, 2025, 10:23pm

Increase the WireGuard Ping Timeout to 300 in the server settings.

Joly0 · January 21, 2025, 8:07am

The wireguard ping timeout is already configured to 360

zach · January 21, 2025, 9:26am

Check the server CPU usage. It’s likely the Pritunl web server can’t complete the ping requests, possibly because of an overloaded CPU.

Joly0 · January 21, 2025, 10:03am

Looking at htop on the server, I can see memory usage is about 750M of the assigned 4GB and CPU usage is usually between 0.5-4% per core (4 cores assigned to the server). So all should be fine there.

Joly0 · January 24, 2025, 8:41am

@zach Any idea? This is a constant issue and it keeps occurring for more and more of my colleagues.

zach · January 24, 2025, 2:00pm

Check the logs in the top right of the Pritunl server web console and check the output in sudo journalctl -u pritunl -n 5000 for errors. Using the systemd isolated web server may correct the issue or at least isolate errors. This can be enabled with sudo pritunl set app.web_systemd true and sudo systemctl restart pritunl. Then run sudo systemctl status pritunl-web and sudo journalctl -u pritunl-web -n 5000

It’s more likely just connection issues with the underlying WireGuard connection which will get detected by the ping updates that are showing errors.

Joly0 · January 24, 2025, 3:31pm

I have looked through the sudo journalctl -u pritunl-web -n 5000 log but couldn’t find anything. When a user disconnects, Pritunl just tries to reconnect the user and the log shows “Authenticating user” without any additional information that’s relevant to this issue. It’s just the normal authentication.
I will later switch the web server to systemd as recommended and check whether that solves the issue.

Joly0 · January 24, 2025, 5:13pm

@zach I switched to the systemd web server, but that didn’t solve the issue and there is nothing in the logs. It’s still the same: just “Authenticating user” for various users including me, and when I get a disconnect, it just shows “Authenticating user” again, nothing more.

zach · January 26, 2025, 5:56pm

It’s likely the WireGuard connection is unstable. On Windows check the status of it from the WireGuard client in the taskbar.

Joly0 · January 27, 2025, 8:58am

Hey @zach, so I looked at the WireGuard log on the client device itself and noticed nothing odd. When the connection drops, I can see the following in the log:

2025-01-27 09:49:25.016274: [TUN] [pritunl0] Sending handshake initiation to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:49:25.038222: [TUN] [pritunl0] Receiving handshake response from peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:49:25.038222: [TUN] [pritunl0] Keypair 21 destroyed for peer 1
2025-01-27 09:49:25.038222: [TUN] [pritunl0] Keypair 23 created for peer 1
2025-01-27 09:49:25.038222: [TUN] [pritunl0] Sending keepalive packet to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:51:25.371868: [TUN] [pritunl0] Sending handshake initiation to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:51:25.384751: [TUN] [pritunl0] Receiving handshake response from peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:51:25.384751: [TUN] [pritunl0] Keypair 22 destroyed for peer 1
2025-01-27 09:51:25.384751: [TUN] [pritunl0] Keypair 24 created for peer 1
2025-01-27 09:51:25.384751: [TUN] [pritunl0] Sending keepalive packet to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:53:25.437653: [TUN] [pritunl0] Sending handshake initiation to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:53:25.455457: [TUN] [pritunl0] Receiving handshake response from peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:53:25.455457: [TUN] [pritunl0] Keypair 23 destroyed for peer 1
2025-01-27 09:53:25.455457: [TUN] [pritunl0] Keypair 25 created for peer 1
2025-01-27 09:53:25.455457: [TUN] [pritunl0] Sending keepalive packet to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:53:49.119162: [TUN] [pritunl0] Shutting down
2025-01-27 09:54:00.259586: [TUN] [pritunl0] Starting WireGuard/0.5.3 (Windows 10.0.22631; amd64)
2025-01-27 09:54:00.259586: [TUN] [pritunl0] Watching network interfaces
2025-01-27 09:54:00.261691: [TUN] [pritunl0] Resolving DNS names
2025-01-27 09:54:00.261691: [TUN] [pritunl0] Creating network adapter
2025-01-27 09:54:00.447264: [TUN] [pritunl0] Using existing driver 0.10
2025-01-27 09:54:00.459102: [TUN] [pritunl0] Creating adapter
2025-01-27 09:54:00.727916: [TUN] [pritunl0] Using WireGuardNT/0.10
2025-01-27 09:54:00.727916: [TUN] [pritunl0] Enabling firewall rules
2025-01-27 09:54:00.657985: [TUN] [pritunl0] Interface created
2025-01-27 09:54:00.732166: [TUN] [pritunl0] Dropping privileges
2025-01-27 09:54:00.732698: [TUN] [pritunl0] Setting interface configuration
2025-01-27 09:54:00.733228: [TUN] [pritunl0] Peer 1 created
2025-01-27 09:54:00.735441: [TUN] [pritunl0] Monitoring MTU of default v4 routes
2025-01-27 09:54:00.735441: [TUN] [pritunl0] Interface up
2025-01-27 09:54:00.741405: [TUN] [pritunl0] Setting device v4 addresses
2025-01-27 09:54:00.766755: [TUN] [pritunl0] Monitoring MTU of default v6 routes
2025-01-27 09:54:00.767754: [TUN] [pritunl0] Setting device v6 addresses
2025-01-27 09:54:00.814688: [TUN] [pritunl0] Startup complete
2025-01-27 09:54:00.841215: [TUN] [pritunl0] Sending handshake initiation to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:54:00.880564: [TUN] [pritunl0] Receiving handshake response from peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:54:00.880564: [TUN] [pritunl0] Keypair 1 created for peer 1
2025-01-27 09:54:00.908829: [TUN] [pritunl0] Receiving keepalive packet from peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:54:31.942448: [TUN] [pritunl0] Sending keepalive packet to peer 1 (XXX.XXX.XXX.XXX:51820)

So the last keepalive packet was sent at “09:53:25”, the connection dropped at “09:53:49” and then reconnected. So nothing odd on the client side in my opinion. But this issue needs to be resolved as it is affecting more and more users. I am not sure where this problem comes from. We already tried rolling back to the previous version of the client, but that didn’t solve the issue. Could the current server version be the problem here?
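The timing in the log above can also be pulled out programmatically. The following is only a minimal sketch, assuming the line layout shown in the excerpt (timestamp, then "[TUN] [pritunl0]", then the message); the helper names parse and shutdown_gaps are made up for illustration and are not part of WireGuard or Pritunl.

```python
from datetime import datetime

# Three lines copied from the log excerpt above (peer address redacted as in the post).
LOG = """\
2025-01-27 09:53:25.455457: [TUN] [pritunl0] Sending keepalive packet to peer 1 (XXX.XXX.XXX.XXX:51820)
2025-01-27 09:53:49.119162: [TUN] [pritunl0] Shutting down
2025-01-27 09:54:00.259586: [TUN] [pritunl0] Starting WireGuard/0.5.3 (Windows 10.0.22631; amd64)
"""


def parse(line):
    """Split one log line into (timestamp, message)."""
    stamp, _, rest = line.partition(": [TUN] ")
    message = rest.split("] ", 1)[1]  # drop the "[pritunl0] " interface tag
    return datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S.%f"), message


def shutdown_gaps(text):
    """For each "Shutting down" line, return (seconds since the previous
    log line, seconds until the next "Starting WireGuard" line or None)."""
    events = [parse(line) for line in text.splitlines() if ": [TUN] " in line]
    gaps = []
    for i, (when, message) in enumerate(events):
        if message != "Shutting down" or i == 0:
            continue
        idle = (when - events[i - 1][0]).total_seconds()
        restart = next(
            (t for t, m in events[i + 1:] if m.startswith("Starting WireGuard")),
            None,
        )
        downtime = (restart - when).total_seconds() if restart else None
        gaps.append((idle, downtime))
    return gaps


gaps = shutdown_gaps(LOG)
# For the excerpt: about 23.7 s of silence before the shutdown,
# then about 11.1 s until the tunnel service starts again.
```

Running the same function over a full client log would list every shutdown with its lead-up and downtime, which makes it easier to compare affected users.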

zach · January 27, 2025, 9:14am

Try running curl http://10.243.2.1/check or a similar Windows command and then ping 10.243.2.1. If the curl request fails but the ping doesn’t it will indicate there’s an issue with the web server. If both commands fail the connection isn’t working.
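The rule described here can be written down as a small decision table. This is a hedged sketch, not anything shipped with Pritunl: the function names are hypothetical, the gateway address is the one from the post, and the "healthy" and "inconclusive" labels are additions for the two cases the post does not cover.

```python
import platform
import subprocess
import urllib.request

GATEWAY = "10.243.2.1"  # address taken from the post above


def curl_ok(timeout=5.0):
    """True if GET http://<gateway>/check answers with HTTP 200."""
    try:
        with urllib.request.urlopen(f"http://{GATEWAY}/check", timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, timeouts and refused connections are OSError subclasses
        return False


def ping_ok(timeout_s=5):
    """True if a single ping to the gateway exits with status 0.
    Caveat (assumption to verify): Windows ping may exit 0 on
    "Destination net unreachable" replies, so treat this as a rough check."""
    count_flag = "-n" if platform.system() == "Windows" else "-c"
    try:
        result = subprocess.run(
            ["ping", count_flag, "1", GATEWAY],
            capture_output=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def diagnose(curl_works, ping_works):
    """Map the two probe results to the interpretation given in the post."""
    if ping_works and not curl_works:
        return "web server issue"
    if not ping_works and not curl_works:
        return "connection not working"
    if ping_works and curl_works:
        return "healthy"  # label added here, not from the post
    return "inconclusive"  # curl works but ping fails: not covered in the post
```

A call such as diagnose(curl_ok(), ping_ok()) then gives a one-line verdict; it only means something if it is run while the problem is actually happening.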

Joly0 · January 27, 2025, 9:28am

Hey @zach, while connected to the VPN, both commands work without issues. The first one returns “OK” and the ping command returns normal ping results.

zach · January 27, 2025, 9:48am

The commands need to be tested as soon as the issue occurs and before the timeout removes the connection.

Joly0 · January 27, 2025, 10:35am

Ok, I tried to run both commands as soon as the connection dropped. This time the curl command fails, while the ping command still works.

Joly0 · January 27, 2025, 11:06am

@zach I have now run both commands in parallel, once every second. This is the result when losing the connection:

OK[2025-01-27 12:02:34.675] Running command...
OK[2025-01-27 12:02:35.727] Running command...
OK[2025-01-27 12:02:36.781] Running command...
curl: (28) Failed to connect to 10.243.2.1 port 80 after 21004 ms: Could not connect to server
[2025-01-27 12:02:58.824] Running command...
OK[2025-01-27 12:02:59.870] Running command...
OK[2025-01-27 12:03:00.936] Running command...

Reply from 10.243.2.1: bytes=32 time=21ms TTL=64
Reply from 10.243.2.1: bytes=32 time=12ms TTL=64
Reply from 10.243.2.1: bytes=32 time=11ms TTL=64
Reply from 10.243.2.1: bytes=32 time=9ms TTL=64
Reply from 10.243.2.1: bytes=32 time=10ms TTL=64
Request timed out.
Reply from XX.XXX.XXX.XX: Destination net unreachable.
Reply from XX.XXX.XXX.XX: Destination net unreachable.
Reply from XX.XXX.XXX.XX: Destination net unreachable.
Reply from XX.XXX.XXX.XX: Destination net unreachable.
Request timed out.
Reply from 10.243.2.1: bytes=32 time=10ms TTL=64
Reply from 10.243.2.1: bytes=32 time=10ms TTL=64
Reply from 10.243.2.1: bytes=32 time=18ms TTL=64

The first output is from the curl command and the second from the ping, both run every second. When the connection drops, the curl command fails to connect to the web server, while the ping stops for a short period of time and then continues.
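A once-per-second check like this can be scripted so that only state changes get logged with a timestamp. The sketch below is an illustration under assumptions, not the script used in this thread: watch is a hypothetical helper, and the probes are passed in as plain callables returning True or False so that any curl or ping wrapper can be plugged in.

```python
import time
from datetime import datetime


def watch(probes, rounds, interval=1.0, sleep=time.sleep, now=datetime.now):
    """Run every probe once per round and record each state change.

    probes maps a name to a zero-argument callable returning True/False.
    Returns a list of (timestamp, name, ok) tuples, starting with the
    initial state of each probe.
    """
    last = {}
    changes = []
    for _ in range(rounds):
        for name, probe in probes.items():
            ok = bool(probe())
            if last.get(name) is not ok:  # first result or a flip
                changes.append((now(), name, ok))
                last[name] = ok
        sleep(interval)
    return changes


# Example with canned results standing in for real curl/ping probes:
curl_results = iter([True, False, True])
ping_results = iter([True, True, True])
demo = watch(
    {"curl": lambda: next(curl_results), "ping": lambda: next(ping_results)},
    rounds=3,
    sleep=lambda seconds: None,  # skip the real delay in the demo
)
```

Note that the probes run one after another here, so a curl attempt that hangs for its full timeout (21 seconds in the output above) would delay the ping in the same round. Short probe timeouts, or two separate terminals as in the post, avoid that.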

zach · January 27, 2025, 1:52pm

It’s likely the connection is being disrupted rather than any issue with the software. Try using a different server or datacenter location.

Joly0 · January 27, 2025, 2:06pm

@zach Pritunl has been working in this setup for over 2 years now without any issues. This started just recently and nothing changed on our end, so I do not think this is an issue with our infrastructure. For most users it started after the recent update of the Pritunl client, though for a handful of users it has been a problem for about 4-6 months. Also, connecting with OpenVPN through Pritunl works without any issues; it’s only the WireGuard connection that has this problem. We could keep using OpenVPN, but a lot of users get much lower bandwidth with OpenVPN than with WireGuard, and they rely on a working and fast connection.

zach · January 27, 2025, 3:35pm

You can test whether it is an issue with the rewritten connection management code by downloading the v1.3.4026.10 release from before that change. But I don’t think this error would be related to it, and I haven’t seen other reports of this issue.

Joly0 · January 30, 2025, 3:59pm

Hey @zach, we downgraded a few key users to the mentioned version of the Pritunl client and so far we have had no more problems with the WireGuard connection. My colleagues and I had consistent connections for basically a whole working day. Usually the WireGuard connection would drop at least 20-30 times in such a time frame, but so far everything seems stable.
So I assume there has to be some kind of issue with the connection management in the newer Pritunl client versions.