Pritunl server not accessible through VPN with more than 1 connection

Hi, first of all many thanks for your great program.

We have been running a single Pritunl server (now v1.30.3116.68) on an Ubuntu 20.04.3 machine, where dozens of clients connect, with the multiple-device feature enabled.

The Pritunl server suddenly denied all multiple-device connections for every user, including newly created ones, with the error "unable to assign ip address, pool full". When a user had no active connections (so the web UI showed them as offline), it accepted one connection but no more.

This is a snippet of the log from when it denied a user's second connection:

[winter-thunder-9503][2022-04-20 14:23:27,699][ERROR] Unable to assign ip address, pool full
  server_id     = "6225cbdbbfbdafd4c3915****"
  instance_id   = "6231a364b965eac1ab09****"
  user_id       = "62341bf5b965eac1ab0a***"
  multi_device  = true
  replica_count = 1
  network       = "10.8.0.0/24"
  user_count    = 22

There seems to be no MongoDB issue (as far as I can tell from mongod.service and the MongoDB log).

After I restarted the Pritunl server via the web UI, the symptom was gone and it started working well again.
What could be the source of the problem, and how can I avoid it?

This issue should be fixed in a recent v1.30 update. If you were on a previous version when the issue occurred, it should be fixed after restarting the server. After updating the package, run sudo systemctl restart pritunl to load the latest version.


We’re running 1.30.3292.22-0ubuntu1~focal, and we get the unable to assign ip address, pool full error a few hours after we disable the Allow Multiple Devices option on the server's advanced config screen. If we re-enable Allow Multiple Devices, no one has any problem connecting.

How many devices are connecting to the server? Each user that is attached to the server is assigned a static IP even when not connected. This will reduce the available IP addresses for additional devices.

When this last happened we were in the 70-80 range of connected devices. The pool has 253 IPs in it.

We had issues with “Unable to assign ip address, pool full” when a user tried to connect with multiple devices, even after extending the network range of a server.

I checked the database and noticed that pool_cursor was set to the start of the network range, even though there were a lot of unused IP addresses in the network.

According to the source code it is never reset, unless you restart the server.

In order to avoid a restart of the server, I fixed this by running:

db.servers.update({'_id': ObjectId("SERVERID")}, {'$set': {'pool_cursor': null }});

All users that are in an organization attached to the server are assigned a static IP address starting from the beginning of the subnet. When multiple devices are enabled, additional user devices are assigned a temporary IP address starting from the end of the subnet. There was an issue in versions before v1.30.3331.78 that left behind an expire index in the clients_pool database collection. The design was changed to clear the user ID from the IP document to make it available for reuse, but this index was removing those documents. To fix this, all the hosts must be updated, then all the servers must be stopped and started.
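The two-ended scheme described here can be sketched in a few lines. This is a hedged illustration for a 10.8.0.0/24 network, not Pritunl's actual code: the `hosts`, `staticIp`, and `dynamicIp` names are assumptions for the example.

```javascript
// Sketch (not Pritunl's code): attached users get static IPs counting
// up from the start of the subnet, while extra devices on multi-device
// servers get temporary IPs counting down from the end.
const hosts = [];
for (let i = 1; i <= 254; i++) hosts.push(`10.8.0.${i}`); // usable hosts in a /24

function staticIp(userIndex) {
  return hosts[userIndex]; // user 0 -> 10.8.0.1, user 1 -> 10.8.0.2, ...
}

function dynamicIp(deviceIndex) {
  return hosts[hosts.length - 1 - deviceIndex]; // device 0 -> 10.8.0.254, ...
}
```

With this layout the static and dynamic ranges only collide when the subnet is genuinely exhausted, which is why a /24 with 22 users should have had plenty of room for extra devices.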


We (the same instance as in @pascal-hofmann’s post) experienced the issue again. It maxed out at the /22 network’s broadcast address while only a few dozen users were connected.
It seems the pool_cursor is never reset or cleaned up at runtime. I cleared it manually again.

I can confirm encountering the same problem on Ubuntu 22.04 running 1.30.3354.99-0ubuntu1~jammy. We have a /26 with 46 assigned users and 32 online (31 unique), so there should be dynamic IPs still available. Running the aforementioned MongoDB command seemed to fix the problem without a restart.

Clearing the cursor shouldn’t be required after restarting the servers with the latest update. I have added code to clear the cursor on startup for the upcoming v1.32 release. Clearing the cursor won’t result in the database being inconsistent. When the server assigns an IP address to a client without a cursor, it will iterate from the start of the pool, which could cause a slight delay if a significant number of the addresses are already in use. The cursor can safely be cleared at any time without restarting servers.

Without being familiar with the code, it sounds like the cursor is intended to essentially point to the next available IP address in a list, and when it reaches the end, it assumes none are available. When people are assigned dynamic IPs, it leaves gaps in the list once they are released, since the cursor does not go backwards. The “clear cursor on startup” change you mention performs that reset so it can reuse those gaps.

Assuming this is correct, what about making it automatically reset the cursor the first time it thinks the IP pool is exhausted? It would then go through the list from the beginning and find any gaps. Worst case, it goes all the way through without finding a free IP, then gives an error. This could even be an option for those that don’t want to encounter that delay or overhead, although a delay seems like a better alternative to requiring a server restart or manual cursor reset.
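The reset-on-exhaustion idea above can be sketched concretely. This is a hypothetical illustration of the proposal, not Pritunl's implementation; `assignIp`, `pool`, and `inUse` are names invented for the example.

```javascript
// Sketch of the proposal: advance a cursor through the pool, and when
// it looks exhausted, reset it once and rescan for gaps left behind by
// released dynamic addresses. Only error out after a full second pass.
function assignIp(pool, inUse, cursor) {
  for (let attempt = 0; attempt < 2; attempt++) {
    while (cursor < pool.length) {
      const ip = pool[cursor++];
      if (!inUse.has(ip)) {
        inUse.add(ip);
        return { ip, cursor };
      }
    }
    cursor = 0; // pool looked full: reset once and rescan from the start
  }
  throw new Error("unable to assign ip address, pool full");
}
```

The second pass is the cost the poster mentions: in the worst case every address is checked once more before the error, which is slower than failing immediately but far cheaper than a server restart.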

Once a client disconnects, the IP address remains in the database; the user and connection ID that were using the address are cleared from the IP document. On future connections the server first attempts to find and update an existing IP document that contains null user fields. If none are found, additional IP addresses are added starting at the cursor. The problem occurred when a TTL expiring index was left behind from a previous design. It caused the IP documents to be removed by MongoDB after the client disconnected and the timestamp was no longer updated. Those removed IP addresses were lost, and eventually no dynamic addresses would be available. The TTL index should be removed by the recent updates, allowing the system to function.
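The reuse flow described here can be modeled in miniature. This is an illustrative sketch with assumed field names (`ip`, `user`), not Pritunl's actual schema or code:

```javascript
// Model of the intended flow: disconnect clears the user from the IP
// document; a new connection first reuses any document with a null
// user, and only then adds a new address at the cursor.
const poolDocs = []; // stand-in for the clients_pool collection
let cursor = 0;
const addresses = ["10.8.0.254", "10.8.0.253", "10.8.0.252"];

function connect(user) {
  const free = poolDocs.find((d) => d.user === null);
  if (free) { // reuse a released address first
    free.user = user;
    return free.ip;
  }
  if (cursor < addresses.length) { // otherwise grow the pool at the cursor
    const doc = { ip: addresses[cursor++], user };
    poolDocs.push(doc);
    return doc.ip;
  }
  throw new Error("pool full");
}

function disconnect(user) {
  const doc = poolDocs.find((d) => d.user === user);
  if (doc) doc.user = null; // keep the document so the address can be reused
}

// The leftover TTL index effectively deleted the null-user documents
// instead, so released addresses were lost and the cursor only grew.
```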

The servers remain in the running state even when the Pritunl service is restarted. The only way to restore the lost addresses is to stop and start the servers from the web console, which clears the entire pool. Clearing the cursor is an alternative way to recover the lost addresses without a server restart, but it increases load on the database; with a large network the initial recovery could cause thousands of database queries.

After reading through the forum, I can also confirm the “Unable to assign IP address” issue due to IP address exhaustion. The solution was to manually stop and start the server from the web console, which cleared the entire pool. Can anyone confirm whether an automated solution or source code fix has been implemented in later releases? We have a running instance on Debian 10 with around 30 users and Allow Multiple Devices enabled.

We still face the issue that clients with multiple devices can’t connect after some time.
Our workaround is to reset the cursor in MongoDB; we’re thinking about doing this via a cron job.
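For anyone wanting to script this, the command is the one @pascal-hofmann posted earlier in the thread, in the non-deprecated updateOne form; it could be run periodically via cron with mongosh --eval. Replace SERVERID with your server's ObjectId:

```javascript
// Mongo shell, pritunl database: clear the pool cursor so the next
// assignment rescans from the start of the pool.
db.servers.updateOne({ _id: ObjectId("SERVERID") }, { $set: { pool_cursor: null } });
```

Per the maintainer's comment above, clearing the cursor is safe at any time, at the cost of extra database queries on the next scan.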

@zach This still is an issue, even with the latest releases. Is there a plan to finally tackle this problem or should we just setup a cron job like @AhTh8vow mentioned?

I believe there could be an issue with how indexes are getting updated, causing the expiration index to remain. Running db.clients_pool.getIndexes() should not show any index with expireAfterSeconds. Check the collection for this index.
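If the check does turn up a TTL index, it can be dropped by name in the mongo shell. This is a hedged sketch: only drop an index whose getIndexes() entry actually shows expireAfterSeconds, and use the name from your own output (the name below is just an example):

```javascript
// Mongo shell, pritunl database: list indexes, then drop the leftover
// TTL index by the name reported in your getIndexes() output.
db.clients_pool.getIndexes();
db.clients_pool.dropIndex("timestamp_1"); // example name; verify expireAfterSeconds first
```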

Hi zach,
these are the indexes:

> db.clients_pool.getIndexes()
[
	{
		"v" : 2,
		"key" : {
			"_id" : 1
		},
		"name" : "_id_",
		"ns" : "pritunl.clients_pool"
	},
	{
		"v" : 2,
		"key" : {
			"client_id" : 1
		},
		"name" : "client_id_1",
		"ns" : "pritunl.clients_pool",
		"background" : true
	},
	{
		"v" : 2,
		"key" : {
			"server_id" : 1,
			"user_id" : 1
		},
		"name" : "server_id_1_user_id_1",
		"ns" : "pritunl.clients_pool",
		"background" : true
	},
	{
		"v" : 2,
		"key" : {
			"timestamp" : 1
		},
		"name" : "timestamp_1",
		"ns" : "pritunl.clients_pool",
		"background" : true
	}
]