Pritunl: "PrimarySteppedDown" / "Not primary while writing" errors in bandwidth collection after MongoDB election

Latorre · August 20, 2025, 6:54pm

Hello! I need some help with intermittent MongoDB write errors in Pritunl.

Context

Event: After a scaling/maintenance operation on our MongoDB Atlas cluster (3-node replica set), we started seeing intermittent errors when Pritunl writes bandwidth metrics. VPN functionality is unaffected.

Environment

Pritunl: v1.32.4278.46
OS/Host: Amazon Linux 2 and 2023
MongoDB: Atlas, replica set (3 nodes)
Driver: PyMongo (bundled with Pritunl)

Connection string (sanitized):

mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority&readPreference=primary
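The extra URI parameters asked about further down (wtimeoutMS, appName) are plain query options, so they can be appended to a string like the one above without touching credentials or host. A minimal standard-library sketch; `with_uri_options` is a hypothetical helper (not part of Pritunl or PyMongo), and the values are only the examples from this post, not recommendations. Recent PyMongo releases already default retryReads and retryWrites to true.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_uri_options(uri: str, **options: str) -> str:
    """Return `uri` with the given query options added or overridden.

    Hypothetical helper. Assumes each option appears at most once in the
    original query string; the scheme, credentials and host are left as-is.
    """
    parts = urlsplit(uri)
    # dict() keeps the original order of existing options; update() overrides
    # values in place and appends new keys at the end.
    query = dict(parse_qsl(parts.query, keep_blank_values=True))
    query.update(options)
    return urlunsplit(parts._replace(query=urlencode(query)))


base = ("mongodb+srv://<user>:<pass>@<cluster>/<db>"
        "?retryWrites=true&w=majority&readPreference=primary")
print(with_uri_options(base, wtimeoutMS="10000", appName="pritunl-vpn"))
```

Keeping the change as a pure string transformation makes it easy to diff the old and new connection strings before updating Pritunl's MongoDB URI.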

Log (sanitized)

[host-**02**][2025-08-19 14:33:34,051][ERROR] Error in management rate thread
Traceback (most recent call last):
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/instance_com.py", line 322, in _watch_thread
    self.server.bandwidth.add_data(
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/bandwidth.py", line 80, in add_data
    self.collection.bulk_write(bulk)
  ...
pymongo.errors.BulkWriteError: batch op errors occurred, full error: {
  'writeErrors': [
    {
      'index': 5,
      'code': 189,
      'errmsg': 'Not primary while writing to ***.servers_bandwidth',
      'op': {
        'q': {
          'server_id': ObjectId('65ce0e71…f8626'),
          'period': '30m',
          'timestamp': {'$lt': datetime.datetime(2025, 8, 12, 14, 30)}
        },
        'limit': 0
      }
    }
  ],
  'writeConcernErrors': [
    {
      'code': 189,
      'codeName': 'PrimarySteppedDown',
      'errmsg': 'Primary stepped down while waiting for replication',
      'errInfo': {'writeConcern': {'w': 'majority', 'wtimeout': 0, 'provenance': 'clientSupplied'}}
    }
  ],
  'nInserted': 0, 'nUpserted': 0, 'nMatched': 3, 'nModified': 3, 'nRemoved': 0, 'upserted': []
}
  server_id   = "65ce0e71…f8626"
  instance_id = "68897ad4…7a020"
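The dict printed in the traceback has the same shape PyMongo exposes as `BulkWriteError.details`. Both the write error and the write concern error carry code 189 (PrimarySteppedDown), which points at the election rather than at the data itself. A minimal sketch for classifying such errors; `is_election_error` and `ELECTION_CODES` are hypothetical names, and the code set is a hand-picked subset of the "not primary" / shutdown codes, not something Pritunl ships.

```python
# Hypothetical helper, not part of Pritunl or PyMongo.
# A subset of the server error codes that the retryable-writes spec treats as
# "not primary" / shutdown errors, i.e. typical of an election or stepdown.
ELECTION_CODES = {
    189,    # PrimarySteppedDown
    10107,  # NotWritablePrimary (formerly NotMaster)
    13435,  # NotPrimaryNoSecondaryOk
    11602,  # InterruptedDueToReplStateChange
    91,     # ShutdownInProgress
    11600,  # InterruptedAtShutdown
}


def is_election_error(details: dict) -> bool:
    """True if a BulkWriteError.details dict has errors and all are election-related."""
    errors = details.get("writeErrors", []) + details.get("writeConcernErrors", [])
    return bool(errors) and all(e.get("code") in ELECTION_CODES for e in errors)


# Abridged copy of the details dict from the traceback above.
details = {
    "writeErrors": [{"index": 5, "code": 189,
                     "errmsg": "Not primary while writing to ***.servers_bandwidth"}],
    "writeConcernErrors": [{"code": 189, "codeName": "PrimarySteppedDown"}],
    "nMatched": 3,
    "nModified": 3,
}
print(is_election_error(details))  # True
```

In a real handler this would be called as `is_election_error(exc.details)` inside `except BulkWriteError as exc:`; any other code (for example a duplicate-key error) makes it return False so genuine failures are not hidden.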

What we verified

Atlas events from around the same time show an election (the primary stepped down).
After the election, the replica set stabilized and replication lag returned to normal.
The failure is intermittent and appears limited to the servers_bandwidth collection (telemetry).
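One likely reason retryWrites=true did not hide the error: the failing operation in the log is a multi-document delete ('limit': 0), and retryable writes only cover single-document operations, so a bulk write containing a multi-document delete is not retried by the driver. If these writes should survive an election, a retry would have to live in application code. A minimal, hypothetical sketch (not Pritunl code) of a bounded retry with exponential backoff; `retry_transient` and its parameters are illustrative names, and `sleep` is injectable so the delays can be tested.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_transient(
    fn: Callable[[], T],
    is_transient: Callable[[Exception], bool],
    attempts: int = 3,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying up to `attempts` times while errors are transient.

    Hypothetical helper. Waits base_delay, 2*base_delay, 4*base_delay, ...
    between tries; re-raises immediately on a non-transient error and
    re-raises the last error once all attempts are used.
    """
    for attempt in range(attempts):
        try:
            return fn()
        except Exception as exc:
            if attempt == attempts - 1 or not is_transient(exc):
                raise
            sleep(base_delay * 2 ** attempt)
    raise ValueError("attempts must be >= 1")
```

With PyMongo, `fn` could wrap the `bulk_write` call and `is_transient` could check for `pymongo.errors.AutoReconnect` or inspect the codes in `BulkWriteError.details`. Re-running a partially applied batch is only safe if every operation in it is idempotent (a delete by timestamp is, an `$inc` is not), so this is a design decision for the bandwidth code rather than a drop-in fix.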

Questions

Are the Not primary / PrimarySteppedDown errors expected (and benign) for the bandwidth collection routine during elections?
Is there an official way in Pritunl to either:
- relax write concern or reduce the sampling frequency for servers_bandwidth, or
- disable bandwidth collection to avoid noise during maintenance windows?
Are there recommended URI parameters for Pritunl (e.g., retryReads=true, wtimeoutMS=10000, appName=pritunl-vpn) that minimize visible errors across elections?
Regarding read preference: is it safe to use readPreference=secondaryPreferred (optionally with maxStalenessSeconds) with Pritunl, or are there components that require strong consistency (primary reads) and might show stale data if we switch?
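If the readPreference change is tried, two URI rules from the MongoDB Server Selection specification are worth checking first: maxStalenessSeconds is rejected when combined with readPreference=primary, and otherwise it must be -1 (no limit) or at least 90 seconds with default heartbeat settings. A minimal standard-library sketch; `read_preference_problems` is a hypothetical helper, and it says nothing about whether Pritunl itself tolerates stale reads, which remains the open question.

```python
from typing import List
from urllib.parse import parse_qsl, urlsplit

# Smallest allowed maxStalenessSeconds with the default heartbeat settings,
# per the Server Selection spec (assumes heartbeatFrequencyMS is not raised).
MIN_MAX_STALENESS = 90


def read_preference_problems(uri: str) -> List[str]:
    """Return human-readable problems with readPreference/maxStalenessSeconds."""
    # URI option names are case-insensitive, so compare them lower-cased.
    opts = {k.lower(): v for k, v in parse_qsl(urlsplit(uri).query)}
    mode = opts.get("readpreference", "primary")  # the driver default
    problems = []
    if "maxstalenessseconds" in opts:
        staleness = int(opts["maxstalenessseconds"])
        if mode == "primary":
            problems.append("maxStalenessSeconds cannot be combined with readPreference=primary")
        elif staleness != -1 and staleness < MIN_MAX_STALENESS:
            problems.append(f"maxStalenessSeconds must be -1 or >= {MIN_MAX_STALENESS}")
    return problems


print(read_preference_problems(
    "mongodb+srv://<user>:<pass>@<cluster>/<db>"
    "?readPreference=secondaryPreferred&maxStalenessSeconds=30"))
```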