Pritunl: PrimarySteppedDown / Not primary while writing errors in bandwidth collection after MongoDB election

Hello! I need some help with intermittent MongoDB write errors in Pritunl.

Context

  • Event: after a scaling/maintenance operation on MongoDB Atlas (3-node replica set), we started seeing intermittent errors while writing bandwidth metrics. VPN functionality remains unaffected.

Environment

  • Pritunl: v1.32.4278.46

  • OS/Host: Amazon Linux 2 and 2023

  • MongoDB: Atlas, replica set (3 nodes)

  • Driver: PyMongo (bundled with Pritunl)

  • Connection string (sanitized):

    mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority&readPreference=primary
    
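The variant we are considering testing during the next maintenance window (all parameter names are standard MongoDB connection string options; the wtimeoutMS and appName values are our own guesses, not Pritunl defaults — see question 3 below):

```
mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&retryReads=true&w=majority&wtimeoutMS=10000&readPreference=primary&appName=pritunl-vpn
```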

Log (sanitized)

[host-02][2025-08-19 14:33:34,051][ERROR] Error in management rate thread
Traceback (most recent call last):
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/instance_com.py", line 322, in _watch_thread
    self.server.bandwidth.add_data(
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/bandwidth.py", line 80, in add_data
    self.collection.bulk_write(bulk)
  ...
pymongo.errors.BulkWriteError: batch op errors occurred, full error: {
  'writeErrors': [
    {
      'index': 5,
      'code': 189,
      'errmsg': 'Not primary while writing to ***.servers_bandwidth',
      'op': {
        'q': {
          'server_id': ObjectId('65ce0e71…f8626'),
          'period': '30m',
          'timestamp': {'$lt': datetime.datetime(2025, 8, 12, 14, 30)}
        },
        'limit': 0
      }
    }
  ],
  'writeConcernErrors': [
    {
      'code': 189,
      'codeName': 'PrimarySteppedDown',
      'errmsg': 'Primary stepped down while waiting for replication',
      'errInfo': {'writeConcern': {'w': 'majority', 'wtimeout': 0, 'provenance': 'clientSupplied'}}
    }
  ],
  'nInserted': 0, 'nUpserted': 0, 'nMatched': 3, 'nModified': 3, 'nRemoved': 0, 'upserted': []
}
  server_id   = "65ce0e71…f8626"
  instance_id = "68897ad4…7a020"

What we verified

  • Atlas Events around the same time show an election ("Primary stepped down").

  • After the election, the replica set stabilized (normal replication lag).

  • The failure is intermittent and appears scoped to the servers_bandwidth collection (telemetry only).

Questions

  1. Are Not primary / PrimarySteppedDown errors during elections expected/benign for the bandwidth collection routine?

  2. Is there an official way in Pritunl to either:

    • relax write concern or reduce the sampling frequency for servers_bandwidth, or

    • disable bandwidth collection to avoid noise during maintenance windows?

  3. Any recommended URI parameters for Pritunl (e.g., retryReads=true, wtimeoutMS=10000, appName=pritunl-vpn) to minimize visible errors across elections?

  4. Regarding read preference: is it safe to use readPreference=secondaryPreferred (optionally with maxStalenessSeconds) with Pritunl, or do some components require primary reads for strong consistency and would surface stale data if we switch?
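To illustrate what we mean in questions 1 and 3: our understanding is that retryWrites=true does not cover multi-document operations, and the failing op in the log is a delete with limit: 0 (i.e. deleteMany), which would explain why the error surfaces despite retryable writes. If that reading is correct, only an application-level retry would hide it. A minimal sketch of the pattern we have in mind (pure Python; in real code retryable would be pymongo.errors.AutoReconnect / NotPrimaryError, simulated here with ConnectionError, and op would wrap collection.bulk_write):

```python
import time

def retry_on_stepdown(op, retryable=(ConnectionError,), attempts=3, base_delay=0.5):
    """Retry op on transient 'not primary' style errors with linear backoff.

    Sketch only: retryable writes in the driver handle single-document ops,
    but multi-document deletes (limit: 0) are not retryable, hence a wrapper.
    """
    for attempt in range(1, attempts + 1):
        try:
            return op()
        except retryable:
            if attempt == attempts:
                raise  # election did not resolve in time; surface the error
            time.sleep(base_delay * attempt)  # wait out the election

# Simulated flaky write: fails twice (election in progress), then succeeds.
calls = {"n": 0}

def flaky_bulk_write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("not primary")  # stand-in for NotPrimaryError
    return "ok"

result = retry_on_stepdown(flaky_bulk_write, base_delay=0)  # → "ok"
```

We are not proposing to patch Pritunl ourselves; this is just to clarify what "expected/benign during elections" would mean operationally.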