Pritunl: PrimarySteppedDown / Not primary while writing errors in bandwidth collection after MongoDB election

Hello! I need some help with intermittent MongoDB write errors in Pritunl.

Context

  • Event: After a scaling/maintenance operation on MongoDB Atlas (3-node replica set), we started seeing intermittent errors when writing bandwidth metrics. VPN functionality is unaffected.

Environment

  • Pritunl: v1.32.4278.46

  • OS/Host: Amazon Linux 2 and 2023

  • MongoDB: Atlas, replica set (3 nodes)

  • Driver: PyMongo (bundled with Pritunl)

  • Connection string (sanitized):

    mongodb+srv://<user>:<pass>@<cluster>/<db>?retryWrites=true&w=majority&readPreference=primary
    

Log (sanitized)

[host-02][2025-08-19 14:33:34,051][ERROR] Error in management rate thread
Traceback (most recent call last):
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/instance_com.py", line 322, in _watch_thread
    self.server.bandwidth.add_data(
  File "/usr/lib/pritunl/usr/lib/python3.9/site-packages/pritunl/server/bandwidth.py", line 80, in add_data
    self.collection.bulk_write(bulk)
  ...
pymongo.errors.BulkWriteError: batch op errors occurred, full error: {
  'writeErrors': [
    {
      'index': 5,
      'code': 189,
      'errmsg': 'Not primary while writing to ***.servers_bandwidth',
      'op': {
        'q': {
          'server_id': ObjectId('65ce0e71…f8626'),
          'period': '30m',
          'timestamp': {'$lt': datetime.datetime(2025, 8, 12, 14, 30)}
        },
        'limit': 0
      }
    }
  ],
  'writeConcernErrors': [
    {
      'code': 189,
      'codeName': 'PrimarySteppedDown',
      'errmsg': 'Primary stepped down while waiting for replication',
      'errInfo': {'writeConcern': {'w': 'majority', 'wtimeout': 0, 'provenance': 'clientSupplied'}}
    }
  ],
  'nInserted': 0, 'nUpserted': 0, 'nMatched': 3, 'nModified': 3, 'nRemoved': 0, 'upserted': []
}
  server_id   = "65ce0e71…f8626"
  instance_id = "68897ad4…7a020"
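
For reference, here is a minimal sketch of how an error document like the one above could be classified as election-related. It is not Pritunl code: `is_election_noise` and `ELECTION_CODES` are hypothetical names, and the code set is a hand-picked subset of MongoDB's "not primary" error codes. With PyMongo, the dict passed in would be the `details` attribute of the caught `BulkWriteError`.

```python
# Server error codes reported when a node stops being a writable primary.
# Assumption: a hand-picked subset, not an exhaustive list.
ELECTION_CODES = {
    189,    # PrimarySteppedDown
    10107,  # NotWritablePrimary
    11602,  # InterruptedDueToReplStateChange
    13435,  # NotPrimaryNoSecondaryOk
    13436,  # NotPrimaryOrSecondary
}


def is_election_noise(details):
    """Return True if every error in a BulkWriteError.details dict is election-related."""
    errors = list(details.get("writeErrors", []))
    errors += details.get("writeConcernErrors", [])
    # An empty error list is not treated as election noise.
    return bool(errors) and all(e.get("code") in ELECTION_CODES for e in errors)


# Trimmed copy of the error document from the log above.
details = {
    "writeErrors": [
        {"index": 5, "code": 189,
         "errmsg": "Not primary while writing to ***.servers_bandwidth"},
    ],
    "writeConcernErrors": [
        {"code": 189, "codeName": "PrimarySteppedDown"},
    ],
    "nMatched": 3,
    "nModified": 3,
}

print(is_election_noise(details))
```

A check like this only helps with triaging logs; it does not change how Pritunl handles the failed write.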

What we verified

  • Atlas events around the same time show an election (primary stepped down).

  • After the election, the replica set stabilized (normal replication lag).

  • The failure is intermittent and seems scoped to the servers_bandwidth collection (telemetry).

Questions

  1. Are Not primary / PrimarySteppedDown errors during elections expected/benign for the bandwidth collection routine?

  2. Is there an official way in Pritunl to either:

    • relax write concern or reduce the sampling frequency for servers_bandwidth, or

    • disable bandwidth collection to avoid noise during maintenance windows?

  3. Any recommended URI parameters for Pritunl (e.g., retryReads=true, wtimeoutMS=10000, appName=pritunl-vpn) to minimize visible errors across elections?

  4. Regarding read preference: is it safe to use readPreference=secondaryPreferred (optionally with maxStalenessSeconds) with Pritunl, or are there components that require strong consistency (primary reads) and might show stale data if we switch?

Those errors shouldn’t occur, but the server can continue working despite some database errors. If you are resizing the MongoDB Atlas replica set, that should be possible without losing a primary, though it depends on what kind of resizing is done. The read preference should not be changed; Pritunl should override any read preference included in the URI, and it will also override timeout settings. Generally you should not include any of those settings in the URI.
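
As a rough illustration of the advice above, the following sketch lists the options in a connection string that fall into the read preference and timeout categories. It uses only the Python standard library. The `OVERRIDDEN` set and the `flagged_options` helper are assumptions made for this example; the thread does not enumerate exactly which options Pritunl overrides.

```python
from urllib.parse import urlsplit, parse_qsl

# Assumption: illustrative set of read preference and timeout options only.
# MongoDB URI option names are case-insensitive, so compare in lowercase.
OVERRIDDEN = {
    "readpreference",
    "maxstalenessseconds",
    "wtimeoutms",
    "connecttimeoutms",
    "sockettimeoutms",
    "serverselectiontimeoutms",
}


def flagged_options(uri):
    """Return the query option names in the URI that are in OVERRIDDEN."""
    query = urlsplit(uri).query
    return [name for name, _ in parse_qsl(query) if name.lower() in OVERRIDDEN]


# Placeholder URI shaped like the sanitized one in the question.
uri = ("mongodb+srv://user:pass@cluster.example.net/db"
       "?retryWrites=true&w=majority&readPreference=primary")

print(flagged_options(uri))  # ['readPreference']
```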

Good morning Zach,
Thank you for your response. Could you please help me understand why it’s not recommended to change the read preference or include these settings in the URI?

The default configuration is already optimized with the recommended options. Also, using readPreference=secondaryPreferred affects the consistency requirements of the software. For example, it would create the opportunity for a single-use authorization to be used twice if it were written to the primary and then read from a secondary before that secondary had replicated the write. It would also break the event system in cases where an event is processed before the data relevant to the event has been updated in the database.
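
The single-use authorization case can be illustrated with a toy in-memory model. This is a sketch under simplifying assumptions, not MongoDB or Pritunl code: `ToyReplicaSet` and `redeem` are hypothetical names, and replication only happens when `replicate()` is called explicitly.

```python
class ToyReplicaSet:
    """In-memory stand-in for a primary and one lagging secondary."""

    def __init__(self):
        self.primary = {}
        self.secondary = {}

    def write(self, key, value):
        # Writes always go to the primary.
        self.primary[key] = value

    def replicate(self):
        # The secondary catches up with the primary.
        self.secondary = dict(self.primary)

    def read(self, key, from_secondary=False):
        node = self.secondary if from_secondary else self.primary
        return node.get(key)


def redeem(rs, token, from_secondary):
    """Accept the token if it reads as unused, then mark it used on the primary."""
    if rs.read(token, from_secondary) == "unused":
        rs.write(token, "used")
        return True
    return False


rs = ToyReplicaSet()
rs.write("token-1", "unused")
rs.replicate()

# Reading from the lagging secondary: the second attempt still sees "unused".
stale = [redeem(rs, "token-1", from_secondary=True) for _ in range(2)]
print(stale)  # [True, True]

# Reset the token, then read from the primary: the second attempt is rejected.
rs.write("token-1", "unused")
rs.replicate()
fresh = [redeem(rs, "token-1", from_secondary=False) for _ in range(2)]
print(fresh)  # [True, False]
```

With secondary reads the token is accepted twice because the "used" write has not reached the secondary yet; with primary reads the second attempt sees the update immediately.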

I understand, thank you very much!