Error Code: 189
MongoDB Error 189: Primary Stepped Down
Description
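Error 189, 'Primary Stepped Down' (code name `PrimarySteppedDown`), indicates that the primary node of a MongoDB replica set has moved to the secondary role. The change can be voluntary (for example, after `rs.stepDown()`) or involuntary (for example, after the primary loses contact with a majority of voting members). Operations that were running on that node when it stepped down can fail with this error. The remaining members then hold an election to choose a new primary. Isolated step-downs are a normal part of replica set failover, but frequent occurrences usually signal underlying network, resource, or configuration issues.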
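A step-down interrupts operations on the old primary, so client code often retries them after a new primary is elected. The sketch below uses plain JavaScript that runs in mongosh, where database calls look synchronous, or in Node.js. The helper names are hypothetical. Codes 10107 (NotWritablePrimary) and 11602 (InterruptedDueToReplStateChange) are included on the assumption that they are worth retrying alongside 189. Official drivers with retryable writes enabled already retry many of these errors once on their own.

```javascript
// Hypothetical helpers for retrying an operation after a primary step-down.
// 189 = PrimarySteppedDown; 10107 and 11602 are assumed extra failover codes.
const STEP_DOWN_CODES = new Set([189, 10107, 11602]);

function isStepDownError(err) {
  return (
    err != null &&
    (STEP_DOWN_CODES.has(err.code) || err.codeName === "PrimarySteppedDown")
  );
}

// Calls `operation` and retries it when it fails with a step-down error.
// `pause` runs between attempts; in mongosh you could pass
// (attempt) => sleep(1000 * attempt) to give the election time to finish.
function retryOnStepDown(operation, maxAttempts = 3, pause = () => {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return operation();
    } catch (err) {
      // Rethrow unrelated errors, or give up once the attempt budget is spent.
      if (!isStepDownError(err) || attempt >= maxAttempts) throw err;
      pause(attempt);
    }
  }
}
```

In mongosh, you might call `retryOnStepDown(() => db.orders.insertOne({ sku: "A1" }))`, where `db.orders` is a hypothetical collection. Be careful with non-idempotent writes. If the original write committed before the error reached the client, a blind retry can apply it twice.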
Error Message
Primary Stepped Down
Known Causes
4 known causes
Network Connectivity Loss
The primary node lost network connectivity to a majority of the voting replica set members. A primary must be able to communicate with a majority of voting members to remain primary, so it stepped down.
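The "majority" here is a strict majority of voting members, and the primary counts itself. The simplified model below shows the arithmetic. The helper names are hypothetical, and the real server logic also involves terms, heartbeats, and timeouts.

```javascript
// Simplified model of the majority rule (hypothetical helper names).
// A primary needs to see a strict majority of voting members, counting itself.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

function canRemainPrimary(reachableVotersIncludingSelf, votingMembers) {
  return reachableVotersIncludingSelf >= majority(votingMembers);
}

// Three voting members, primary cut off from both secondaries:
console.log(canRemainPrimary(1, 3)); // false -> the primary steps down
// Same set, primary can still reach one secondary:
console.log(canRemainPrimary(2, 3)); // true
```

A four-member set also needs three reachable voters (`majority(4)` is 3). That is one reason an even number of voting members adds no fault tolerance over the next smaller odd number.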
Resource Contention or Overload
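The primary experienced high CPU, memory, or disk I/O pressure and became too slow to respond to the other members' heartbeats. If the other members do not hear from it within the election timeout, an eligible secondary calls an election. The old primary steps down once it learns that a new primary has been elected.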
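When an overloaded primary stops answering heartbeats, the other members can see it in `rs.status()`. For each remote member, that command reports when a heartbeat was last received (`lastHeartbeatRecv`), and the status document carries its own timestamp (`date`). The sketch below flags members whose last heartbeat is older than the election timeout. It assumes an `rs.status()`-shaped object, uses a hypothetical helper name, and is only a rough diagnostic, not the server's exact election logic.

```javascript
// Returns the names of members whose most recent heartbeat, as seen by the
// node that produced `status`, is older than electionTimeoutMillis.
// Assumes an rs.status()-shaped document; the helper name is hypothetical.
function staleHeartbeatMembers(status, electionTimeoutMillis = 10000) {
  const now = new Date(status.date).getTime();
  return status.members
    .filter((m) => !m.self && m.lastHeartbeatRecv) // skip the local node
    .filter(
      (m) => now - new Date(m.lastHeartbeatRecv).getTime() > electionTimeoutMillis
    )
    .map((m) => m.name);
}
```

In mongosh, you could run it from a secondary with `staleHeartbeatMembers(rs.status(), rs.conf().settings.electionTimeoutMillis)`.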
Manual Primary Step Down
An administrator explicitly ran `rs.stepDown()` for planned maintenance, upgrades, or configuration changes.
Configuration or Replication Issues
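Changes in replica set configuration can trigger a step-down. For example, if you raise another member's priority, that member calls an election to take over once it has caught up with the primary. Significant replication lag on the secondaries can also affect how step-downs and elections play out.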
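To gauge replication lag, you can compare each secondary's `optimeDate` with the primary's in the output of `rs.status()`. mongosh also offers `rs.printSecondaryReplicationInfo()` for a human-readable summary. The sketch below is a hypothetical helper that assumes the `rs.status()` document shape and returns the lag in seconds.

```javascript
// Computes each secondary's lag behind the primary, in seconds, from an
// rs.status()-shaped document. Returns null when no member is PRIMARY
// (for example, while an election is in progress). Helper name is hypothetical.
function secondaryLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  if (!primary) return null;
  const primaryTime = new Date(primary.optimeDate).getTime();
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => ({
      name: m.name,
      lagSeconds: (primaryTime - new Date(m.optimeDate).getTime()) / 1000,
    }));
}
```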
Solutions
4 solutions available
1. Investigate and Resolve Network Connectivity Issues (difficulty: medium)
Address underlying network problems that are causing the primary to step down.
1
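Check network connectivity between replica set members, and make sure no firewall blocks traffic on the MongoDB port (27017 by default). Also look for intermittent network failures. Note that `ping` only tests ICMP reachability, not whether the MongoDB port is open.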
ping <replica_set_member_ip>
2
Verify that all replica set members can resolve each other's hostnames correctly. Use `nslookup` or `dig` to test.
nslookup <replica_set_member_hostname>
3
Monitor network latency and packet loss between replica set members. High latency or packet loss can lead to elections.
mtr <replica_set_member_ip>
4
Review system logs on the affected server(s) for any network-related error messages.
sudo tail -f /var/log/syslog # Or equivalent log file for your OS
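Rather than typing each address by hand, you can derive the targets for the checks above from the replica set configuration. Its `members` array lists each member's `host` as a `host:port` string. The helper below is a hypothetical sketch that runs in mongosh or Node.js. It only parses the configuration and does not test connectivity itself. Treating a missing port as 27017 is an assumption.

```javascript
// Extracts unique { hostname, port } pairs from an rs.conf()-shaped document.
// Assumes each member.host is "hostname:port"; a missing port is treated as 27017.
function memberEndpoints(conf) {
  const seen = new Map();
  for (const member of conf.members) {
    const idx = member.host.lastIndexOf(":");
    const hostname = idx === -1 ? member.host : member.host.slice(0, idx);
    const port = idx === -1 ? 27017 : Number(member.host.slice(idx + 1));
    seen.set(`${hostname}:${port}`, { hostname, port });
  }
  return [...seen.values()];
}

// In mongosh, print one lookup command per member to run on each server:
// memberEndpoints(rs.conf()).forEach((e) => print(`nslookup ${e.hostname}`));
```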
2. Analyze and Address Resource Constraints (difficulty: medium)
Identify and resolve resource limitations (CPU, RAM, Disk I/O) on the primary that might be causing it to become unresponsive.
1
Monitor CPU usage on the primary server. High CPU utilization can make the server unresponsive, leading to timeouts and elections.
top
2
Check available RAM on the primary server. Insufficient memory can lead to excessive swapping, degrading performance.
free -h
3
Monitor disk I/O performance on the primary. Slow disk operations can significantly impact MongoDB's ability to respond.
iostat -xz 1
4
Review MongoDB logs for any messages indicating resource pressure, such as slow queries or write stalls.
sudo tail -f /var/log/mongodb/mongod.log
5
Consider scaling up the server resources (CPU, RAM) or optimizing application queries and indexing if resource constraints are identified.
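To pull slow operations out of the MongoDB log in step 4, you can filter the structured log. This sketch assumes MongoDB 4.4 or later, which writes one JSON document per log line. It also assumes slow operations are logged with the message `"Slow query"` and an `attr.durationMillis` field. The helper name and the default threshold are hypothetical.

```javascript
// Filters mongod JSON log lines (MongoDB 4.4+ structured logging is assumed)
// for "Slow query" entries at or above a duration threshold.
function slowQueries(logText, minMillis = 1000) {
  const results = [];
  for (const line of logText.split("\n")) {
    if (!line.trim()) continue;
    let entry;
    try {
      entry = JSON.parse(line);
    } catch {
      continue; // skip lines that are not JSON
    }
    const ms = entry.attr && entry.attr.durationMillis;
    if (entry.msg === "Slow query" && typeof ms === "number" && ms >= minMillis) {
      results.push({ ns: entry.attr.ns, durationMillis: ms });
    }
  }
  return results;
}
```

In Node.js, you could call it with `slowQueries(require("fs").readFileSync("/var/log/mongodb/mongod.log", "utf8"), 500)`. For very large logs, a streaming reader would be a better fit than reading the whole file into memory.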
3. Review and Adjust Replica Set Configuration (difficulty: medium)
Examine and modify replica set settings to prevent unnecessary elections.
1
Connect to your MongoDB replica set using the `mongosh` shell.
mongosh
2
Run `rs.conf()` to view the current replica set configuration.
rs.conf()
3
Check the `settings.electionTimeoutMillis` value. The default is 10000 milliseconds (10 seconds). If the value is too low for your network, short latency spikes can trigger unnecessary elections. In that case, consider increasing it to give the primary more time to respond before an election is triggered.
db.adminCommand({ replSetGetConfig: 1 }).config.settings.electionTimeoutMillis
4
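If you need to change the election timeout, run `rs.reconfig()` on the primary with the updated configuration. *Caution: Modifying the configuration requires careful consideration and understanding of replica set dynamics. A longer timeout also means the set takes longer to elect a new primary when the current one genuinely fails.*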
var cfg = rs.conf();
cfg.settings.electionTimeoutMillis = 20000; // Example: Increase to 20 seconds
rs.reconfig(cfg);
5
Ensure the `members` array in `rs.conf()` correctly lists all replica set members with their respective hostnames and ports.
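The configuration checks in this solution can be bundled into a quick sanity pass over an `rs.conf()`-shaped document. The hypothetical helper below flags three things: duplicate member hosts, an even number of voting members (MongoDB recommends an odd number), and an `electionTimeoutMillis` below the 10000 ms default. It assumes that members without an explicit `votes` field are voting members, which matches the default of `votes: 1`.

```javascript
// Hypothetical sanity checks on an rs.conf()-shaped document.
// Returns a list of human-readable warnings (empty when nothing looks off).
function checkReplicaSetConfig(conf, defaultTimeoutMs = 10000) {
  const warnings = [];
  const hosts = conf.members.map((m) => m.host.toLowerCase());
  const duplicates = hosts.filter((h, i) => hosts.indexOf(h) !== i);
  if (duplicates.length > 0) {
    warnings.push(`duplicate member hosts: ${[...new Set(duplicates)].join(", ")}`);
  }
  // Members without an explicit votes field count as voting (votes: 1).
  const voters = conf.members.filter((m) => (m.votes ?? 1) > 0).length;
  if (voters % 2 === 0) {
    warnings.push(`${voters} voting members; an odd number is recommended`);
  }
  const timeout = conf.settings && conf.settings.electionTimeoutMillis;
  if (typeof timeout === "number" && timeout < defaultTimeoutMs) {
    warnings.push(`electionTimeoutMillis is ${timeout}, below the ${defaultTimeoutMs} ms default`);
  }
  return warnings;
}
```

In mongosh, `checkReplicaSetConfig(rs.conf())` returns an empty array when none of these conditions apply.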
4. Perform a Graceful Primary Re-election (difficulty: easy)
Manually trigger an election to shift the primary role to a different member.
1
Connect to the current primary node using the `mongosh` shell.
mongosh
2
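Initiate a primary step-down. This command forces the current primary to step down and triggers an election. With no arguments, `rs.stepDown()` makes the old primary ineligible for re-election for 60 seconds and waits up to 10 seconds for an electable secondary to catch up. If no secondary catches up in time, the command fails and the primary stays in place.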
rs.stepDown()
3
Monitor the replica set status (`rs.status()`) to confirm a new primary has been elected and the replica set is healthy.
rs.status()
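Reading the full `rs.status()` output after a step-down can be tedious. The hypothetical helper below condenses an `rs.status()`-shaped document into three things: the current primary (or `null` while the election is still running), whether the set looks healthy, and which members are down or in an unexpected state. It assumes each member carries `name`, `stateStr`, and `health` fields.

```javascript
// Summarizes an rs.status()-shaped document: which member is primary and
// which members are unhealthy or in an unexpected state. Hypothetical helper.
function summarizeReplicaSet(status) {
  const primaries = status.members.filter((m) => m.stateStr === "PRIMARY");
  const problems = status.members
    .filter(
      (m) => m.health !== 1 || !["PRIMARY", "SECONDARY", "ARBITER"].includes(m.stateStr)
    )
    .map((m) => `${m.name} (${m.stateStr})`);
  return {
    primary: primaries.length === 1 ? primaries[0].name : null,
    healthy: primaries.length === 1 && problems.length === 0,
    problems,
  };
}
```

In mongosh, you can rerun `summarizeReplicaSet(rs.status())` until `primary` is non-null. The set may have no primary for a few seconds while the election completes.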