Error Code: 63
MongoDB Error 63: Stale Shard Version
Description
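This error indicates that a `mongos` router or `mongod` instance in a sharded cluster is working from outdated routing metadata. Each request a `mongos` sends to a shard carries the shard version from the router's cached copy of the metadata. If that version no longer matches the version the shard holds (for example, after a chunk migration changed which shard owns a range), the shard rejects the request as stale. The router then typically refreshes its cache from the config servers and retries automatically, so the error usually reaches the client only when the refresh keeps failing or the retries are exhausted.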
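The idea of a stale shard version can be pictured with a small Python model. A router attaches its cached version to each request, and the shard rejects the request if that version no longer matches its own. This is a simplified sketch, not MongoDB's implementation. `ShardVersion`, `is_stale`, and `send_with_refresh` are hypothetical names. The model assumes a version counts as stale when the epoch or the major component differs, and it ignores the other details the real server tracks.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ShardVersion:
    """Hypothetical, simplified stand-in for a collection's shard version."""
    major: int  # roughly: changes when chunk ownership moves (migrations)
    minor: int  # roughly: changes for metadata edits such as chunk splits
    epoch: str  # changes when the collection is dropped and recreated


def is_stale(router: ShardVersion, shard: ShardVersion) -> bool:
    """Return True if the shard should reject a request sent with `router`."""
    if router.epoch != shard.epoch:
        # The router's cache describes a different incarnation of the collection.
        return True
    # Same incarnation, but chunk ownership changed since the router cached it.
    return router.major != shard.major


def send_with_refresh(
    cached: ShardVersion,
    shard_current: ShardVersion,
    refresh: Callable[[], ShardVersion],
    max_retries: int = 3,
) -> ShardVersion:
    """Model a router that refreshes its cache and retries on a stale version.

    Returns the version the request finally succeeded with. Raises once
    `max_retries` refreshes have failed to produce a matching version.
    """
    attempts = 0
    while is_stale(cached, shard_current):
        if attempts == max_retries:
            raise RuntimeError("stale shard version: retries exhausted")
        cached = refresh()  # stands in for reloading metadata from config servers
        attempts += 1
    return cached
```

In this model, the caller sees the error only when `refresh` keeps returning outdated metadata. That matches the causes listed below: slow or unreachable config servers, network problems, or a member stuck on an old cache.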
Error Message
Stale Shard Version
Known Causes
Configuration Server Latency
The config servers, which store the cluster metadata, are experiencing high latency or replication issues. As a result, routers and shards cannot refresh their cached metadata from them in time.
Network Partition or Lag
Network problems are preventing `mongos` routers or shard members from receiving the latest shard version information from the config servers.
Outdated mongos/mongod Instances
A specific `mongos` router or a `mongod` instance within a shard replica set is operating with an old version of the cluster metadata cache.
Manual Metadata Intervention
Incorrect or unauthorized manual manipulation of internal cluster metadata can lead to inconsistencies in shard versioning across the cluster.
Solutions
1. Restart the Affected Shard Replica Set Member (easy)
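A restart discards the member's in-memory copy of the sharding metadata, so the member reloads current metadata from the config servers when it next needs it. This often resolves temporary inconsistencies in shard versioning.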
1
Identify the shard replica set member that is reporting the stale version. You can usually see this in the MongoDB logs.
2
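Gracefully shut down the MongoDB instance on that member. If the member is currently the primary, run `rs.stepDown()` on it first so the replica set elects a new primary in a controlled way. Replace `mongod_path` with the actual path to your mongod executable. The `--shutdown` option is supported on Linux only and needs the same configuration file (or `--dbpath`) the instance was started with.
mongod_path --config /path/to/mongod.conf --shutdown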
3
Wait for the shutdown to complete, then restart the MongoDB instance.
mongod_path --config /path/to/mongod.conf
4
Monitor the logs of the restarted member and the config server to ensure the stale version error is no longer present.
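Steps 1 and 4 both involve reading the logs. A small Python filter can pull the relevant lines out of a `mongod` log file. This is a sketch built on several assumptions:
- It matches the code names `StaleShardVersion` and `StaleConfig` as plain substrings.
- It treats each line as either a structured JSON entry (the log format since MongoDB 4.4) or plain text (older versions).
- `find_stale_version_lines` is a hypothetical helper name.

```python
import json
from typing import Iterable, Iterator

# Assumption: stale-version errors mention one of these code names in the log line.
STALE_MARKERS = ("StaleShardVersion", "StaleConfig")


def find_stale_version_lines(lines: Iterable[str]) -> Iterator[dict]:
    """Yield {"line", "time", "msg"} for each log line mentioning a stale version."""
    for lineno, line in enumerate(lines, start=1):
        if not any(marker in line for marker in STALE_MARKERS):
            continue
        entry = {"line": lineno, "time": None, "msg": line.strip()}
        try:
            doc = json.loads(line)
        except json.JSONDecodeError:
            # Plain-text log format (pre-4.4): keep the raw line as the message.
            yield entry
            continue
        if isinstance(doc, dict):
            # Structured log: the timestamp is under t.$date, the summary under msg.
            t = doc.get("t")
            if isinstance(t, dict):
                entry["time"] = t.get("$date")
            entry["msg"] = doc.get("msg", entry["msg"])
        yield entry
```

To use it, pass an open log file (the path configured in `systemLog.path`) to `find_stale_version_lines` and print the hits. Before the restart, the hits point to the affected member. After the restart, the absence of new hits suggests the stale version has cleared.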
2. Re-sync the Config Server's View of the Shard (medium)
Confirm what the config servers record about the shard, then force the routers to refresh their cached copy of that metadata.
1
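Connect to a `mongos` router with `mongosh`, because `sh.status()` is meant to be run through `mongos`. On versions before 6.0, the legacy `mongo` shell accepts the same options.
mongosh --host <mongos_host> --port <mongos_port>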
2
Run `sh.status()` to get an overview of the sharded cluster. Find the shard named in the error and confirm that it is listed with the expected replica set name and hosts.
sh.status()
3
If the stale version persists, first check for pending operations or network issues affecting communication between the config servers, the routers, and the shard. Then force each `mongos` to discard its cached routing table by running `db.adminCommand({ flushRouterConfig: 1 })` on it. The router reloads the metadata from the config servers on its next request.
4
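As a last resort, and with careful consideration of potential downtime, restart the config server replica set members one at a time. Restart the secondaries first, waiting for each to return to `SECONDARY` in `rs.status()`. Then run `rs.stepDown()` on the primary and restart it last. Always take a backup of your config data before performing such operations.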
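Restarting config server members one by one is usually done secondaries first and primary last, after stepping the primary down. That order can be derived from the `members` array returned by `rs.status()`. The Python sketch below assumes that array has been fetched as a list of dictionaries (for example, with PyMongo's `client.admin.command("replSetGetStatus")["members"]`). `rolling_restart_order` is a hypothetical helper, and it refuses to proceed if any member is not already `PRIMARY` or `SECONDARY`.

```python
from typing import Dict, List


def rolling_restart_order(members: List[Dict]) -> List[str]:
    """Return member names in restart order: secondaries first, primary last.

    `members` is assumed to be the `members` array from rs.status(), where each
    entry has at least `name` and `stateStr`.
    """
    # Do not start a rolling restart while any member is unhealthy or recovering.
    unhealthy = [
        m["name"] for m in members if m.get("stateStr") not in ("PRIMARY", "SECONDARY")
    ]
    if unhealthy:
        raise ValueError(f"resolve these members before restarting: {unhealthy}")

    primaries = [m["name"] for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primaries) != 1:
        raise ValueError(f"expected exactly one PRIMARY, found {len(primaries)}")

    secondaries = [m["name"] for m in members if m.get("stateStr") == "SECONDARY"]
    # The primary goes last; step it down with rs.stepDown() before restarting it.
    return secondaries + primaries
```

Config server replica sets do not allow arbiters, so every healthy member should report one of the two accepted states. Any other state is treated as a reason to stop and investigate first.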
3. Check for Network Connectivity Issues (medium)
Intermittent network problems can cause communication failures leading to stale shard versions.
1
Ensure that all members of the sharded cluster (mongod instances, mongos routers, and config servers) can communicate with each other without packet loss or high latency.
2
Use network diagnostic tools like `ping`, `traceroute`, or `mtr` from each component to the others. Pay close attention to any dropped packets or significant delays.
ping <other_server_ip>
3
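Verify that firewalls are not blocking the ports MongoDB uses. The defaults are 27017 for `mongod` and `mongos`, 27018 for `mongod` started with `--shardsvr`, and 27019 for config servers (`--configsvr`). Check each instance's `net.port` setting if your deployment uses custom ports.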
4
If network issues are identified, work with your network administrators to resolve them. This might involve optimizing routes, increasing bandwidth, or reconfiguring firewalls.
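`ping` only tests ICMP, so a firewall can let it through while still blocking the MongoDB ports. A TCP connection attempt checks the ports themselves. The Python sketch below uses only the standard library. `can_connect`, `check_targets`, and the role labels in `DEFAULT_PORTS` are hypothetical names. The port numbers are the defaults, which your deployment may override with `net.port`.

```python
import socket
from typing import Dict, Tuple

# Default ports by role; replace with your actual net.port values if customised.
DEFAULT_PORTS = {"mongos": 27017, "mongod": 27017, "shardsvr": 27018, "configsvr": 27019}


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


def check_targets(targets: Dict[str, Tuple[str, int]], timeout: float = 2.0) -> Dict[str, bool]:
    """Map each label to whether its (host, port) accepted a TCP connection."""
    return {label: can_connect(host, port, timeout) for label, (host, port) in targets.items()}
```

Run it from each component's host against every other component, for example `check_targets({"cfg1": ("<config_server_host>", DEFAULT_PORTS["configsvr"])})`. A `False` result where `ping` succeeds points at a firewall or a service that is not listening, rather than at general packet loss.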
4. Investigate and Resolve Config Server Inconsistencies (advanced)
Deep dive into config server health and state to find the root cause of version mismatches.
1
Connect to the primary of the config server replica set with `mongosh`. On versions before 6.0, the legacy `mongo` shell accepts the same options.
mongosh --host <config_server_host> --port <config_server_port>
2
Check the health of the config server replica set by running `rs.status()`.
rs.status()
3
Examine the logs of the config server members for any errors related to replication, elections, or communication with shards.
4
From a `mongos` connection, run `db.printShardingStatus()`. It prints the same report as `sh.status()`: the shards, the sharded databases and collections, and the chunk distribution recorded in the cluster metadata.
use admin
db.printShardingStatus()
5
If you suspect data corruption or significant inconsistencies within the config server's oplog or data files, consider a more drastic measure like rebuilding the config server replica set. This is a complex operation that will likely involve downtime and requires careful planning and execution. Consult MongoDB documentation or seek expert assistance for this procedure.
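The health checks in steps 2 and 3 can be partly automated by scanning the `rs.status()` output for unhealthy or lagging members. This Python sketch assumes the `members` array has been fetched as a list of dictionaries with `name`, `stateStr`, `health`, and `optimeDate`, where `optimeDate` is a `datetime`, as PyMongo returns it. `summarize_rs_status` and the 10-second lag threshold are illustrative choices, not MongoDB defaults.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def summarize_rs_status(members: List[Dict], max_lag: timedelta = timedelta(seconds=10)) -> List[str]:
    """Return human-readable problems found in an rs.status() members array.

    An empty list means: exactly one primary, every member healthy, and no
    secondary more than `max_lag` behind the primary.
    """
    problems = []
    primaries = [m for m in members if m.get("stateStr") == "PRIMARY"]
    if len(primaries) != 1:
        problems.append(f"expected 1 PRIMARY, found {len(primaries)}")

    # rs.status() reports health as 1 for reachable members and 0 otherwise.
    for m in members:
        if m.get("health") != 1:
            problems.append(f"{m['name']}: health={m.get('health')}, state={m.get('stateStr')}")

    # Replication lag only makes sense relative to a single primary.
    if len(primaries) == 1:
        primary_time: datetime = primaries[0]["optimeDate"]
        for m in members:
            if m.get("stateStr") == "SECONDARY":
                lag = primary_time - m["optimeDate"]
                if lag > max_lag:
                    problems.append(f"{m['name']}: {lag.total_seconds():.0f}s behind primary")
    return problems
```

A non-empty result from the config server replica set is worth resolving before any of the more drastic measures above, since elections, unreachable members, or replication lag on the config servers can keep routers and shards from refreshing their metadata.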