BitWise Intelligent Backends

Building Production MongoDB Replica Sets in Docker Swarm

Running a stateful database with high availability guarantees inside an ephemeral container orchestrator presents a fundamental tension. Docker Swarm is designed around the assumption that containers are disposable: they can be recreated, rescheduled, and assigned new IP addresses at any time. MongoDB replica sets, by contrast, depend on stable network identities and coordinated configuration across members. Bridging these two paradigms in a way that is both automated and production-safe is a non-trivial problem.

This post introduces the MongoDB Replica Set Manager, a sidecar controller I built to automate the full lifecycle of MongoDB replica sets running in Docker Swarm. It covers the architecture, the key operational capabilities, and the lessons learned from running it in production.

The Problem

Traditional MongoDB replica set configuration assumes a relatively static environment: fixed IP addresses or hostnames, manual initialization of the replica set, and predictable node discovery. Docker Swarm breaks all three of these assumptions. IP addresses are assigned dynamically and may change on container restart. Service discovery operates through DNS rather than static configuration. Containers are subject to recreation and rescheduling at the orchestrator's discretion. The result is that a manual approach to replica set management, one that works well on bare metal or virtual machines, becomes untenable in a Swarm environment.

MongoDB Replica Set Manager

The MongoDB Replica Set Manager was built to eliminate this operational burden entirely. Rather than requiring manual intervention for initialization, scaling, or failover, it provides a sidecar controller that automates replica set management by leveraging Docker's own APIs for service discovery and container orchestration. The controller monitors the cluster continuously, detects changes in membership, and reconfigures the replica set accordingly, whether the change is a fresh deployment, a node failure, or a deliberate scaling operation.

Architecture Overview

The solution consists of three Docker services deployed as a Swarm stack, communicating over an encrypted overlay network.

The database service runs MongoDB in global mode (one instance per Swarm node), configured with keyfile authentication and a health check sidecar. The dbcontroller is a Python-based sidecar that runs as a single replica constrained to a manager node. It mounts the Docker socket, giving it the ability to discover running tasks, inspect their network addresses, and execute commands inside MongoDB containers. An optional nosqlclient service provides a web interface for manual inspection.

Sidecar Controller

The controller is the orchestration brain of the system. It monitors cluster health, handles replica set configuration, manages member discovery through Docker's APIs, and coordinates failover operations. By running on a manager node with access to the Docker socket, it can observe the full state of the Swarm and act on it autonomously.

# docker-compose.yml snippet
services:
  mongo:
    image: mongo:7
    command: mongod --replSet rs0 --bind_ip_all
    deploy:
      replicas: 3

  mongo-controller:
    image: bitwise/mongo-rs-controller:latest
    environment:
      - MONGO_REPLICA_SET_NAME=rs0
      - MONGO_SERVICE_NAME=mongo
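    volumes:
      # The Docker socket gives the controller access to the Swarm task API
      - /var/run/docker.sock:/var/run/docker.sock
    deploy:
      # Single instance on a manager node: only managers can query Swarm tasks
      replicas: 1
      placement:
        constraints:
          - node.role == manager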

Service Discovery

Rather than relying on static configuration, the controller uses Docker Swarm's native task API for dynamic member discovery. It queries the running tasks for the MongoDB service, filters for those in a healthy running state, and extracts their overlay network IP addresses. This approach ensures that the controller always has an accurate view of the current cluster membership, regardless of how containers have been rescheduled.

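// Assumes the dockerode client, connected through the mounted Docker socket
const Docker = require('dockerode')
const docker = new Docker({ socketPath: '/var/run/docker.sock' })

async function discoverMembers(serviceName = process.env.MONGO_SERVICE_NAME) {
  // Ask Swarm for the tasks it wants running for the MongoDB service
  const tasks = await docker.listTasks({
    filters: {
      service: [serviceName],
      'desired-state': ['running'],
    },
  })

  return tasks
    // Keep tasks that are actually running and already have a network address
    .filter((task) => task.Status.State === 'running')
    .filter((task) => task.NetworksAttachments?.[0]?.Addresses?.length > 0)
    .map((task) => {
      // Addresses are in CIDR form (e.g. 10.0.1.5/24); keep only the IP
      const ip = task.NetworksAttachments[0].Addresses[0].split('/')[0]
      return `${ip}:27017`
    })
}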
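
One fragility of an index-based lookup is that a task can be attached to more than one network (for example the ingress network alongside the cluster overlay), so the first attachment is not guaranteed to be the one the MongoDB members use to reach each other. The following is a minimal sketch of selecting the address by network name instead. The helper name `addressOnNetwork` and the network name `mongodb_mongo-cluster` are assumptions made for illustration, and the sample task only mirrors the `NetworksAttachments` shape used above.

```javascript
// Hypothetical helper: pick a task's address on a named overlay network.
function addressOnNetwork(task, networkName, port = 27017) {
  const attachments = task.NetworksAttachments || []
  const match = attachments.find(
    (a) => a.Network && a.Network.Spec && a.Network.Spec.Name === networkName
  )
  if (!match || !match.Addresses || match.Addresses.length === 0) {
    return null // task is not (yet) attached to that network
  }
  const ip = match.Addresses[0].split('/')[0] // strip the CIDR suffix
  return `${ip}:${port}`
}

// Illustrative task object with two attachments (values are made up)
const sampleTask = {
  NetworksAttachments: [
    { Network: { Spec: { Name: 'ingress' } }, Addresses: ['10.255.0.9/16'] },
    { Network: { Spec: { Name: 'mongodb_mongo-cluster' } }, Addresses: ['10.0.1.5/24'] },
  ],
}

console.log(addressOnNetwork(sampleTask, 'mongodb_mongo-cluster')) // prints "10.0.1.5:27017"
```

Returning `null` for tasks without a matching attachment lets the caller filter them out rather than fail on a task that is still starting.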

Replica Set Configuration Manager

With an accurate member list in hand, the controller determines the appropriate action. If no existing replica set configuration is found, it initializes a new one. If a configuration exists but the member IPs have changed (as happens after a stack redeployment or node rescheduling), it reconfigures the replica set to reflect the current state. This detection runs continuously, ensuring that the replica set remains correctly configured through any infrastructure change.

async function ensureReplicaSet() {
  const members = await discoverMembers()
  const config = await getReplicaSetConfig()

  if (!config) {
    // Initialize new replica set
    await initializeReplicaSet(members)
  } else {
    // Update existing configuration
    await updateReplicaSetMembers(config, members)
  }
}
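
The update branch can be sketched as a pure function that compares the stored configuration with the discovered hosts and returns the next configuration, or `null` when nothing changed. This is an illustration under assumptions, not the controller's actual code: `planReconfig` is a hypothetical name, and the result is meant to be passed to MongoDB's `replSetReconfig` command.

```javascript
// Hypothetical helper: derive the next replica set config from the current
// config and the list of currently discovered "ip:port" hosts.
function planReconfig(currentConfig, hosts) {
  const wanted = new Set(hosts)

  // Keep members that are still present, preserving their _id and settings.
  const kept = currentConfig.members.filter((m) => wanted.has(m.host))
  const known = new Set(kept.map((m) => m.host))

  // New members get _ids above every _id used so far, so ids are never reused.
  let nextId = currentConfig.members.reduce((max, m) => Math.max(max, m._id), -1) + 1
  const added = [...wanted]
    .filter((host) => !known.has(host))
    .map((host) => ({ _id: nextId++, host, priority: 1 }))

  const changed = added.length > 0 || kept.length !== currentConfig.members.length
  if (!changed) return null // membership is already up to date

  return {
    ...currentConfig,
    version: currentConfig.version + 1, // a reconfig needs a higher version
    members: [...kept, ...added],
  }
}

const current = {
  _id: 'rs0',
  version: 3,
  members: [
    { _id: 0, host: '10.0.1.5:27017', priority: 2 },
    { _id: 1, host: '10.0.1.6:27017', priority: 1 },
  ],
}
console.log(planReconfig(current, ['10.0.1.5:27017', '10.0.1.9:27017']))
```

Since MongoDB 4.4, a non-forced reconfig may add or remove only one voting member at a time, so a controller following this approach would likely apply larger membership changes as a series of single-member steps.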

Key Features

Automatic Initialization

On a fresh deployment, the controller waits for all MongoDB instances to report as healthy, discovers the full set of cluster members via the Swarm task API, initializes the replica set with the first discovered member given a higher election priority, and establishes authentication and access controls, including the creation of an admin user and the initial application database. The entire process requires no manual intervention.

async function initializeReplicaSet(members) {
  const config = {
    _id: process.env.MONGO_REPLICA_SET_NAME,
    members: members.map((host, idx) => ({
      _id: idx,
      host: host,
      priority: idx === 0 ? 2 : 1, // Prefer first member as primary
    })),
  }

  await mongo.admin().command({
    replSetInitiate: config,
  })

  console.log('✓ Replica set initialized successfully')
}

Dynamic Scaling

For a replicated service, scaling the MongoDB cluster is as simple as updating the Swarm service replica count; for a service running in global mode, the member count follows the number of eligible Swarm nodes instead. In either case, the controller's continuous monitoring loop detects new members joining or existing members departing, reconfigures the replica set accordingly, waits for initial sync to complete on newly added nodes, and updates priorities and votes to reflect the new topology.

# Scale from 3 to 5 members (service name is <stack>_<service>)
docker service scale mongodb_mongo=5
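
When the topology grows, vote assignment has a hard constraint: a MongoDB replica set can have at most 7 voting members, and a non-voting member must have priority 0. A minimal sketch of applying that limit follows. The helper name `applyVotingLimits` is hypothetical, and treating the first seven members as the voters is an arbitrary choice made for illustration.

```javascript
const MAX_VOTING_MEMBERS = 7 // MongoDB's limit on voting members per replica set

// Hypothetical helper: the first seven members vote, the rest are
// data-bearing but non-voting (which requires priority 0).
function applyVotingLimits(members) {
  return members.map((member, idx) =>
    idx < MAX_VOTING_MEMBERS
      ? { ...member, votes: 1 }
      : { ...member, votes: 0, priority: 0 }
  )
}

// Nine illustrative members; the last two become non-voting
const nine = Array.from({ length: 9 }, (_, i) => ({
  _id: i,
  host: `10.0.1.${i + 10}:27017`,
  priority: 1,
}))
const limited = applyVotingLimits(nine)
console.log(limited.filter((m) => m.votes === 1).length) // prints 7
```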

Automatic Failover

When a node fails, the system responds through a multi-stage process. Health checks identify the failed member within approximately five seconds. The remaining members elect a new primary, typically within ten seconds, provided a majority of voting members is still reachable. When the failed node recovers, it is automatically re-added to the replica set. Writes pause while the election is in progress, but clients do not need to be restarted or reconfigured: drivers reconnect automatically via the replica set connection string, and retryable writes let most applications ride through the switchover without visible errors.

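async function monitorHealth() {
  // Poll every 5 seconds; errors are caught so that one failed poll
  // (for example during an election) does not crash the controller
  setInterval(async () => {
    try {
      const status = await mongo.admin().command({ replSetGetStatus: 1 })

      status.members.forEach((member) => {
        if (member.health === 0 && member.stateStr !== 'REMOVED') {
          console.warn(`⚠ Member ${member.name} is unhealthy`)
          // Controller will automatically handle removal/re-add
        }
      })
    } catch (err) {
      console.error(`Health check failed: ${err.message}`)
    }
  }, 5000)
}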
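
Whether an election can succeed after a failure comes down to simple arithmetic: a primary needs votes from a strict majority of the voting members. The sketch below works through that calculation; the function names are illustrative and not part of the controller.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1
}

// How many voting members can fail while an election is still possible.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers)
}

// True when the healthy voters can still form a majority.
function canElectPrimary(votingMembers, healthyVoters) {
  return healthyVoters >= majority(votingMembers)
}

for (const n of [3, 4, 5]) {
  console.log(`${n} members: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`)
}
```

A four-member set tolerates no more failures than a three-member set, which is why odd member counts are the usual choice.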

Production Deployment

Deploying the replica set in a production environment requires attention to three areas of configuration: network isolation, persistent storage, and resource constraints.

Network Configuration

An encrypted overlay network ensures that all inter-node communication remains secure and isolated from other Swarm services.

networks:
  mongo-cluster:
    driver: overlay
    attachable: true
    driver_opts:
      encrypted: 'true'

Storage Strategy

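Named volumes bound to a host directory provide the durability that a production database demands: data survives container recreation on the same node. Because the volume is local to each node, the directory (here /mnt/mongodb/data) must exist on every node that runs MongoDB, and a task rescheduled to a different node starts from that node's data directory and catches up through replica set sync.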

volumes:
  mongo-data:
    driver: local
    driver_opts:
      type: none
      o: bind
      device: /mnt/mongodb/data

Resource Limits

Appropriate resource constraints prevent the database from consuming more than its share of host resources, while reservations guarantee a minimum allocation that ensures consistent performance under load.

deploy:
  resources:
    limits:
      cpus: '2'
      memory: 4G
    reservations:
      cpus: '1'
      memory: 2G

Monitoring & Observability

The controller exposes a Prometheus-compatible metrics endpoint that surfaces the essential health indicators of the replica set: total member count, healthy member count, and primary availability.

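// Metrics endpoint
app.get('/metrics', async (req, res) => {
  try {
    const status = await getReplicaSetStatus()
    const healthy = status.members.filter((m) => m.health === 1).length
    const hasPrimary = status.members.some((m) => m.stateStr === 'PRIMARY') ? 1 : 0

    const lines = [
      '# HELP mongodb_rs_members_total Total replica set members',
      '# TYPE mongodb_rs_members_total gauge',
      `mongodb_rs_members_total ${status.members.length}`,
      '# HELP mongodb_rs_healthy_members Healthy replica set members',
      '# TYPE mongodb_rs_healthy_members gauge',
      `mongodb_rs_healthy_members ${healthy}`,
      '# HELP mongodb_rs_primary Whether a primary is currently available (1 or 0)',
      '# TYPE mongodb_rs_primary gauge',
      `mongodb_rs_primary ${hasPrimary}`,
    ]
    // Prometheus text exposition format; the last line must end with a newline
    res.set('Content-Type', 'text/plain; version=0.0.4')
    res.send(lines.join('\n') + '\n')
  } catch (err) {
    res.status(500).send(`# replica set status unavailable: ${err.message}\n`)
  }
})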

These metrics can be scraped by Prometheus and visualized in Grafana, with dashboards covering replica set member status and primary availability. Panels for replication lag between primary and secondaries, connection pool utilization, and operation latencies need additional metrics beyond the three shown here.
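
Replication lag is not among the three metrics above, but it can be derived from the same `replSetGetStatus` output by comparing each secondary's `optimeDate` with the primary's. The following is a sketch under that assumption; the helper name `replicationLagSeconds` and the sample status object are made up for illustration.

```javascript
// Hypothetical helper: per-secondary replication lag in seconds, computed
// from the optimeDate fields of a replSetGetStatus result.
function replicationLagSeconds(status) {
  const primary = status.members.find((m) => m.stateStr === 'PRIMARY')
  if (!primary) return {} // without a primary there is nothing to compare against

  const lag = {}
  for (const member of status.members) {
    if (member.stateStr !== 'SECONDARY') continue
    const diffMs = primary.optimeDate.getTime() - member.optimeDate.getTime()
    lag[member.name] = Math.max(0, diffMs / 1000) // clamp small negative skew to 0
  }
  return lag
}

// Illustrative status: one secondary is two seconds behind the primary
const sampleStatus = {
  members: [
    { name: '10.0.1.5:27017', stateStr: 'PRIMARY', optimeDate: new Date('2024-01-01T00:00:10Z') },
    { name: '10.0.1.6:27017', stateStr: 'SECONDARY', optimeDate: new Date('2024-01-01T00:00:08Z') },
    { name: '10.0.1.7:27017', stateStr: 'SECONDARY', optimeDate: new Date('2024-01-01T00:00:10Z') },
  ],
}
console.log(replicationLagSeconds(sampleStatus))
```

Each entry could then be emitted as a labeled gauge next to the existing metrics.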

Real-World Performance

In production across multiple environments, the system has demonstrated failover times of under 15 seconds from detection to new primary election, no loss of acknowledged writes when using an appropriate write concern (w: "majority"), automatic recovery of failed members when they return to a healthy state, and seamless scaling of members with no manual intervention required.

Lessons Learned

1. DNS Resolution Timing

Docker Swarm's DNS can take several seconds to update after a topology change, and code that queries DNS immediately after a scaling event will often see stale results. Implementing retry logic with reasonable timeouts is essential to avoid false negatives during member discovery.

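// Small promise-based delay helper
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms))

async function waitForDNS(serviceName, expectedCount, timeout = 60000) {
  const startTime = Date.now()

  while (Date.now() - startTime < timeout) {
    const members = await discoverMembers(serviceName)
    if (members.length >= expectedCount) {
      return members
    }
    await sleep(2000) // Poll again after a short pause
  }

  throw new Error(
    `Timed out after ${timeout} ms waiting for ${expectedCount} members of ${serviceName}`
  )
}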
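
A fixed two-second pause works, but a capped exponential backoff reacts faster when the topology settles quickly and polls less often when it does not. The sketch below only computes the delay schedule that fits within a timeout budget. The function name `backoffDelays` and its default values are illustrative assumptions, not settings taken from the controller.

```javascript
// Hypothetical helper: capped exponential backoff delays (in ms) whose sum
// stays within the overall timeout budget.
function backoffDelays({ baseMs = 500, factor = 2, maxMs = 8000, timeoutMs = 60000 } = {}) {
  const delays = []
  let total = 0
  let next = baseMs
  while (total + next <= timeoutMs) {
    delays.push(next)
    total += next
    next = Math.min(next * factor, maxMs) // grow the delay, but never beyond the cap
  }
  return delays
}

console.log(backoffDelays({ baseMs: 100, factor: 2, maxMs: 400, timeoutMs: 1000 }))
// prints [ 100, 200, 400 ]
```

A retry loop could iterate over this schedule and sleep for each delay between discovery attempts instead of using a constant interval.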

2. Write Concern is Critical

For production writes, using w: "majority" is non-negotiable. Without it, a write acknowledged by only the primary could be lost if that primary fails before replication completes. The write concern ensures that a majority of replica set members have confirmed the write before it is considered successful.

await collection.insertOne(
  { data: 'important' },
  { writeConcern: { w: 'majority', wtimeout: 5000 } }
)

3. Connection String Format

Using the replica set connection string format rather than a single-host address is what enables the MongoDB driver to handle failover transparently. It allows the driver to automatically discover all members, route writes to the current primary, distribute reads across secondaries when configured to do so, and reconnect seamlessly when a failover occurs. Note that readPreference=primaryPreferred, as used in the example, keeps reads on the primary and falls back to secondaries only while no primary is available.

mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0&readPreference=primaryPreferred
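
When the seed list or credentials come from configuration, building the string programmatically avoids escaping mistakes. The sketch below is one way to do that, with the hypothetical helper `buildMongoUri`. It percent-encodes the credentials, which is required for characters such as `@` or `/` in a MongoDB connection string.

```javascript
// Hypothetical helper: build a replica set connection string from seed hosts.
function buildMongoUri(hosts, { replicaSet, username, password = '', options = {} }) {
  if (hosts.length === 0) {
    throw new Error('at least one seed host is required')
  }
  // Credentials must be percent-encoded inside a connection string
  const auth = username
    ? `${encodeURIComponent(username)}:${encodeURIComponent(password)}@`
    : ''
  const params = new URLSearchParams({ replicaSet, ...options })
  return `mongodb://${auth}${hosts.join(',')}/?${params.toString()}`
}

console.log(
  buildMongoUri(['mongo1', 'mongo2', 'mongo3'], {
    replicaSet: 'rs0',
    options: { readPreference: 'primaryPreferred' },
  })
)
// prints "mongodb://mongo1,mongo2,mongo3/?replicaSet=rs0&readPreference=primaryPreferred"
```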

Getting Started

The repository provides everything needed to deploy a production-ready MongoDB cluster on an existing Swarm. Clone it, review and customize the stack file to match your environment, and deploy.

# Clone the repository and enter it
git clone https://github.com/BitWise-0x/MongoDB-ReplicaSet-Manager
cd MongoDB-ReplicaSet-Manager

# Review and customize the stack file
vim docker-stack.yml

# Deploy to your Swarm cluster
docker stack deploy -c docker-stack.yml mongodb

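# Monitor initialization (service names are prefixed with the stack name)
docker service logs -f mongodb_controller

# Verify replica set status. Run this on a node that hosts a MongoDB task;
# mongo:7 images ship mongosh rather than the legacy mongo shell.
# Add credentials (-u/-p) if authentication is enabled.
docker exec "$(docker ps -q -f name=mongodb_mongo | head -n 1)" \
  mongosh --quiet --eval "rs.status()"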

Future Enhancements

Several improvements are in progress: automated backups with scheduled snapshots and retention policies, point-in-time recovery through oplog replay for disaster recovery scenarios, multi-region support for geographically distributed replica sets, automated TLS/SSL certificate management, and performance auto-tuning based on observed workload patterns.

Conclusion

The tension between stateful database requirements and ephemeral container orchestration is real, but it is not insurmountable. With the right automation layer in place, running MongoDB replica sets in Docker Swarm becomes not only feasible but operationally straightforward. Initialization, scaling, failover, and recovery all happen without manual intervention.

The MongoDB Replica Set Manager has been running in production for over a year across multiple clusters, and the operational overhead has been minimal. The sidecar pattern has proven well-suited to this problem: by placing the orchestration logic alongside the database containers rather than in an external system, the controller can leverage Docker's own APIs for discovery and configuration with minimal latency and maximum accuracy. Check out the repository on GitHub for full documentation and deployment examples.
