What is Replication
Replication is a mechanism that copies and synchronizes database data across multiple servers. It’s used for improving availability, enhancing read performance, and disaster recovery.
Why it’s needed: A single database server becomes a Single Point of Failure (SPOF). Replication allows service continuity during failures and distributes read queries.
Basic Replication Configurations
Master-Slave (Primary-Replica)
flowchart TB
Writes["Writes"] --> Master["Master<br/>(Primary)"]
Master -->|Replication| Slave1["Slave 1<br/>(Replica)"]
Master -->|Replication| Slave2["Slave 2<br/>(Replica)"]
Master -->|Replication| Slave3["Slave 3<br/>(Replica)"]
Slave1 --> Reads["Reads"]
Slave2 --> Reads
Slave3 --> Reads
- Master (Primary): Main server accepting writes
- Slave (Replica): Copies master data, handles reads
Multi-Master
flowchart TB
Master1["Master 1"] <-->|Sync| Master2["Master 2"]
Master1 --> RW1["Read/Write"]
Master2 --> RW2["Read/Write"]
Multiple masters accept writes. Conflict resolution becomes a challenge.
Synchronous vs Asynchronous Replication
Synchronous Replication
flowchart LR
Client --> Master --> Slave["Slave<br/>(Wait for ACK)"] --> Response["Response to Client"]
| Advantages | Disadvantages |
|---|---|
| No data loss | Increased latency |
| Strong consistency | Writes stop on slave failure |
Asynchronous Replication
flowchart LR
Client --> Master --> Response["Response to Client"]
Master -.->|Applied later| Slave
| Advantages | Disadvantages |
|---|---|
| Low latency | Replication lag |
| No impact from slave failure | Potential data loss |
Semi-synchronous Replication
Commit is confirmed once at least one slave receives the data.
flowchart LR
Client --> Master --> Slave1["Slave 1<br/>(ACK)"] --> Response["Response to Client"]
Master -.->|Applied later| Slave2["Slave 2"]
Replication Methods
Statement-based
Replicates SQL statements themselves.
-- Executed on master
INSERT INTO users (name, created_at) VALUES ('Alice', NOW());
-- Same SQL executed on slave
-- Problem: NOW() might return different results
Row-based
Replicates the actual row data that changed.
| Before | After |
|---|---|
{id: 1, name: 'Alice'} | {id: 1, name: 'Bob'} |
This diff is applied to slave.
Mixed Mode
Uses statement-based normally, row-based for non-deterministic functions.
Replication Lag
The delay until a slave catches up with the master.
| Server | Transactions | Status |
|---|---|---|
| Master | 1, 2, 3, 4, 5 | ✓ Complete |
| Slave | 1, 2, 3 | ✓ Lag: 2 transactions |
Cases Where Lag Causes Problems
// Read immediately after write
await db.master.query('UPDATE users SET name = ? WHERE id = ?', ['Bob', 1]);
// Read from slave → May return old data
const user = await db.slave.query('SELECT * FROM users WHERE id = ?', [1]);
// user.name might still be 'Alice'!
Solutions
- Read Your Writes: Read from master immediately after writing
- Causal Consistency: Track write timestamps
- Synchronous Replication: For critical data only
Failover
When the master fails, a slave is promoted to become the new master.
Automatic Failover
flowchart TB
Step1["1. Detect master failure<br/>Monitoring system detects no response from master"]
Step2["2. Select slave<br/>Choose the slave with most recent replication progress"]
Step3["3. Promotion<br/>Promote selected slave to master"]
Step4["4. Switch connections<br/>Update application connections to new master"]
Step1 --> Step2 --> Step3 --> Step4
Failover Considerations
| Challenge | Solution |
|---|---|
| Split brain | Fencing (force stop old master) |
| Data loss | Unapplied transactions in async replication |
| Connection switching | Virtual IP, DNS update, proxy |
Common Implementations
MySQL
-- Master configuration
CHANGE MASTER TO
MASTER_HOST='master.example.com',
MASTER_USER='repl_user',
MASTER_PASSWORD='password',
MASTER_LOG_FILE='mysql-bin.000001',
MASTER_LOG_POS=4;
START SLAVE;
PostgreSQL
# postgresql.conf (master)
wal_level = replica
max_wal_senders = 3
# recovery.conf (slave)
standby_mode = 'on'
primary_conninfo = 'host=master.example.com port=5432'
Managed Services
| Service | Replica Features |
|---|---|
| Amazon RDS | Read Replicas, Multi-AZ |
| Cloud SQL | Read Replicas, High Availability |
| Azure SQL | geo-replication |
Read Scaling Pattern
flowchart LR
App --> Master["Master<br/>(Writes)"]
App --> LB["Load Balancer"]
LB --> R1["Replica 1"]
LB --> R2["Replica 2"]
LB --> R3["Replica 3"]
R1 --> Reads["Reads"]
R2 --> Reads
R3 --> Reads
Load balancer distributes read queries across replicas.
Summary
Database replication is a foundational technology for achieving high availability and read scalability. By properly designing synchronous/asynchronous choices, addressing replication lag, and planning failover strategies, you can build robust database systems.
← Back to list