Leaders and Followers
Imagine a database that stores the inventory of a store. The leader replica of the database is located at the store’s headquarters. The follower replicas are located at the store’s branches.
When a customer buys an item, the sale is recorded in the database at the headquarters. The leader replica then sends the change to all the follower replicas. This ensures that all the replicas of the database have the same inventory information.
When a cashier at a branch needs to check the inventory of an item, they can query the local follower replica. This is faster than querying the leader replica, because the follower replica is closer to the cashier.
Leader-based replication, also known as active/passive or master–slave replication, is a common data replication strategy that uses a single leader node to coordinate changes to the data. The other nodes in the system are followers, which replicate the data from the leader.
Synchronous VS Asynchronous Replication
Synchronous replication and asynchronous replication are two different ways to replicate data between nodes in a distributed system.
In synchronous replication, the leader node waits for all of the follower nodes to acknowledge receipt of the data change before reporting success to the client. This means that the follower nodes are guaranteed to have an up-to-date copy of the data that is consistent with the leader. However, synchronous replication can impact the performance of the system, since the leader node cannot process another write until all of the follower nodes have acknowledged the previous write.
Example: A bank uses synchronous replication to store customer account data. This ensures that all copies of the data are always consistent, even if a node fails.
In asynchronous replication, the leader node does not wait for the follower nodes to acknowledge receipt of the data change before reporting success to the client. This means that the follower nodes may not have the most up-to-date copy of the data, and there is a risk that some data may be lost if the leader node fails before the data change has been replicated to all of the follower nodes. However, asynchronous replication does not impact the performance of the system, since the leader node can process the next write immediately.
Example: A social media website uses asynchronous replication to store user data. This allows the website to scale to a large number of users without sacrificing performance. However, it also means that there may be a delay before all copies of the data are updated.
Which type of replication to use?
The best type of replication to use depends on the specific requirements of the system.
If data consistency is critical, then synchronous replication is the best option. However, if performance is more important, then asynchronous replication is the better option.
In practice, many systems use a combination of synchronous and asynchronous replication. For example, a database might use synchronous replication to replicate data to a small number of critical replicas, and asynchronous replication to replicate data to other replicas. This configuration is sometimes also called Semi-Synchronous.
The replication to follower 1 is synchronous: the leader waits until follower 1 has confirmed that it received the write before reporting success to the user, and before making the write visible to other clients. The replication to follower 2 is asynchronous: the leader sends the message, but doesn’t wait for a response from the follower.
How to Set up New Followers
When setting up new followers in a leader-based replication system, you need to ensure that the new follower has an accurate copy of the leader’s data. This can be done by taking a consistent snapshot of the leader’s database at some point in time, copying the snapshot to the new follower node, and then having the follower connect to the leader and request all the data changes that have happened since the snapshot was taken.
Example:
Suppose you have a database with a leader node and two follower nodes. You want to add a third follower node.
To do this, you would:
- Take a snapshot of the leader’s database.
- Copy the snapshot to the new follower node.
- Start the new follower node.
- The new follower node will automatically connect to the leader node and request all the data changes that have happened since the snapshot was taken.
- Once the new follower node has caught up to the leader, it will start replicating data from the leader in real time.
Benefits of using a consistent snapshot:
- Avoiding downtime: Using a consistent snapshot to set up a new follower allows you to do so without taking the database down for maintenance.
- Ensuring data consistency: Using a consistent snapshot ensures that the new follower will have an accurate copy of the leader’s data.
Drawbacks of using a consistent snapshot:
- Increased storage requirements: Creating a consistent snapshot of the database will require additional storage space.
- Increased time to set up the new follower: Creating a consistent snapshot of the database can take some time, depending on the size of the database.
Handling Node Outages
Node outages are inevitable in any distributed system, and leader-based replication is no exception. However, there are a number of strategies that can be used to minimize the impact of node outages on the overall system availability.
There are two main types of node outages to consider: follower failures and leader failures.
Follower Failure
Followers are relatively easy to handle in the event of a failure. When a follower fails, it can simply reconnect to the leader and resume replicating data. This process is known as catch-up recovery.
Leader Failure
Leader failure is more challenging to handle, as the leader is responsible for processing all write requests. When the leader fails, one of the followers must be promoted to the role of leader. This process is known as failover.
Failover can be either manual or automatic. Manual failover involves an administrator taking the necessary steps to make a new leader after being notified of the leader’s failure. Automatic failover involves the system automatically detecting that the leader has failed and then taking the necessary steps to make a new leader.
Automatic failover typically involves the following steps:
- Detecting that the leader has failed. This is done using a timeout mechanism. If a node does not respond for a certain period of time, it is assumed to be dead.
- Choosing a new leader. This is typically done through an election process. The follower with the most up-to-date data changes from the old leader is usually the best candidate for leadership.
- Reconfiguring the system to use the new leader. This involves updating clients and other nodes in the system so that they know the new leader’s address.
Challenges of failover
There are a number of challenges associated with failover:
- Data loss: If asynchronous replication is used, the new leader may not have received all the writes from the old leader before it failed. This can lead to data loss. Thus, there is replica called passive master that replicates changes synchronously with master. And in case of failover, this passive master can be elected to master.
- Split brain: In certain fault scenarios, it is possible for two nodes to both believe that they are the leader. This is called split brain. Split brain can lead to data corruption.
- Unnecessary failovers: A too-short timeout can trigger unnecessary failovers, which can make the situation worse if the system is already struggling with high load or network problems.
Overall, leader-based replication is a powerful tool for ensuring the availability and reliability of distributed systems. However, it is important to understand the challenges involved in handling node outages.
Implementation of Replication Logs
Replication logs are a key part of leader-based replication, a method of keeping multiple copies of a database in sync. The leader node is responsible for writing all changes to the database, and the follower nodes replicate those changes from the leader.
There are four main types of replication logs used in leader-based replication:
Statement-based replication
Statement-based replication is the simplest type of replication, but it is also the most limited. In statement-based replication, the leader logs every SQL statement that it executes and sends it to the followers. The followers then execute the statements in the same order as the leader.
Statement-based replication can be problematic because it can be difficult to ensure that the followers execute the statements in the same order as the leader, and it can also lead to data inconsistencies if the statements have side effects.
WAL shipping
WAL shipping is a more reliable type of replication, but it is also more tightly coupled to the database’s storage engine. In WAL shipping, the leader logs every change to the database to the write-ahead log (WAL). The WAL is a sequence of records that describe the changes to the database. The leader then sends the WAL to the followers, and the followers apply the changes to their databases.
WAL shipping is a good choice for databases that need high availability and reliability. However, it can be difficult to upgrade the database software when using WAL shipping, because the leader and the followers must be running the same version of the software.
Logical (row-based) log replication
Logical (row-based) log replication is a more flexible type of replication than WAL shipping. In logical (row-based) log replication, the leader logs every change to the database at the granularity of a row. The log record contains enough information to identify the row that was changed and the new values of all columns in the row. The leader then sends the log record to the followers, and the followers apply the change to their databases.
Logical (row-based) log replication is a good choice for databases that need to be flexible and scalable. It is also a good choice for databases that need to be upgraded frequently, because the leader and the followers can be running different versions of the software.
Trigger-based replication
Trigger-based replication is a more complex type of replication, but it is also the most flexible. In trigger-based replication, the leader uses database triggers to log data changes into a separate table. An external process then reads the data changes from the table and replicates them to another system.
Trigger-based replication is a good choice for databases that need to be customized or integrated with other systems. However, it can be more difficult to implement and maintain than other types of replication.
The best type of replication log for you will depend on your specific needs. If you need a simple and reliable replication solution, then WAL shipping is a good choice. If you need a more flexible solution, then logical (row-based) log replication is a good choice. If you need to customize the replication process or integrate with other systems, then trigger-based replication may be the best option for you.
Overall leader-follower replication is all about clients sending all writes to a single node (the leader), which sends a stream of data change events to the other replicas (followers). Reads can be performed on any replica, but reads from followers might be stale. And this article was all about how we can make leader-based replication more reliable and consistent.