Source: Aptos Labs
Since the advent of computing technology, engineers and researchers have continuously explored how to push computing resources to their performance limits, striving to maximize efficiency while minimizing the latency of computational tasks. High performance and low latency have always been the two pillars shaping the development of computer science, influencing fields ranging from CPU and FPGA design to database systems, as well as recent advances in artificial intelligence infrastructure and blockchain systems. In the pursuit of high performance, pipeline technology has become an indispensable tool. Since its introduction in 1964 with the IBM System/360 [1], it has been at the core of high-performance system design, driving key discussions and innovations in the field.
Pipeline technology is not only applied in hardware but is also widely used in databases. For example, Jim Gray introduced pipeline parallelism in his work High-Performance Database Systems [2]: complex database queries are decomposed into multiple stages that run concurrently, improving efficiency and performance. Pipeline technology is also crucial in artificial intelligence, especially in the widely used deep learning framework TensorFlow, which uses data pipelines to parallelize preprocessing and loading, keeping data flowing smoothly for training and inference and making AI workflows faster and more efficient [3].
Blockchain is no exception. Its core functionality is similar to a database's, processing transactions and updating state, but with the added challenge of Byzantine fault-tolerant consensus. The key to raising blockchain throughput (transactions per second) and reducing latency (time to final confirmation) lies in optimizing the interactions between different stages—ordering, executing, submitting, and synchronizing transactions—under high load. This challenge is particularly acute in high-throughput scenarios, where traditional designs struggle to maintain low latency.
To explore these ideas, let’s revisit a familiar analogy: the automotive factory. Understanding how the assembly line revolutionized manufacturing helps us appreciate the evolution of blockchain pipelines—and why next-generation designs like Zaptos [8] are pushing blockchain performance to new heights.
Imagine you are the owner of an automotive factory with two main goals:
· Maximize throughput: Assemble as many cars as possible each day.
· Minimize latency: Shorten the time it takes to build each car.
Now, imagine three types of factories:
In a simple factory, a group of versatile workers assembles a car step by step. One worker assembles the engine, the next installs the wheels, and so on—producing one car at a time.
What’s the issue? Some workers are often idle, and overall production efficiency is low because no one is working on different parts of the same car simultaneously.
Enter the Ford assembly line [4]! Here, each worker focuses on a single task. The car moves along a conveyor belt, and as it passes, each specialized worker adds their own parts.
What happens? Multiple cars are in different stages of assembly simultaneously, and all workers stay busy. Throughput increases significantly, but each car still has to pass through each worker one by one, so the latency of each individual car remains unchanged.
Now, imagine a magic factory where all workers can work on the same car at the same time! There’s no need to move the car from one station to the next; every part of the car is being built simultaneously.
What’s the outcome? Cars are assembled at record speed, with every step happening in sync. This is the ideal scenario for solving both throughput and latency issues.
Now, with the car factory discussion out of the way—what about blockchain? It turns out that designing a high-performance blockchain is not so different from optimizing an assembly line.
In blockchain, processing a block is similar to assembling a car. The analogy is as follows:
· Workers = Validator resources
· Cars = Blocks
· Assembly tasks = Stages like consensus, execution, and submission
Just as a simple factory processes one car at a time, if a blockchain processes one block at a time, it leads to underutilization of resources. In contrast, modern blockchain designs aim to function like the Ford assembly line—handling different stages of multiple blocks simultaneously. This is where pipeline technology comes into play.
Imagine a blockchain that processes blocks strictly sequentially. For each block, validators must:
1. Receive the block proposal.
2. Execute the block to update the blockchain state.
3. Reach consensus on the resulting state.
4. Persist the state to the database.
5. Begin consensus for the next block.
What’s the problem?
· Execution and submission are on the critical path of the consensus process.
· Each consensus instance must wait for the previous one to finish before starting.
This setup is like a pre-Ford era factory: workers (resources) are often idle when focusing on one block (car) at a time. Unfortunately, many existing blockchains still belong to this category, resulting in low throughput and high latency.
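To make the cost of this design concrete, here is a minimal sketch of such a strictly sequential validator loop. It is illustrative only: the stage names mirror the numbered steps above, and the millisecond durations are made-up assumptions, not measurements from any real chain.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

// One helper per pipeline stage; the durations are made-up placeholders.
fn stage(block: u64, name: &str, ms: u64) {
    sleep(Duration::from_millis(ms));
    println!("block {block}: {name} done");
}

fn main() {
    let start = Instant::now();
    for block in 0..3u64 {
        // Every stage sits on the critical path: consensus for block N+1
        // cannot start until block N has been fully persisted.
        stage(block, "consensus", 300);
        stage(block, "execution", 200);
        stage(block, "state certification", 100);
        stage(block, "persistence", 100);
    }
    // Wall-clock time is the *sum* of all stages for every block:
    // ~3 x (300 + 200 + 100 + 100) ms, roughly 2.1 s for three blocks.
    println!("total: {:?}", start.elapsed());
}
```

Because nothing overlaps, throughput and latency degrade together: the chain can commit at most one block per full pass through the pipeline.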
Diem introduced a pipeline architecture that decouples execution and submission from the consensus phase, while also adopting a pipeline design for consensus itself.
· Asynchronous Execution and Submission [5]: Validators first reach consensus on a block, then execute it on top of the parent block’s state. Once the resulting state has been signed by the required number of validators, it is persisted to storage.
· Pipelined Consensus (Jolteon [6]): A new consensus instance can begin before the previous one completes, much like a moving assembly line.
This increases throughput by allowing different blocks to be at different stages simultaneously, significantly reducing block time to just two message delays. However, Jolteon’s leader-based design could cause bottlenecks, as the leader becomes overloaded during transaction distribution.
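The assembly-line structure can be sketched with one thread per stage and channels between them. This is only a toy model of pipelining, not Diem or Aptos code, and the stage names and durations are illustrative assumptions:

```rust
use std::sync::mpsc;
use std::thread;
use std::time::{Duration, Instant};

// A pipeline stage: consume blocks from `rx`, "work" on each for `ms`
// milliseconds, and forward them to the next stage if there is one.
fn stage(
    name: &'static str,
    ms: u64,
    rx: mpsc::Receiver<u64>,
    tx: Option<mpsc::Sender<u64>>,
) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        for block in rx {
            thread::sleep(Duration::from_millis(ms));
            println!("block {block}: {name} done");
            if let Some(next) = &tx {
                next.send(block).unwrap();
            }
        }
    })
}

fn main() {
    let (to_consensus, consensus_rx) = mpsc::channel();
    let (to_execution, execution_rx) = mpsc::channel();
    let (to_persist, persist_rx) = mpsc::channel();

    let start = Instant::now();
    let handles = vec![
        stage("consensus", 300, consensus_rx, Some(to_execution)),
        stage("execution", 200, execution_rx, Some(to_persist)),
        stage("persistence", 100, persist_rx, None),
    ];

    // Feed several blocks into the front of the pipeline.
    for block in 0..3u64 {
        to_consensus.send(block).unwrap();
    }
    drop(to_consensus); // closing the input lets each stage shut down in turn

    for handle in handles {
        handle.join().unwrap();
    }
    println!("total wall-clock time: {:?}", start.elapsed());
}
```

Throughput now tracks the slowest stage rather than the sum of all stages, but each block still traverses every stage in turn, so per-block latency is roughly unchanged, which is exactly the Ford-factory trade-off described above.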
Aptos further optimized the pipeline with Quorum Store [7], a mechanism that decouples data dissemination from consensus. Instead of relying on a single leader to broadcast large amounts of transaction data within the consensus protocol, Quorum Store separates data dissemination from metadata ordering, allowing validators to distribute data asynchronously and concurrently. This design uses the aggregate bandwidth of all validators, effectively eliminating the leader bottleneck in consensus.
Illustration: How Quorum Store balances resource utilization under a leader-based consensus protocol.
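The idea can be illustrated with a few toy data types. Batch, ProofOfStore, and Proposal below are illustrative names chosen for exposition, not the actual Quorum Store structures; the point is only that the leader orders small proofs of availability while the heavy transaction payloads are disseminated separately by every validator.

```rust
/// A batch of raw transactions, disseminated by whichever validator received them.
struct Batch {
    transactions: Vec<Vec<u8>>,
}

/// A small certificate proving that enough validators have stored a given batch.
struct ProofOfStore {
    batch_digest: [u8; 32], // stands in for a cryptographic hash of the batch
    signer_count: usize,    // stands in for the collected validator signatures
}

/// What the consensus leader actually proposes: metadata, not payloads.
struct Proposal {
    round: u64,
    batches: Vec<ProofOfStore>,
}

fn main() {
    // A validator packs transactions into a batch and streams it to its peers.
    let batch = Batch { transactions: vec![b"tx1".to_vec(), b"tx2".to_vec()] };
    // Once enough peers acknowledge storage, a small proof stands in for the data.
    let proof = ProofOfStore { batch_digest: [0u8; 32], signer_count: 67 };
    // The leader's proposal carries only these proofs, not the payloads themselves.
    let proposal = Proposal { round: 1, batches: vec![proof] };
    println!(
        "round {}: leader proposes {} proof(s) ({} signer(s)) covering {} tx, without re-broadcasting payloads",
        proposal.round,
        proposal.batches.len(),
        proposal.batches[0].signer_count,
        batch.transactions.len(),
    );
}
```

In effect, the leader orders a short list of digests whose payloads have already been spread across the network, so the leader's bandwidth no longer limits throughput.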
With this, the Aptos blockchain has created the “Ford factory” of blockchain. Just as Ford’s assembly line revolutionized car production—different stages of different cars happening simultaneously—Aptos processes different stages of different blocks concurrently. The resources of each validator are fully utilized, ensuring no part of the process is left waiting. This clever orchestration results in a high-throughput system, making Aptos a powerful platform for efficiently and scalably processing blockchain transactions.
Illustration: Pipeline processing of consecutive blocks in the Aptos blockchain. Validators can pipeline different stages of consecutive blocks to maximize resource utilization and increase throughput.
While throughput is crucial, end-to-end latency—the time from transaction submission to final confirmation—is equally important. For applications like payments, decentralized finance (DeFi), and gaming, every millisecond counts. Many users have experienced delays during high-traffic events because each transaction must pass through a series of stages sequentially: client-to-full-node-to-validator communication, consensus, execution, state validation, submission, and full node synchronization. Under high load, stages like execution and full node synchronization add more delay.
Illustration: Pipeline architecture of the Aptos blockchain. The diagram shows clients Ci, full nodes Fi, and validators Vi. Each box represents a stage that a block of transactions goes through, from left to right. The pipeline consists of five stages: consensus (including data dissemination and ordering), execution, validation, submission, and full node synchronization.
It’s like the Ford factory: although the assembly line maximizes overall throughput, each car still has to pass through every worker in turn, so the time to complete any single car is not reduced. To truly push blockchain performance to its limits, we need to build a “magic factory”—where these stages run in parallel.
Zaptos [8] reduces latency through three key optimizations, without sacrificing throughput.
· Optimistic Execution: Reduces pipeline latency by starting execution as soon as a block proposal is received. Validators add the block to the pipeline immediately and execute it speculatively once the parent block’s execution completes. Full nodes likewise execute blocks optimistically upon receiving them from validators, so the result is ready to check against the state proof when it arrives.
· Optimistic Submission: Writes the state to storage immediately after block execution—even before state validation. When validators eventually certify the state, only minimal updates are needed to complete the submission. If a block is ultimately not ordered, the optimistically submitted state is rolled back to maintain consistency.
· Fast Validation: Validators begin state validation of executed blocks in parallel with the final consensus round, without waiting for consensus to complete. This typically reduces pipeline latency by one round.
Illustration: The parallel pipeline architecture of Zaptos. All stages except consensus are effectively hidden within the consensus stage, reducing end-to-end latency.
Through these optimizations, Zaptos effectively hides the latency of other pipeline stages within the consensus stage. As a result, if the blockchain adopts a consensus protocol with optimal latency, the overall blockchain latency can also reach its optimum!
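The following sketch shows the ordering of steps on a single validator under these optimizations. The function names (execute_optimistically, submit_optimistically, validate_during_last_round) are invented for illustration; this is not the Zaptos implementation, only a picture of which work is started early and what happens if a block is never ordered.

```rust
struct SpeculativeState; // stand-in for the execution output of one block

fn execute_optimistically(block_id: u64) -> SpeculativeState {
    println!("block {block_id}: optimistic execution starts as soon as the proposal arrives");
    SpeculativeState
}

fn submit_optimistically(block_id: u64, _state: &SpeculativeState) {
    println!("block {block_id}: state written to storage before it is certified");
}

fn validate_during_last_round(block_id: u64, _state: &SpeculativeState) {
    println!("block {block_id}: state validation overlaps the final consensus round");
}

fn handle_proposal(block_id: u64, ordered: bool) {
    // 1. Optimistic execution: start right after the proposal (and the parent's
    //    speculative result) is available, instead of waiting for ordering.
    let state = execute_optimistically(block_id);

    // 2. Optimistic submission: persist the speculative state immediately, so
    //    only a small finalization step remains once the state is certified.
    submit_optimistically(block_id, &state);

    // 3. Fast validation: validate the executed state during the last
    //    consensus round rather than after it.
    validate_during_last_round(block_id, &state);

    if ordered {
        println!("block {block_id}: ordered, finalized with minimal extra work");
    } else {
        // If consensus never orders the block, the optimistic writes are rolled back.
        println!("block {block_id}: not ordered, speculative state rolled back");
    }
}

fn main() {
    handle_proposal(42, true);
    handle_proposal(43, false);
}
```

On the happy path, everything except consensus itself is started before consensus finishes, which is exactly why the end-to-end latency collapses toward the consensus latency.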
We evaluated Zaptos’ end-to-end performance through geographically distributed experiments, using Aptos as the high-performance baseline. For more details, refer to the paper [8].
On Google Cloud, we simulated a globally distributed network of 100 validators and 30 full nodes spread across 10 regions, using commercial-grade machines similar to those used in Aptos deployments.
Illustration: Performance Comparison of Zaptos and Aptos Blockchains.
The chart above compares the relationship between end-to-end latency and throughput for both systems. Both experience a gradual increase in latency as the load increases, with sharp spikes at maximum capacity. However, Zaptos consistently shows more stable latency before reaching peak throughput, reducing latency by 160 milliseconds under low load and over 500 milliseconds under high load.
Impressively, Zaptos achieves sub-second latency at 20k TPS in a production-grade, mainnet-like environment, a breakthrough that brings real-world applications demanding both speed and scalability within reach.
Illustration: Latency Breakdown of the Aptos Blockchain.
Illustration: Latency Breakdown of Zaptos.
The latency breakdown diagrams provide a detailed view of the duration of each pipeline stage on validators and full nodes. Key insights include:
· Up to 10k TPS: The overall latency of Zaptos is almost identical to its consensus latency, as the optimistic execution, validation, and optimistic submission stages are effectively “hidden” within the consensus stage.
· Above 10k TPS: As the time for optimistic execution and full node synchronization increases, non-consensus stages become more significant. Nonetheless, Zaptos significantly reduces overall latency by overlapping most stages. For example, at 20k TPS, the baseline total latency is 1.32 seconds (consensus 0.68 seconds, other stages 0.64 seconds), while Zaptos achieves 0.78 seconds (consensus 0.67 seconds, other stages 0.11 seconds).
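As a sanity check, the 20k TPS figures quoted above reduce to simple arithmetic. This toy snippet just recomputes them, with the values copied from the paragraph above:

```rust
fn main() {
    // Baseline stage costs at 20k TPS, in seconds, taken from the text above.
    let consensus = 0.68;
    let other_stages = 0.64; // execution + validation + submission + full node sync
    let baseline = consensus + other_stages; // stages run back to back: 1.32 s

    // In Zaptos, most of the non-consensus work overlaps consensus; only a
    // small residue (0.11 s at this load, per the text) remains exposed.
    let zaptos = 0.67 + 0.11; // 0.78 s

    println!("baseline: {baseline:.2} s, Zaptos: {zaptos:.2} s, saving: {:.2} s", baseline - zaptos);
}
```

The saving of roughly 0.54 seconds matches the "over 500 milliseconds under high load" figure reported earlier.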
The evolution of blockchain architecture mirrors the transformation of manufacturing, from simple sequential workflows to highly parallelized assembly lines. Aptos’ pipelined approach significantly boosts throughput, while Zaptos goes further by reducing latency to sub-second levels while maintaining high throughput. Just as modern computing architectures leverage parallelism to maximize efficiency, blockchains must continuously optimize their designs to eliminate unnecessary latency. By fully optimizing the blockchain pipeline for the lowest latency, Zaptos paves the way for real-world blockchain applications that require both speed and scalability.
This article is reprinted from [BlockBeats], and the copyright belongs to the original author [Aptos Labs]. If you have any objections to the reprint, please contact the Gate Learn team, and the team will handle it as soon as possible according to relevant procedures.
Disclaimer: The views and opinions expressed in this article represent only the author’s personal views and do not constitute any investment advice.
Other language versions of the article are translated by the Gate Learn team. The translated article may not be copied, distributed or plagiarized without mentioning Gate.io.