Special Thanks to Monad Team and Monad Community for feedback and review
EVM (Ethereum Virtual Machine) has established itself as the most powerful narrative in the blockchain market. Whether or not one supports EVM has been a determinant of a blockchain's success, and it's no exaggeration to say we still live in the era of the EVM.
Monad aims to incorporate this EVM narrative with the latest technologies in BFT consensus algorithms. While absorbing the vast community of the Ethereum ecosystem, Monad also seeks to achieve immense scalability as a monolithic blockchain.
Monad also aims to achieve 10,000 TPS (Transactions per second) and single slot finality through a combination of pipelining consensus, execution, and state access, as well as parallelizing individual transactions during execution.
Among the recently introduced Layer 1 monolithic blockchains, Monad is one of the first to adopt EVM. Hence, there's considerable anticipation about how this blockchain will grow after its launch.
The Ethereum Virtual Machine (EVM) and Ethereum's smart contract language, Solidity, have now been around for nearly a decade. They did not exert a strong influence immediately upon their introduction, but over those ten years of continuous development they have built a massive developer ecosystem. It is no exaggeration to say that they have become the most compelling narrative in the blockchain market. In fact, during the past Layer 1 blockchain wars, one of the most crucial criteria was, "Which chain is EVM-compatible?"
There are numerous blockchains that have embraced the EVM ecosystem and succeeded. Excluding Ethereum, the originator of EVM, there are Layer 1 blockchains like BSC, Polygon, Tron, Avalanche, and Fantom (while Polygon now plans to transition fully to Layer 2, it's fair to consider it as Layer 1 up until now). The most influential current Layer 2 blockchains, such as Base, Arbitrum, and Optimism, have all adopted the EVM ecosystem and achieved explosive growth in a short period.
According to DeFiLlama, EVM-based chains make up the top 8 when ranked by TVL (total value locked). It truly is the era of the EVM (though, as of this writing, various other virtual machines are emerging; if you want to learn more about them, I highly recommend reading the article written by Xpara). The chain I'm introducing today, Monad, is expected to further bolster this prevailing EVM trend. Notably, Monad takes an approach distinct from other EVM-compatible chains and is anticipated to bring a breath of fresh air to the EVM ecosystem.
Without a doubt, Ethereum is the most successful blockchain, boasting the strongest community and ecosystem. However, there's one thing that Ethereum lacks: "scalability". While Ethereum is making relentless efforts to scale, the shift from sharding to rollups has made it hard to envision Ethereum itself with extensive scalability. As a base layer, Ethereum might achieve immense scalability through various rollups, but at this point it is almost impossible for it to become a scalable monolithic Layer 1. But let’s entertain the idea: what if the Ethereum we know achieved 10,000 TPS in a parallel universe? How exhilarating would that be? That's precisely what crossed my mind when I first encountered Monad. Although there are significant differences between Monad and Ethereum, both aspire to achieve scalability and embrace the vast EVM community, hence this thought experiment.
What similarities does Monad share with Ethereum, and how do they differ?
First, to understand EVM Bytecode Compatibility, we must know what bytecode is. Ethereum smart contracts are written in programming languages like Solidity. However, before they're executed on the Ethereum network, they must be compiled into EVM bytecode, because bytecode is the only form the virtual machine can interpret. Therefore, if a blockchain claims to be EVM Bytecode Compatible, it means that the blockchain can execute bytecode identically to Ethereum's EVM.
So, since Monad is fully EVM Bytecode Compatible, it can process Ethereum's past transactions in Monad's transaction environment and yield identical results (supporting all opcodes as of the Shanghai fork).
While this may seem similar to the previous point, there's a distinction. Having Ethereum RPC (Remote Procedure Call) compatibility means precisely interpreting Ethereum's RPC requests and having endpoints consistent with Ethereum RPC specs. In simpler terms, it signifies that the various tools built in the Ethereum ecosystem can be freely utilized. To make it clearer, because Monad is Ethereum RPC compatible, it can seamlessly interact with useful Ethereum ecosystem tools like Etherscan and MetaMask. Moreover, it uses ECDSA (Elliptic Curve Digital Signature Algorithm) for addresses and follows Ethereum's transaction format.
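To make RPC compatibility concrete, here is a minimal sketch of the JSON-RPC 2.0 request body that Ethereum tooling sends. `eth_blockNumber` and `eth_getBalance` are standard Ethereum RPC methods; the endpoint URL and the helper name `build_rpc_request` are hypothetical and only illustrate the idea. A compatible chain would accept the same body unchanged.

```python
import json

# Hypothetical endpoint, used only for illustration (not a real Monad URL).
RPC_URL = "https://rpc.example.invalid"

def build_rpc_request(method, params=None, request_id=1):
    """Build an Ethereum-style JSON-RPC 2.0 request body as a JSON string."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,
        "params": params or [],
        "id": request_id,
    })

# The same body works against any endpoint that follows the Ethereum RPC spec.
block_number_body = build_rpc_request("eth_blockNumber")
balance_body = build_rpc_request(
    "eth_getBalance",
    ["0x0000000000000000000000000000000000000000", "latest"],
    request_id=2,
)
```

Because only the URL changes between chains, a wallet or explorer written for Ethereum does not need chain-specific request logic.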
However, Monad isn't just about its similarities with Ethereum.
Though more details will follow, from the outset, Monad and Ethereum differ fundamentally in their consensus mechanisms. Ethereum employs a consensus mechanism called Gasper, a combination of the Casper FFG finality mechanism and the LMD-GHOST fork-choice rule. In contrast, Monad uses its proprietary consensus mechanism, MonadBFT, which can be seen as a variation of HotStuff and DiemBFT. Broadly, both may fall under the PoS umbrella, but they function differently. In terms of its consensus mechanism, Monad might be more akin to Aptos (whose AptosBFT also derives from DiemBFT).
Their transaction processing methodologies are also distinct. Ethereum processes transactions sequentially, whereas Monad focuses on parallel processing. Further, unlike Ethereum's most widely used client, Geth, which is written in Go, Monad's client is written in performance-centric languages like C++ and Rust. Lastly, the timing of execution differs: Ethereum executes transactions before consensus, while Monad executes them after consensus. Examined closely, these differences suggest that Monad's departures from Ethereum are modifications needed to enhance blockchain performance.
Consider Sei, Sui, and Aptos. All of these chains can (or plan to) process transactions in parallel and are recognized for maximizing blockchain speed. Yet what makes Monad intriguing is its adept fusion of Ethereum's strengths with those of these new chains. Sei, Sui, and Aptos did not opt for the EVM ecosystem (primarily due to the EVM's security issues), choosing instead their own languages and VMs (Sei with WASM, and both Sui and Aptos with Move), whereas Monad has chosen the EVM. Although the future is hard to predict, might Monad attract developers faster than its peers?
Therefore, I describe Monad as the Ethereum that has achieved 10,000 TPS. It is not clear whether the entire Ethereum community will adopt Monad. However, at a time when EVM compatibility usually brings rollup chains to mind, the mere implementation of an EVM-based monolithic blockchain is noteworthy.
Now, let's delve into the specifics of Monad's technology.
Monad's scalability stems primarily from its consensus mechanism. MonadBFT can be seen as a variant developed from traditional BFT mechanisms, similar to how AptosBFT and DiemBFT evolved. Let's delve into the specifics of MonadBFT.
Source: Diem BFT
To understand the intricacies of MonadBFT, one must comprehend how consensus is achieved within it. First, we should start with some terminology:
A Quorum Certificate (QC) is a threshold signature of 2/3 of the network’s stake weight attesting to the validity of the previous block.
A Timeout Certificate (TC) is a threshold signature of 2/3 of the network’s stake weight attesting to the fact that the previous round timed out (2/3 of the stake weight did not receive a valid block proposal in time).
We’ll refer to the initial round as round K, and subsequent rounds as K+1, K+2, etc. The QC for round K is referred to as QC(K), and for subsequent rounds as QC(K+1), QC(K+2), etc. QC or TC from the previous round of K is referred to as QC(K-1) or TC(K-1).
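The 2/3 stake-weight condition behind a QC or TC can be sketched as a simple check. This is an illustrative assumption rather than MonadBFT's actual code: the function name `has_quorum` is hypothetical, the threshold is taken literally as "at least 2/3" from the wording above (many BFT protocols require strictly more than 2/3), and real certificates aggregate signatures instead of summing integers.

```python
from fractions import Fraction

def has_quorum(signer_stakes, total_stake, threshold=Fraction(2, 3)):
    """Return True if the signers' combined stake reaches the threshold.

    signer_stakes: stake weights of validators whose signatures were collected.
    total_stake:   total stake weight of the validator set.
    Fractions avoid floating-point rounding at the exact 2/3 boundary.
    """
    return Fraction(sum(signer_stakes), total_stake) >= threshold

# 70 of 100 stake units signed: enough to form a certificate.
enough = has_quorum([40, 30], 100)
# 60 of 100 stake units signed: not enough.
not_enough = has_quorum([30, 30], 100)
```

The same check applies to both certificate types; what differs is the message being signed (a YES vote for a QC, a timeout message for a TC).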
MonadBFT proceeds as a series of rounds:
Rounds are driven by leaders, who are assigned to rounds on a predetermined schedule that is regularly generated.
Each round generally consists of two parts: the leader sends a proposal to all of the validators, and the validators send vote-related messages back to the next leader.
If everything is running smoothly (the “happy path”), the messaging will be one-to-many-to-one (linear communication). If validators don’t receive a proposal from the leader in time, then they begin sending messages to each other to coordinate skipping the leader and advancing to the next round (quadratic communication).
Proposals carry the new block proposal as well as some aggregated voting information on previous rounds; including both is what makes this mechanism pipelined.
When a validator receives a valid proposal, they send a signed YES vote directly to the next scheduled leader. When 2/3 of the stake weight has voted YES, the new leader is able to build a QC on the previous proposal. Alternatively, in the unlikely event that a validator doesn’t receive a valid proposal in time, they produce a signed timeout message and multicast it to all of their peers. If any validator receives a quorum (2/3 of stake weight) of timeout messages, they aggregate these messages into a TC and forward it to the next leader. This TC also includes a reference to the highest QC that each validator has seen; this information will be used later on.
Thus, the leader of Round K+1 will either receive enough YES votes to produce a QC on Round K, or will receive a preassembled TC for Round K. The leader of Round K+1 will then send a proposal containing a (1) new block of transactions; (2) the latest QC from previous rounds (in the happy path, a newly-produced QC for round K, or in the timeout path, the highest QC referenced in the TC); and (3) (only in the event that the previous round timed out) the TC.
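The three proposal components described above can be sketched with a few small data types. All class and function names here (`QC`, `TC`, `Proposal`, `build_proposal`) are hypothetical, and signatures, stake weights, and validation are deliberately left out; the sketch only shows which certificate ends up in the proposal on each path.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass(frozen=True)
class QC:
    round: int  # round whose block this certificate attests to

@dataclass(frozen=True)
class TC:
    round: int      # round that timed out
    highest_qc: QC  # highest QC reported in the timeout messages

@dataclass(frozen=True)
class Proposal:
    round: int
    block: Tuple[str, ...]   # (1) new block of transactions
    qc: QC                   # (2) latest QC from previous rounds
    tc: Optional[TC] = None  # (3) present only after a timeout

def build_proposal(round_no, txs, new_qc=None, tc=None):
    """Assemble the leader's proposal for round_no."""
    if new_qc is not None:
        # Happy path: carry the freshly built QC for the previous round.
        return Proposal(round_no, tuple(txs), new_qc)
    if tc is not None:
        # Timeout path: carry the highest QC referenced in the TC, plus the TC.
        return Proposal(round_no, tuple(txs), tc.highest_qc, tc)
    raise ValueError("leader needs a QC or a TC for the previous round")

happy = build_proposal(6, ["tx1", "tx2"], new_qc=QC(5))
after_timeout = build_proposal(6, ["tx1"], tc=TC(5, QC(3)))
```

In the timeout case the proposal for round 6 extends the block certified by `QC(3)`, since no newer QC was reported by the validators that timed out.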
When a validator receives a QC for Round K+1 (i.e. in Proposal K+2), they can consider the proposal from Round K finalized and start executing the transactions locally. Let’s unpack this a bit:
Anyone holding a QC for Round K possesses proof that 2/3 of the stake weight voted YES on block K and all of its ancestors, i.e. that block K has a valid set of transactions.
BUT just holding a QC for Round K is not enough to commit Block K, because you need proof that other people also are aware of the fact that 2/3 of the stake weight voted YES on Block K. It is not safe (yet) to execute the transactions in Block K because you don’t know that everyone else knows the transactions are valid and committable.
Holding a QC on Round K+1 means you have proof that 2/3 of the stake weight has a QC for Round K, so a quorum knows that the transactions in Block K have quorum. Also–importantly–the timeout procedure means that even if no one else has received Block K+2 (so no one else knows about QC(K+1)), when they timeout they will ensure that the next leader hears about at least QC(K), so QC(K) will live on. This makes it safe to finalize Block K and execute its transactions.
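The finalization rule can be sketched as follows, under a simplifying assumption: a validator tracks only the set of round numbers for which it holds a QC, and the block from round K counts as finalized once QCs for both round K and round K+1 are held. The function name `finalized_rounds` is hypothetical, and the sketch ignores the fact that finalizing a block also finalizes its ancestors.

```python
def finalized_rounds(qc_rounds_seen):
    """Return the rounds whose blocks can be treated as finalized.

    Round K is finalized when QC(K) and QC(K+1) have both been seen,
    i.e. a quorum has certified K and a quorum has certified the block
    that carried QC(K).
    """
    seen = set(qc_rounds_seen)
    return sorted(k for k in seen if k + 1 in seen)

# QCs held for rounds 1, 2, 3 and 5 (round 4 timed out, so it has no QC).
done = finalized_rounds([1, 2, 3, 5])
```

Here rounds 1 and 2 are finalized; round 3 is not yet, because no QC for round 4 exists, and round 5 is still waiting for QC(6).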
As previously mentioned, consensus in MonadBFT proceeds in rounds, and every round consists of two phases. In the first phase, the leader sends a proposal to the validators. In the second phase, the validators send signed votes to the next scheduled leader. An intriguing aspect here is the difference from HotStuff. HotStuff originally required three rounds of communication to commit a block. This increased latency during consensus, so MonadBFT streamlines the process to two rounds. Reducing three rounds to two is possible because, in the event of network disruptions (for example, if the leader's proposal is not received, or is not received within a stipulated time), validators communicate directly with all of the other validators (incurring quadratic, n², communication overhead) to coordinate skipping the round.
Difference between Linear Communication and Quadratic Communication: In the Quadratic Communication method, once the leader propagates the proposal to each node, every node has to broadcast its vote on the proposal to every other node. The network cost is therefore roughly quadratic in the number of nodes: the leader sends the proposal n times, and each of the n nodes then sends its vote to about n other nodes, for roughly n + n² messages. In contrast, the Linear Communication method incurs only a linear network cost, because each vote is sent once, to the leader of the next round: n proposal messages plus n votes, or about 2n messages. With Linear Communication, it's possible to enhance throughput and reduce latency in the blockchain.
When there are no network issues, MonadBFT uses the Linear Communication method.
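A rough message count makes the gap between the two methods visible. This is a simplified model (an assumption for illustration): it counts one proposal message per validator and one vote message per recipient, and ignores message sizes, retransmissions, and the fact that a node does not send to itself.

```python
def linear_messages(n):
    """Leader sends n proposals; each of n validators sends 1 vote to the next leader."""
    return n + n

def quadratic_messages(n):
    """Leader sends n proposals; each of n validators sends its vote to n peers."""
    return n + n * n

for validators in (10, 100, 1000):
    print(validators, linear_messages(validators), quadratic_messages(validators))
```

With 100 validators the model gives 200 messages per round on the linear path versus 10,100 on the quadratic path, which is why the quadratic exchange is reserved for the timeout case.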
Source: HotStuff
Pipelining, in simple terms, refers to processing all stages of consensus not in a single round, but over multiple rounds. For instance, let's say there is a block N and the next block is N+1. Even if a QC (Quorum Certificate) is created in block N based on the votes from the majority of voters responding to the message sent by the leader, it is not mandatory to finalize the QC in block N; it can be carried over (piggybacked) to N+1.
Although MonadBFT seems similar to DiemBFT, it introduces unique features such as Shared Mempool, Deferred Execution, Carriage Cost, and Reserve Balance.
"Mempool" is an abbreviation of "Memory" and "Pool", and it is a mechanism where nodes hold information that they have not yet verified. In other words, it can be seen as a kind of waiting room where transactions gather before being included in a block. In blockchain, not all nodes have the same mempool. Therefore, in the case of Monad, if nodes do not have the same data, they use a gossip protocol to request and receive data, sharing the information in the mempool. Currently, Monad shares the mempool using a simple gossip protocol, but in the future, they plan to switch to a communication method called "Broadcast Tree". This structure allows nodes to communicate without directly interacting with each other, transmitting messages through a tree structure, enabling them to propagate messages without redundant broadcasts, thus making the process of sharing mempool data more efficient.
The shared mempool relates directly to Monad's scalability. Monad has the challenge of processing 10,000 transactions per second. For instance, if a block contains 10,000 transactions of 500 bytes each, the block would be 5 MB. A block of this size puts significant strain on network bandwidth. Monad addresses this by ensuring that the proposals (issued by the leader every round) reference only the hashes of transactions, not the transactions in their entirety, which reduces bandwidth consumption (a transaction might be 500 bytes, but its hash is only 32 bytes). Because block proposals reference only the hashes, all validators must share the transaction mempool with one another. This way, even though the block proposal doesn't contain all the data, it can be propagated effectively.
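The bandwidth arithmetic can be worked through directly using the figures quoted above (10,000 transactions, 500 bytes per transaction, 32 bytes per hash). The variable names are illustrative, and the calculation ignores block headers and certificate data.

```python
TX_COUNT = 10_000       # transactions per block, from the example above
TX_SIZE_BYTES = 500     # assumed average transaction size
HASH_SIZE_BYTES = 32    # size of one transaction hash

full_block_bytes = TX_COUNT * TX_SIZE_BYTES     # proposal carrying full transactions
hash_only_bytes = TX_COUNT * HASH_SIZE_BYTES    # proposal carrying only hashes
reduction = full_block_bytes / hash_only_bytes  # how many times smaller

print(full_block_bytes, hash_only_bytes, reduction)
```

Under these numbers a hash-only proposal is 320,000 bytes instead of 5,000,000, about 15.6 times smaller. The full transactions still have to travel through the shared mempool, but they no longer sit on the latency-critical proposal path.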
Another unique aspect of Monad is its separation of transaction execution from consensus, making the consensus process more efficient. As we've learned about modular blockchains, computation and consensus are distinct processes. Consensus involves determining how to include transactions in a block, while execution pertains to the actual enactment of the transactions, resulting in a change of state. In Monad, while leaders and validators might vote on a proposal, they don't necessarily know how the transactions are executed.
Why did Monad separate computation and consensus? Let's discuss Ethereum first. Ethereum adopts a method where computation precedes consensus. In Ethereum's consensus process, one must 1) agree on the transactions in a block and 2) consent to the Merkle root resulting from executing these transactions. This means, in Ethereum, the leader has to compute all transactions before sharing the proposal, and other validators also must compute the transactions before voting on the proposal, adding complexity. In such cases, gas limits must be conservatively set, and the time for consensus becomes tight.
To address this, Monad separates and parallelizes computation and consensus: while nodes execute the transactions of block K, they simultaneously conduct consensus for block K+1. This way, execution merely follows consensus, allowing for an appropriate gas budget in line with block times. This separation is feasible because if the order of transactions is agreed upon by a majority of nodes, the outcome is essentially determined.
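The overlap between consensus and execution can be sketched as a schedule. This is a toy model under stated assumptions: each step handles one block per stage, and execution trails consensus by a fixed `lag` of one block (in practice the lag depends on when a block is finalized). The function name `pipeline_schedule` is hypothetical.

```python
def pipeline_schedule(num_blocks, lag=1):
    """Return (step, block_in_consensus, block_in_execution) tuples.

    None means that stage is idle at that step.
    """
    schedule = []
    for step in range(num_blocks + lag):
        consensus = step if step < num_blocks else None
        execution = step - lag if step - lag >= 0 else None
        schedule.append((step, consensus, execution))
    return schedule

for step, consensus, execution in pipeline_schedule(3):
    print(f"step {step}: consensus on {consensus}, executing {execution}")
```

Apart from the first and last steps, both stages are busy at every step, so each stage can use the full block time rather than sharing it.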
However, a question arises when computation and consensus are separated and processed in parallel: “If the order of transactions is set without the latest state being known (since execution follows consensus), could transactions from users who cannot pay for gas be included, exposing the system to potential DDoS attacks?” To address this concern, the concept of Carriage Cost was introduced.
Carriage cost is, as the name implies, a 'transportation fee'. Given Monad's separation of computation and consensus, the way it charges for transactions is quite unique. Typically, the cost is incurred when a transaction is executed, but Monad separates the execution cost from the transportation cost. If the fee for transmitting a transaction has been paid but the account cannot cover the fee for executing it, the transaction is simply deemed a failure. This approach prevents users from continuously sending transactions when they have insufficient funds.
Moreover, nodes establish a "Reserve Balance" for each account, setting aside a balance dedicated solely to transportation fees. The reason for creating a Reserve Balance is to ensure that only transactions that have paid the fee are included in a block.
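The interaction between carriage cost and reserve balance can be sketched as an inclusion filter. The source does not specify the exact mechanics, so everything here is an assumption for illustration: a flat `carriage_cost` per transaction, a per-account reserve tracked as an integer, and the hypothetical function name `select_for_block`.

```python
def select_for_block(txs, reserve_balances, carriage_cost):
    """Pick transactions whose sender can pay the carriage cost from reserve.

    txs:              list of (sender, tx_id) in arrival order.
    reserve_balances: mapping sender -> reserve balance.
    Returns (included tx ids, updated reserve balances).
    """
    reserves = dict(reserve_balances)  # work on a copy
    included = []
    for sender, tx_id in txs:
        if reserves.get(sender, 0) >= carriage_cost:
            reserves[sender] -= carriage_cost  # charge for transport up front
            included.append(tx_id)
        # Otherwise the transaction is left out of the block entirely.
    return included, reserves

included, reserves = select_for_block(
    [("alice", "t1"), ("bob", "t2"), ("alice", "t3")],
    {"alice": 15, "bob": 5},
    carriage_cost=10,
)
```

In this run only `t1` is included: bob's reserve never covers the cost, and alice's reserve is exhausted after her first transaction. The point of the check is that it needs only the reserve balance, not the result of executing earlier blocks.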
So far, we've delved into the consensus mechanism of Monad. But how exactly does Monad process transactions? The execution strategy of Monad can be broken down into two main pillars: parallel transaction processing and MonadDb. Let's explore each of these in detail.
To see why parallel processing is hard, consider two transactions, which I will name as follows:
Transaction A is a transaction where account A receives Monad tokens from account B.
Transaction B is a transaction where account A sends Monad tokens to account C.
If we imagine these two transactions being processed in parallel rather than sequentially (with Transaction B starting before Transaction A finishes), the balance of account A after this parallel processing might differ from its balance had the transactions been processed sequentially. This discrepancy could lead to a transaction execution error.
To address this issue, Monad employs Optimistic Concurrency Control (OCC), a method borrowed from Software Transactional Memory (STM). As the term "optimistic" suggests, Monad assumes all operations to be valid during the parallel processing of transactions. It first runs the transactions and then, if a conflict is detected during validation, re-executes the affected transaction. The outcome of this process must match the result of processing the transactions sequentially. While Monad processes transactions in parallel, the state updates produced by the transactions are merged sequentially to verify whether each parallel-processed transaction is valid. In other words, instead of analyzing the relationships between transactions before parallel processing, Monad first processes the transactions and, if issues arise, uses the information available at that moment to reprocess them.
The reason for this approach is that re-executing transactions based on information available after an initial execution, rather than preemptively verifying the transaction relationships and then executing, ultimately proves to be more efficient.
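The execute-then-validate idea can be sketched on the two-transaction example above. This is a toy model, not Monad's implementation: transactions are simple transfers `(sender, receiver, amount)`, the "parallel" phase is simulated by running every transaction against the same starting snapshot, and the names `run_transfer` and `execute_optimistically` are hypothetical.

```python
def run_transfer(tx, view):
    """Run one transfer against a state view; return (reads, writes)."""
    sender, receiver, amount = tx
    reads = {sender: view.get(sender, 0), receiver: view.get(receiver, 0)}
    if reads[sender] < amount:
        return reads, {}  # insufficient funds: no state change
    writes = dict(reads)
    writes[sender] -= amount
    writes[receiver] += amount
    return reads, writes

def execute_optimistically(txs, state):
    """Speculate on a snapshot, then merge in order, re-executing on conflict."""
    snapshot = dict(state)
    # Phase 1: every transaction runs against the same snapshot (parallelizable).
    speculative = [run_transfer(tx, snapshot) for tx in txs]
    # Phase 2: merge results sequentially in the agreed transaction order.
    committed = dict(state)
    reexecuted = 0
    for tx, (reads, writes) in zip(txs, speculative):
        # A read is stale if an earlier transaction changed the value it saw.
        if any(committed.get(key, 0) != value for key, value in reads.items()):
            reads, writes = run_transfer(tx, committed)
            reexecuted += 1
        committed.update(writes)
    return committed, reexecuted

# Transaction A: B sends 10 to A. Transaction B: A sends 10 to C.
final_state, reexecuted = execute_optimistically(
    [("B", "A", 10), ("A", "C", 10)],
    {"A": 0, "B": 10, "C": 0},
)
```

Speculatively, Transaction B sees account A with a zero balance and fails. During the merge, its read of A is found to be stale, so it is re-executed against the updated state and succeeds, giving the same final balances as sequential execution with exactly one re-execution.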
Currently, Monad is researching methods to proactively handle transaction re-execution.
Before diving into MonadDb, it's crucial to understand a particular concept: asynchronous I/O (often abbreviated as async I/O). Traditionally, when a transaction needed to read or write data, the system had to wait for that operation to finish before moving on. With async I/O, however, the CPU can work on other transactions without waiting for the results of a particular one. Unlike traditional Ethereum databases, which don't support async I/O, MonadDb is built to support it, offering significant efficiency gains when processing transactions.
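The benefit of async I/O can be illustrated with Python's standard `asyncio` module. This is a generic sketch and says nothing about MonadDb's internals: `read_state` is a hypothetical stand-in for a disk read, with `asyncio.sleep` simulating the wait.

```python
import asyncio

async def read_state(key, delay=0.01):
    """Simulate a slow storage read; the event loop is free while this waits."""
    await asyncio.sleep(delay)
    return key, f"value-of-{key}"

async def load_many(keys):
    """Issue all reads at once instead of waiting for each one in turn."""
    results = await asyncio.gather(*(read_state(k) for k in keys))
    return dict(results)

state = asyncio.run(load_many(["a", "b", "c"]))
print(state)
```

The three reads overlap, so the total wait is close to one delay rather than three. A blocking design would pay the full delay for each read in sequence.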
Source: Monad
Judging by its description, Monad is a blockchain that has required extensive technical deliberation. Accordingly, the Monad team is comprised of many technical experts. The CEO, Keone, for instance, is a technical leader who once spearheaded the High Frequency Trading (HFT) team at the renowned trading company, Jump Trading. Similarly, James was a developer at Jump Trading, responsible for building trading systems. Most of the developers working on the blockchain core come from backgrounds where they developed platforms optimized for low latency and rapid computation. This experience aligns significantly with Monad's vision.
Source: Monad Medium
In February of this year, Monad secured seed round funding of approximately $19 million from notable investors such as Dragonfly, Shima Capital, Finality Capital, and Credibly Neutral. A standout feature of this fundraising effort was the participation of influential angel investors in the blockchain space. Receiving investments from prominent industry figures like Hasu, the Strategy Lead at Flashbots, and 0xMAKI, the creator of SushiSwap, drew significant attention to Monad.
The reason why Monad is drawing such attention is clear. It supports one of the strongest industry narratives - the Ethereum Virtual Machine (EVM). At the infrastructure level, it incorporates and refines advanced technologies such as DiemBFT and HotStuff into its unique MonadBFT. Additionally, it utilizes transaction parallel processing to ensure network scalability. This combination promises a large developer community and ample scalability and resources for them to operate within. While most rollup chains are presented as scalability solutions for Ethereum, the limited block space of Ethereum prevents them from guaranteeing high speeds. This suggests that the emergence of a high-performance monolithic blockchain supporting the EVM could disrupt the blockchain market landscape once more.
Of course, Monad still faces challenges. As mentioned, significant research is needed to streamline transaction parallel processing. Additionally, inherent issues with the EVM itself, particularly security concerns, remain unresolved. The fact that most new layer-1 monolithic blockchains have chosen virtual machines other than the EVM is evidence of this.
However, it is unlikely that the Monad team is unaware of these realities as they continue their blockchain development. There is great anticipation to see the influence that Monad will have on the monolithic blockchain market and the broader blockchain industry. As a fusion of cutting-edge technology and a powerful narrative, Monad could potentially be the driving force behind the resurgence of the layer-1 market.
Thanks to Kate for designing the graphics for this article.