On Thursday, November 21, 2024, the Sui Network experienced its first network outage. The cause was traced to a bug in the congestion control system's 'TotalGasBudgetWithCap' mode: under certain conditions the transaction cost calculation returned 0, which violated an assert! statement in the code requiring execution costs to be non-zero.
As a result, the network was unable to produce blocks for approximately 2.5 hours.
A network outage is never good news. Some may have lost trust due to this incident. Given that Sui has been the blockchain network showing the most meaningful progress this year, we hope they can turn this crisis into an opportunity and make an even greater leap forward.
Sui has had such a remarkable year that it can be considered the leader among Layer 1 networks in 2024. In terms of TVL (Total Value Locked), it ranks second among non-EVM chains after Solana, and it has achieved impressive milestones, including exceeding $2B in 7-day on-chain transaction volume. However, on Thursday, November 21, 2024, between 1:15 and 3:45 am PT, the Sui network, which had maintained 100% uptime until then, experienced its first complete network outage.
The technical details of the incident are as follows:
The issue occurred in the network's congestion control system using the 'TotalGasBudgetWithCap' mode
This mode was initially enabled in protocol version 63, reverted, and then re-enabled in version 68
System crashes occurred when all of the following conditions were met:
Congestion control was configured to use 'TotalGasBudgetWithCap' mode
The network received a transaction with both:
1) A mutable shared object input, and
2) Zero MoveCall commands
When these conditions were met, the execution cost was erroneously calculated as zero, triggering an assert! statement that caused all validators to crash
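The failure mode described above can be illustrated with a minimal Rust sketch. It is not Sui's actual code: the `Tx` and `Command` types, the `capped_cost` formula, and the `cap_factor` parameter are all hypothetical, and the assumption that the cap scales with the number of MoveCall commands and their arguments is made only so that a transaction with zero MoveCall commands produces a cost of zero.

```rust
/// Hypothetical, simplified command type; real Sui transactions carry far more detail.
#[derive(Debug, Clone)]
enum Command {
    MoveCall { num_args: usize },
    TransferObjects,
}

/// Hypothetical transaction model, for illustration only.
#[derive(Debug, Clone)]
struct Tx {
    gas_budget: u64,
    has_mutable_shared_input: bool,
    commands: Vec<Command>,
}

/// ASSUMPTION: the cap grows with the number of MoveCall commands and their
/// arguments. With zero MoveCall commands the cap, and so the cost, is 0.
fn capped_cost(tx: &Tx, cap_factor: u64) -> u64 {
    let units: u64 = tx
        .commands
        .iter()
        .map(|c| match c {
            Command::MoveCall { num_args } => 1 + *num_args as u64,
            _ => 0,
        })
        .sum();
    tx.gas_budget.min(units * cap_factor)
}

/// Mirrors the crash condition: a transaction with a mutable shared object
/// input goes through congestion control, and a zero cost trips the assertion.
/// Transactions without such an input are not scheduled here (returns None).
fn schedule_unchecked(tx: &Tx, cap_factor: u64) -> Option<u64> {
    if !tx.has_mutable_shared_input {
        return None;
    }
    let cost = capped_cost(tx, cap_factor);
    assert!(cost > 0, "execution cost must be non-zero");
    Some(cost)
}
```

In this sketch, a transaction with `has_mutable_shared_input: true` and only a `TransferObjects` command makes `schedule_unchecked` panic. Because every validator evaluates the same transaction deterministically, a panic of this kind would hit all of them at once, which is consistent with the network-wide halt described above.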
1.1.1 Why Congestion Control is Necessary
Sui's unique object-based architecture enables massive parallel transaction processing that differentiates it from other networks. However, when multiple transactions need to access the same shared object (e.g., multiple users trying to interact with the same NFT), they must be processed sequentially. The congestion control system was designed to manage this sequential processing and prevent system overload.
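To make the idea concrete, here is a minimal sketch of one way such a system could work, assuming a per-object cost budget for each scheduling round. The `CongestionTracker` type, the `per_object_limit` parameter, and the `try_schedule` method are hypothetical names chosen for illustration and are not taken from the Sui codebase.

```rust
use std::collections::HashMap;

/// Hypothetical tracker of the estimated cost already scheduled against each
/// shared object in the current round.
struct CongestionTracker {
    per_object_limit: u64,
    load: HashMap<String, u64>,
}

impl CongestionTracker {
    fn new(per_object_limit: u64) -> Self {
        CongestionTracker {
            per_object_limit,
            load: HashMap::new(),
        }
    }

    /// Schedules the transaction only if every shared object it touches still
    /// has room for `cost`. Returns false when the transaction should be
    /// deferred to a later round.
    fn try_schedule(&mut self, shared_objects: &[&str], cost: u64) -> bool {
        let fits = shared_objects.iter().all(|id| {
            self.load.get(*id).copied().unwrap_or(0) + cost <= self.per_object_limit
        });
        if fits {
            for id in shared_objects {
                *self.load.entry(id.to_string()).or_insert(0) += cost;
            }
        }
        fits
    }
}
```

With a limit of 100, two transactions costing 60 each against the same object cannot both fit in one round, so the second is deferred, while a transaction touching a different object is unaffected. In this sketch a transaction with a cost of zero would never consume any budget, which is one intuitive reason a scheduler might insist on non-zero costs.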
Fix implemented through PR #20365
Deployed as v1.37.4 for Mainnet and v1.38.1 for Testnet
Thanks to the validator community's swift response, the network was restored within just 15 minutes after the fix was deployed
However, a significant portion of the total 2.5-hour outage was spent waiting for the release build to complete
1.2.1 Key Learnings and Improvements:
The incident detection system worked effectively, with automated alerts immediately notifying on-call engineers
The team is strengthening testing systems to better handle adversarial transactions
Work is being done to improve build workflows for faster debug and release binary availability
As a result, despite being Sui's first network outage, the quick resolution and transparent communication through the detailed postmortem demonstrated the network's resilience and technical capabilities in crisis management. Currently, the Sui network has successfully resolved what they hope will be their first and last network outage and is operating normally.
Due to the swift network recovery despite this being Sui's first outage, the community and market reactions have been relatively positive. The Solana community, which has experienced numerous network outages, even sent encouragement to the Sui community.
The Sui network had taken great pride in maintaining 100% uptime until now, but they can no longer make this claim. Although the market nowadays seems less concerned about network downtimes, there will inevitably be a difference in trustworthiness between a network that has maintained 100% uptime and one that has experienced a downtime. Therefore, it's reasonable to assume that this incident has somewhat damaged trust in the network. Consequently, rebuilding this trust will be a crucial challenge for the remainder of the market cycle.
Nevertheless, this incident likely hasn't critically impacted the price because most Sui enthusiasts are either current or former Solana supporters who have witnessed Solana recover from multiple network downtimes, demonstrating strong resilience.
The first step in restoring market confidence was "rapid recovery," and the collaborative efforts of the Sui Foundation, validators, and other contributors in restoring the network to normal operation within 2.5 hours are highly commendable. However, what's more crucial now is how Sui can maintain the momentum it has built throughout the year even after this network outage.
Sui has been leading the blockchain industry with various infrastructure initiatives. Notable examples include the Move language and object-based storage model, which were highly innovative approaches. Additionally, they successfully enhanced network performance through the Mysticeti upgrade and continue to research what's needed for blockchain technology to achieve mainstream adoption, as evidenced by preparations for storage layers like Walrus. However, to maintain its current momentum, Sui needs something more. I believe this could be SuiPlay0X1. If they can successfully establish a gaming ecosystem based on convenient hardware, they could leverage this to regain and build upon the momentum that has driven Sui throughout the year.
Ultimately, I view this network outage as a "blessing in disguise." As the saying goes, it is "better to take a hit early": it's preferable for such issues to occur sooner rather than later, allowing preventive measures to be implemented before the network handles larger-scale assets.
Blockchain is an extremely complex system, particularly networks supporting parallel processing. Perhaps it was more dangerous to assume that such complexity could be pursued without any network interruptions. Moving forward, as much attention must be paid to network security as is given to achieving high scalability through complex blockchain design.
Seems like Sui has been down for an hour now. I'm amazed any blockchain even turns on tbh, shit is so complex.
Sui's downtime reminds me of Solana, which went down countless times in 2021-2022. Now Solana is king, so evidently downtime wasn't fatal. In fact, the many outages were arguably net positive in the long run, since they provided incredible exposure opportunities. Every Solana outage brought waves of new spectators, even becoming a meme. You can't pay for that kind of coverage. ("No such thing as bad press.")
Going through downtime is painful, but it's a growing pain that all high performance blockchains go through. Many will learn about Move, Mysticeti, and Pilotfish for the first time today. And Sui will come out stronger on the other side.
Related Articles, News, Tweets etc. :