Introduction
Operating a full node is anything but a straightforward “hello world” endeavor; it is a responsibility that requires ongoing vigilance. Until now, offline pruning has proven a notorious headache for node operators everywhere.
Path-Based Storage Scheme (PBSS), introduced in Geth V1.13.0, is poised to address these concerns, greatly enhancing both performance and the user experience.
In this blog, we will learn:
- Why Binance Smart Chain (BSC) adopted PBSS
- The key improvements brought by PBSS and how it compares with the Hash-Based State Scheme (HBSS)
- The optimizations we introduced in BSC’s PBSS and our testing benchmarks
Why Did BSC Adopt PBSS?
Before PBSS, both ETH and BSC grappled with the inefficiencies of the hash-based storage scheme. These inefficiencies are particularly noticeable on BSC, given its 3-second block time (compared to ETH's 12-second time) and 140 million gas limit, 4.6 times larger than that of ETH.
Eventually, this leads to higher storage pressure and degraded performance, forcing users to halt their nodes for offline pruning. Offline pruning is a feature exclusive to the hash-based state scheme and was introduced in Geth version 1.10.
PBSS, introduced in BSC’s V1.3.1-beta release, addresses these concerns with an online state pruning feature, reducing state bloat without the need for offline operations. PBSS keeps the storage size under control, providing a seamless user experience without the hassle of offline pruning. In addition to online state pruning, BSC has seen a remarkable performance boost, with the average time taken to import a block decreasing from 698ms to 531ms, a reduction of approximately 24%.
Path-Based vs Hash-Based
PBSS uses a path-based storage model, storing trie nodes keyed by their encoded paths, whereas the older hash-based model keys them by the content hash of each node. Online state data pruning and more efficient storage of trie nodes in the path-based model enhance performance.
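To make the difference concrete, here is a minimal Go sketch contrasting the two keying schemes. The key layouts and hash function are illustrative assumptions, not Geth's actual encoding:

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

// hashKey illustrates the hash-based scheme: a trie node is stored under
// the hash of its contents, so every revision of a node creates a brand-new
// key and stale revisions linger until offline pruning removes them.
func hashKey(node []byte) []byte {
	h := sha256.Sum256(node) // Geth uses Keccak-256; SHA-256 keeps this sketch dependency-free
	return h[:]
}

// pathKey illustrates the path-based scheme: a trie node is stored under its
// position (path) in the trie, so a new revision simply overwrites the old
// one at the same key, keeping the database compact.
func pathKey(owner byte, path []byte) []byte {
	return append([]byte{'t', owner}, path...) // hypothetical prefix layout
}

func main() {
	node := []byte("rlp-encoded trie node")
	fmt.Printf("hash-based key: %x\n", hashKey(node))
	fmt.Printf("path-based key: %x\n", pathKey(0, []byte{0x1, 0x2}))
}
```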
In hash mode, the state database (DB) size expands quickly, growing by approximately 50GB weekly. In contrast, in path mode, it exhibits slower growth, increasing by less than 5GB each week.
Figure 1: State data growth on Path Mode vs Hash Mode
Innovations in BSC’s PBSS
BSC is committed to innovation, yet managing a substantial volume of transactions on its EVM presents a unique challenge. To address this, specific adjustments and optimizations, tailored to the differences between ETH and BSC, have been implemented to tackle the following issues:
- Increased generation of empty blocks when operating as validators under high transaction volumes (over 1,000 TPS)
- Significant performance variations in the face of large transaction volumes
Async Node Buffer
Innovation: Unlike Ethereum’s synchronous node buffer process, which pauses the main node workflow to complete disk flushing and cache synchronization whenever the buffer fills during block processing, BSC utilizes an asynchronous node buffer strategy. Ethereum’s more leisurely 12-second block time can tolerate such a synchronous approach.
BSC, on the other hand, with its rapid 3-second block interval, cannot afford the delays that come with synchronous disk operations. For optimal efficiency and to prevent any disruptions in block production, BSC has implemented an asynchronous buffer within the disk layer that processes flush operations in the background, thereby ensuring continuous and prompt block generation.
Figure 2: Asynchronous node buffer strategy in action
The asynchronous node buffer streamlines the flushing process as follows (a minimal sketch appears after this list):
- The oldest differential layer is committed to the disk layer synchronously. However, the system does not stall waiting for the node buffer to flush to disk, even when the buffer exceeds its preset size limit.
- The node buffer associated with the disk layer performs database flushing in the background, enabling other processes to proceed without delays. Background flushing operations are designed to ensure the safety of transactions, safeguarding the integrity of the state data within the database.
- The database maintains a full snapshot of the state data corresponding to a specific historical block height, ensuring data consistency and availability.
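Below is a minimal Go sketch of the asynchronous approach. The names and buffer structure are hypothetical, not BSC's actual implementation; it only illustrates how a full buffer can be swapped out and flushed by a background goroutine while the commit path returns immediately:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// asyncBuffer accumulates dirty trie nodes and flushes them to disk in
// the background, so block import never waits on database writes.
type asyncBuffer struct {
	mu      sync.Mutex
	pending map[string][]byte // dirty nodes keyed by trie path
	limit   int               // flush once the buffer holds this many nodes
	flushWG sync.WaitGroup
}

func newAsyncBuffer(limit int) *asyncBuffer {
	return &asyncBuffer{pending: make(map[string][]byte), limit: limit}
}

// commit merges nodes from the oldest diff layer. If the buffer is full,
// it is swapped out and flushed by a background goroutine; the caller
// returns immediately and block production continues.
func (b *asyncBuffer) commit(nodes map[string][]byte) {
	b.mu.Lock()
	for k, v := range nodes {
		b.pending[k] = v
	}
	if len(b.pending) < b.limit {
		b.mu.Unlock()
		return
	}
	frozen := b.pending
	b.pending = make(map[string][]byte)
	b.mu.Unlock()

	b.flushWG.Add(1)
	go func() { // background flush; the frozen snapshot stays readable until it lands
		defer b.flushWG.Done()
		flushToDisk(frozen)
	}()
}

func flushToDisk(nodes map[string][]byte) {
	time.Sleep(10 * time.Millisecond) // stand-in for a batched database write
	fmt.Printf("flushed %d nodes in the background\n", len(nodes))
}

func main() {
	buf := newAsyncBuffer(2)
	buf.commit(map[string][]byte{"a": {1}, "b": {2}}) // triggers a background flush
	buf.commit(map[string][]byte{"c": {3}})           // returns immediately
	buf.flushWG.Wait()
}
```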
This idea has also been proposed to the Ethereum community and is currently under review.
Trie Node Cache Optimization
Innovation: Adjusting the trie node cache to improve stability and reduce unexpected performance variations.
The asynchronous node buffer enhancement in the disk layer described above increases the frequency of disk flushes to the database, which in turn raises the likelihood of database compaction.
Database compaction, a process that merges and rewrites data to optimize storage and enhance read performance, can be resource-heavy and potentially slow the system down. BSC addresses this by doubling the buffer size in the disk layer, thereby reducing the frequency of compactions. A larger buffer holds more data before necessitating a write or flush to the database.
This strategy decreases the number of write operations and, consequently, the instances of data overwriting, a common trigger for compaction. By doing so, it effectively reduces the overall need for compaction operations.
Additionally, BSC adopts a memory management strategy that includes a clean cache twice the size of the node buffer cache. This approach reduces the risk of data overwrites during synchronization between the node buffer cache and the clean cache.
When the node buffer commits changes to the disk, it does so through the clean cache. Allocating more memory to the clean cache ensures that essential data remains easily accessible, which is crucial for maintaining the network’s high throughput and low latency.
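A minimal Go sketch of this sizing rule follows. The struct, field names, and default buffer size are hypothetical illustrations, not BSC's actual configuration:

```go
package main

import "fmt"

// cacheConfig sketches the sizing rule described above.
type cacheConfig struct {
	NodeBufferBytes int64 // dirty node buffer in the disk layer
	CleanCacheBytes int64 // read cache for recently committed nodes
}

// newCacheConfig doubles the default buffer (fewer flushes, hence fewer
// compactions) and keeps the clean cache at twice the buffer size, so data
// just flushed through it is unlikely to be evicted before it is read again.
func newCacheConfig(defaultBufferBytes int64) cacheConfig {
	buffer := 2 * defaultBufferBytes
	return cacheConfig{
		NodeBufferBytes: buffer,
		CleanCacheBytes: 2 * buffer,
	}
}

func main() {
	cfg := newCacheConfig(64 << 20) // assume a 64 MiB default buffer
	fmt.Printf("node buffer: %d MiB, clean cache: %d MiB\n",
		cfg.NodeBufferBytes>>20, cfg.CleanCacheBytes>>20)
}
```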
Disk Bandwidth Optimization
Innovation: Proposing the use of an optimized PebbleDB (an open-source project) to regulate flushes and compactions, preventing spikes in latency.
BSC PBSS replaces LevelDB with an optimized PebbleDB. LevelDB operates without a throttling mechanism for flushes and compactions, consistently running at maximum speed and causing notable latency spikes for both write and read operations.
On the other hand, PebbleDB employs separate rate limiters for flushes and compactions. This mechanism ensures operations occur only as fast as necessary, preventing unnecessary strain on disk bandwidth.
PebbleDB has also been adopted in go-ethereum, and BSC has spent a significant amount of time testing PebbleDB to fine-tune its configuration for BSC's traffic patterns.
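The pacing idea can be illustrated with a token-bucket rate limiter, as in the following Go sketch. This is a conceptual illustration using golang.org/x/time/rate, not PebbleDB's internal pacing code, and the bandwidth figures are arbitrary:

```go
package main

import (
	"context"
	"fmt"
	"time"

	"golang.org/x/time/rate"
)

// paceWrites illustrates throttled background I/O: each flush or compaction
// chunk must acquire "byte tokens" from the limiter before writing, so
// background work never saturates disk bandwidth and starves foreground reads.
func paceWrites(ctx context.Context, limiter *rate.Limiter, chunks [][]byte) error {
	for _, c := range chunks {
		// Block until the limiter grants enough tokens for this chunk.
		if err := limiter.WaitN(ctx, len(c)); err != nil {
			return err
		}
		// ... perform the actual disk write here ...
		fmt.Printf("wrote %d bytes at %s\n", len(c), time.Now().Format("15:04:05.000"))
	}
	return nil
}

func main() {
	// Allow ~1 MiB/s of background write bandwidth with a 256 KiB burst.
	limiter := rate.NewLimiter(rate.Limit(1<<20), 256<<10)
	chunks := [][]byte{
		make([]byte, 256<<10),
		make([]byte, 256<<10),
		make([]byte, 256<<10),
	}
	if err := paceWrites(context.Background(), limiter, chunks); err != nil {
		fmt.Println("pacing aborted:", err)
	}
}
```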
Testing Benchmarks
For the performance comparison testing, we coordinated validators running the same binary (Geth V1.3.7) and hardware configuration. One set operates in hash mode using LevelDB, while the other operates in path mode using PebbleDB.
Hardware configuration for these piloted validators:
- AWS im4gn.4xlarge EC2 instance
- 16 cores of CPU and 64 GB of memory (RAM)
- 7 TB Nitro SSD
The following is the testing result from December 15, 2023 to January 15, 2024.
| Index \ Mode | Hash + LevelDB | Path + PebbleDB | Ratio (%) |
| --- | --- | --- | --- |
| import block (ms) | 698 | 531 | -24% |
| chain execution (ms) | 397 | 363 | -8% |
| chain validation (ms) | 191 | 135 | -29% |
| chain commit (ms) | 84 | 47.3 | -43% |
| disk read (MB/s) | 188 | 280 | +49% |
| disk write (MB/s) | 16 | 26.4 | +65% |
The overall performance improvement with the path-based state scheme is evident. This approach utilizes more disk bandwidth, and PebbleDB plays a role in mitigating latency spikes.
Looking Forward
PBSS can leverage its storage model’s features to implement online state pruning and achieve performance enhancements. However, BSC's state data is still expanding continuously, and both the performance and storage of the MPT (Merkle Patricia Trie) are challenged by state bloat. Other solutions, such as state expiry, should be considered in the future.
Recently, Ethereum co-founder Vitalik Buterin also suggested increasing ETH's gas limit. PBSS can be leveraged by other EVM-compatible chains, empowering them to work together on new solutions that make the overall EVM ecosystem more efficient and cost-effective.