Introduction
In 2023, the BNB Smart Chain (BSC) maintained consistent traffic volumes, witnessing a notable increase in market activities due to inscriptions in December. These developments over the past year have significantly influenced BSC’s storage demands. In this report, we will learn:
- How do the storage statistics differ from the previous year?
- What phenomena cause the storage difference?
- The challenges faced and the proposed direction to resolve them
Storage Overview
All storage statistics are obtained by setting up a full node with Path-based Storage Scheme (PBSS) and PebbleDB synced to block 34840595, and were generated on 31st December 2023.
The following table shows an overview of the storage result:
Database | Category | Size | Count |
---|---|---|---|
Key-Value store | Headers | 72.28MiB | 90009 |
Key-Value store | Bodies | 12.40GiB | 90009 |
Key-Value store | Receipt lists | 7.73GiB | 90009 |
Key-Value store | Difficulties | 4.03MiB | 90009 |
Key-Value store | Block number -> hash | 3.61MiB | 90007 |
Key-Value store | Block hash -> number | 1.33GiB | 34840598 |
Key-Value store | Transaction index | 176.04GiB | 5183543985 |
Key-Value store | Bloombit index | 8.12GiB | 17426746 |
Key-Value store | Contract codes | 20.23GiB | 2590028 |
Key-Value store | Hash trie nodes | 0.00B | 0 |
Key-Value store | Path trie state lookups | 3.52MiB | 90001 |
Key-Value store | Path trie account nodes | 40.34GiB | 349647355 |
Key-Value store | Path trie storage nodes | 473.95GiB | 4718092104 |
Key-Value store | Trie preimages | 819.00B | 13 |
Key-Value store | Account snapshot | 13.17GiB | 257244258 |
Key-Value store | Storage snapshot | 246.98GiB | 3468787109 |
Key-Value store | Clique snapshots | 0.00B | 0 |
Key-Value store | Parlia snapshots | 100.79MiB | 34105 |
Key-Value store | Singleton metadata | 401.62MiB | 17 |
Light client | CHT trie nodes | 3.39GiB | 33630011 |
Light client | Bloom trie nodes | 8.65GiB | 9334268 |
Ancient store (Chain) | Bodies | 797.23GiB | 34750596 |
Ancient store (Chain) | Receipts | 664.62GiB | 34750596 |
Ancient store (Chain) | Diffs | 356.49MiB | 34750596 |
Ancient store (Chain) | Headers | 20.21GiB | 34750596 |
Ancient store (Chain) | Hashes | 1.23GiB | 34750596 |
Ancient store (State) | Account Data | 1.52GiB | 90000 |
Ancient store (State) | Storage Data | 1.63GiB | 90000 |
Ancient store (State) | History Meta | 248.81MiB | 90000 |
Ancient store (State) | Account Index | 2.03GiB | 90000 |
Ancient store (State) | Storage Index | 3.65GiB | 90000 |
Total | | 2.45TiB | |
The following visualization shows the storage distribution of each major component:
As shown, block data takes up the majority of the storage, followed by the world state and metadata. By comparing with the storage layout in December 2022, which was announced in BNB Smart Chain Annual Storage Report 2023, the summary is as follows:
- The total storage size increased from 1.73TB (the 2022 figure after a correction of ~130GB for the transaction index) to 2.45TB, a growth rate of 41.6%.
- The storage capacity of each major storage component is shown below, and the growth rates are 42.6%, 42.5%, 42.9%, and 34.4% respectively.
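The growth rates quoted in this report follow the usual year-over-year formula. A minimal sketch, assuming the rate is computed as the increase divided by the previous year's value (the function name is illustrative, not taken from any BSC tooling):

```python
def growth_rate(previous: float, current: float) -> float:
    """Year-over-year growth, as a percentage of the previous value."""
    return (current - previous) / previous * 100.0


# Total storage, in TB, as quoted in this report.
total_growth = growth_rate(1.73, 2.45)

# Trie size, in GB, from the world state section of this report.
trie_growth = growth_rate(360.68, 514.29)

print(f"total storage: {total_growth:.1f}%")
print(f"trie size:     {trie_growth:.1f}%")
```

The same helper can be applied to any pair of 2022 and 2023 figures in the tables of this report.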
Block Data
The following graph shows the year-over-year block data comparison:
In 2023, BSC's block data grew notably. Block bodies expanded by 256GB, a growth rate of 46.4%. Receipts, headers, and contract codes grew by 185GB, 6.68GB, and 4.73GB, or 37.95%, 49.1%, and 30.5% respectively. This pace is slower than in 2022, which we attribute to lower transactions per second (TPS) during the bear market.
The substantial block size presents several challenges. One key issue is the necessity to store all blocks from the Genesis block to the most recent, consuming extensive disk space that will only continue to grow. However, executing the most recent blocks does not require access to historical block data. This situation presents an opportunity to explore optimization techniques that could potentially reduce the storage needs of a node by excluding this historical data.
Furthermore, the size of each block increases with transaction throughput. According to the average block size and daily transaction count charts on BscScan, the average block size is around 40k-50k and the average TPS is around 44. In December, the block size peaked at around 250k and the TPS exceeded 1k, consistent with the heightened activity across the entire crypto market. Higher TPS means larger block data, which demands more disk bandwidth and more disk space.
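To get a feel for what these figures mean for disk usage, here is a back-of-the-envelope sketch. It rests on two assumptions that are not stated in this report: that the block sizes above are in bytes, and that blocks are produced roughly every 3 seconds. All names are illustrative.

```python
BLOCK_INTERVAL_SECONDS = 3  # assumption: ~3 s block time
SECONDS_PER_DAY = 86_400


def blocks_per_day(interval: int = BLOCK_INTERVAL_SECONDS) -> int:
    """Number of blocks produced in one day at a fixed block interval."""
    return SECONDS_PER_DAY // interval


def daily_block_bytes(avg_block_bytes: int) -> int:
    """Raw block bytes written per day for a given average block size."""
    return avg_block_bytes * blocks_per_day()


def txs_per_block(tps: int, interval: int = BLOCK_INTERVAL_SECONDS) -> int:
    """Average number of transactions per block at a given TPS."""
    return tps * interval


typical = daily_block_bytes(45_000)   # midpoint of the 40k-50k range
peak = daily_block_bytes(250_000)     # December peak block size

print(f"typical day: {typical / 1e9:.2f} GB of block data")
print(f"peak day:    {peak / 1e9:.2f} GB of block data")
print(f"txs per block at 44 TPS: {txs_per_block(44)}")
```

Under these assumptions, a day at the December peak writes several times more block data than a typical day, which is the disk bandwidth pressure described above.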
Looking more closely at the database mechanics, recent blocks are first stored in a key-value (KV) database. Once they age beyond a certain point, termed the ancient threshold, they are moved to the ancient database. Because the same data is written twice, this transfer wastes some disk bandwidth. The implications of EIP-4844 are also worth noting. Once BSC adopts EIP-4844, block size is expected to increase because blocks will carry blobs. Although the storage required for blobs may not grow over time, it will still impose an additional disk space requirement on node operators.
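The KV-to-ancient transfer can be pictured with a toy model. This is only a sketch of the idea, not the client's implementation: the class, its fields, and the threshold value are assumptions (90,000 is chosen because it matches the roughly 90,000 recent blocks shown in the overview table).

```python
from dataclasses import dataclass, field
from typing import Dict, List

ANCIENT_THRESHOLD = 90_000  # assumption, see the note above


@dataclass
class ToyChainStore:
    """Recent blocks live in a KV map; older ones move to an append-only list."""

    threshold: int = ANCIENT_THRESHOLD
    recent: Dict[int, bytes] = field(default_factory=dict)
    ancient: List[bytes] = field(default_factory=list)

    def insert(self, number: int, block: bytes) -> None:
        """Store a new head block, then freeze whatever has aged out."""
        self.recent[number] = block
        self._freeze(head=number)

    def _freeze(self, head: int) -> None:
        # The ancient list holds blocks 0..len(ancient)-1. Keep moving the
        # next block while it is at least `threshold` blocks behind the head.
        while len(self.ancient) <= head - self.threshold:
            number = len(self.ancient)
            # Each frozen block is written a second time: this is the
            # disk bandwidth inefficiency mentioned in the text.
            self.ancient.append(self.recent.pop(number))


store = ToyChainStore(threshold=3)
for n in range(10):  # blocks must be inserted in order, starting at 0
    store.insert(n, f"block-{n}".encode())

print(sorted(store.recent))  # block numbers still in the KV store
print(len(store.ancient))    # number of blocks in the ancient store
```

With a threshold of 3 and a head at block 9, the KV map keeps the last three blocks and everything older sits in the ancient list.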
World State
Trie
 | Dec. 2022 | Dec. 2023 | Growth rate |
---|---|---|---|
EOA accounts | 87,190,393 | 152,436,001 | 74.8% |
Contract accounts | 47,329,085 | 104,809,811 | 121.4% |
Total KV pairs | 3,449,013,209 | 5,068,274,292 | 46.9% |
Total Size | 360.68GB | 514.29GB | 42.6% |
The table above shows a surge in the number of accounts, particularly contract accounts, which increased by 121.4%. This indicates healthy growth and activity within the BNB Chain ecosystem even during the bear market. However, it also increased the trie storage size, which grew by 42.6%.
Diving deeper into the MPT composition, the following diagram shows the proportion of trie nodes on each trie level:
The deeper the nodes are in the trie, the longer the reading latency, which may impact the node performance. Most trie nodes are concentrated in the 7th and 8th levels of the trie, which is still considered normal.
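As a rough sanity check on those levels, the expected leaf depth of a 16-way trie holding N uniformly distributed keys is about log16(N). The sketch below applies that estimate to the account count from the overview table. It is a simplification: it ignores extension nodes and any imbalance in the real trie, and the function name is illustrative.

```python
import math


def expected_trie_depth(key_count: int, branching: int = 16) -> float:
    """Approximate leaf depth of a balanced trie: log_branching(key_count)."""
    return math.log(key_count, branching)


# Account snapshot count from the overview table (one entry per account).
account_depth = expected_trie_depth(257_244_258)
print(f"estimated account trie depth: {account_depth:.2f}")
```

For the account trie this estimate lands close to 7, in line with the concentration of nodes around the 7th and 8th levels.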
Snapshot
 | Dec. 2022 | Dec. 2023 | Growth rate |
---|---|---|---|
Account snapshot size | 6.71GB | 13.17GB | 96.3% |
Storage snapshot size | 174.86GB | 246.98GB | 41.2% |
Total KV pairs | 2,577,621,332 | 3,726,031,367 | 44.6% |
Snapshot is a flat key-value representation of the trie. Hence, the increase in the number of accounts in the trie would also increase the account snapshot size.
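One way to compare the flat snapshot with the trie is the average number of bytes stored per entry. The sketch below derives it from the overview table, assuming the sizes there are binary units (GiB) as labelled. The helper and dictionary names are illustrative.

```python
GIB = 1024 ** 3

# (size in GiB, entry count), copied from the overview table.
TABLE_ROWS = {
    "Path trie account nodes": (40.34, 349_647_355),
    "Account snapshot": (13.17, 257_244_258),
    "Path trie storage nodes": (473.95, 4_718_092_104),
    "Storage snapshot": (246.98, 3_468_787_109),
}


def avg_entry_bytes(size_gib: float, count: int) -> float:
    """Average on-disk bytes per entry for one table row."""
    return size_gib * GIB / count


averages = {name: avg_entry_bytes(*row) for name, row in TABLE_ROWS.items()}
for name, value in averages.items():
    print(f"{name:<26} {value:6.1f} bytes/entry")
```

The snapshot rows come out noticeably smaller per entry than the trie rows, which fits the description of the snapshot as a flat representation without intermediate trie nodes.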
Big contract accounts
Because contract storage is unbounded, a single contract can grow as large as, or even larger than, the entire account trie. We therefore analyzed “big contract accounts”, meaning contracts with very large storage, as measured by the number of KV pairs they have written.
These contracts, with their significant storage demands and complex, multi-layered MPT structures, could lead to storage amplification issues, adversely affecting node performance. Presented below is a table detailing the number and proportion of trie nodes for the top 20 contracts:
Contract Address Hash | Total Trie Nodes | Percentage |
0xe9dae3d797a6bf53395810df9d7048f18ac98f1bd211dc87dfad3532aa88d237 | 292687327 | 6.203% |
0xe3ee5c338fb03ba97621fbf6b62c153a7a9b3c4dc567d43368d31a1ae9a2d6b5 | 127974389 | 2.712% |
0xbe09a843e96d820323ffaac74f0f119734db1f158ac0d0d5b627ac7f3bcc82c2 | 97475866 | 2.066% |
0x9944875b9e5ab4adbba2b96063da62b3027becaed0108d94caa199e447f3899b | 89336533 | 1.893% |
0xcbfc208cdd69e775207d3575299a371560c11e9896b0a4163c2b845a7d9700ff | 81506522 | 1.727% |
0xa2aea0f231dc891cdb73930caa95a9cc139c3a15aa82bdd058ed70f340639f03 | 64950309 | 1.376% |
0xe9f236c88a4a8a733cdc8006ea8ea015b72d5af7ce2349c63fbf18d8e8caf967 | 51406538 | 1.089% |
0xd97dd5b88bb7ee807775844477cb799dbe99670ce8b2c117353e135807c96749 | 50664326 | 1.074% |
0xc874e65ccffb133d9db4ff637e62532ef6ecef3223845d02f522c55786782911 | 50360139 | 1.067% |
0xd463275379920234d812dc6067bd870fd827f413d7522b5ea4fa1344b0f67e98 | 49206262 | 1.043% |
0x4f0461659e231d1a2414365e75f957f73cf742123e96266b388f745e748e5cb5 | 46347263 | 0.982% |
0x6d6171b4266182a5688e6c28a1b19b90ef55d7c9477b203ac2efc5c767268a21 | 42535827 | 0.901% |
0x056c4f19188880933e0d07f50b427ecd7f0e76a51114ebe3009810fab290f238 | 42060518 | 0.891% |
0x659dd7cc4344b94968d04d592683ceb1d3cf2c537d3a70f6008bbbcd9257ee91 | 38665970 | 0.819% |
0xfe1c2c3bf003e59420de2a964984544a947ac6de636a2dedb89b689ab278b65e | 36522794 | 0.774% |
0xb391b79f572b5a9730880e7ce4da4a9f128b595f4ba8cc8c74cd195b50f6912e | 32918172 | 0.698% |
0xb23ca34dfccaab5e20e02f61e2d9f76422f560e5407906b35398e774c27b40ae | 30919935 | 0.655% |
0xf7c451c1298c0a97d0dfbe0a4bec252fd1544432b7f968ec6dabe904165d3f69 | 30332874 | 0.643% |
0xca7707f73fe46dcd03ecacc1ba26184f023fd3281fdfecb67a08d576d101af9a | 30243859 | 0.641% |
Total | | 27.356% |
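The percentages can be approximately reproduced by dividing each contract's node count by the total number of storage trie nodes. The report does not state the exact denominator, so the sketch below assumes it is the "Path trie storage nodes" count from the overview table. Results agree with the table to within rounding.

```python
# Assumed denominator: "Path trie storage nodes" count from the overview table.
STORAGE_TRIE_NODES = 4_718_092_104

# Node counts of the five largest contracts, in table order.
TOP_FIVE_NODE_COUNTS = [
    292_687_327,
    127_974_389,
    97_475_866,
    89_336_533,
    81_506_522,
]


def node_share(node_count: int, total: int = STORAGE_TRIE_NODES) -> float:
    """Share of all storage trie nodes held by one contract, in percent."""
    return node_count / total * 100.0


shares = [node_share(count) for count in TOP_FIVE_NODE_COUNTS]
for count, share in zip(TOP_FIVE_NODE_COUNTS, shares):
    print(f"{count:>12,d} nodes  {share:.3f}%")
print(f"top five combined: {sum(shares):.2f}%")
```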
Since the database only stores the hash of the account address, it is not easy to obtain the original account address directly. We attempted to identify the original addresses of these large players and have listed the top 5 below:
Contract Address | Total Trie Nodes | Percentage |
XEN Crypto: bXEN Token (0x2AB0e9e4eE70FFf1fB9D67031E44F6410170d00e) | 292687327 | 6.203% |
CryptoMines Worker (0x6053b8FC837Dc98C54F7692606d632AC5e760488) | 127974389 | 2.712% |
PancakeSwap: Prediction V2 (0x18B2A687610328590Bc8F2e5fEdDe3b582A49cdA) | 97475866 | 2.066% |
Shido – Shido Network (0xE71A487706A065aE0947576F8E591732360d39fb) | 89336533 | 1.893% |
Bomb Crypto: BHERO (0x30Cc0553F6Fa1fAF6d7847891b9b36eb559dC618) | 81506522 | 1.727% |
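The mapping is easy in the forward direction only: the address hash is the Keccak-256 digest of the 20-byte address, so a candidate address can be checked by hashing it, while recovering an address from its hash requires guessing candidates. The sketch below is a small pure-Python Keccak-256, written for illustration and not taken from the BSC client. Note that Python's `hashlib.sha3_256` uses different padding and does not produce the same digest. The function names are illustrative.

```python
MASK64 = (1 << 64) - 1
RATE_BYTES = 136  # Keccak-256 rate: 1088 bits


def _rol64(value: int, shift: int) -> int:
    shift %= 64
    return ((value << shift) | (value >> (64 - shift))) & MASK64


def _keccak_f1600(lanes):
    """Apply the 24-round Keccak-f[1600] permutation to a 5x5 lane matrix."""
    lfsr = 1
    for _ in range(24):
        # theta: mix column parities into every lane
        c = [lanes[x][0] ^ lanes[x][1] ^ lanes[x][2] ^ lanes[x][3] ^ lanes[x][4]
             for x in range(5)]
        d = [c[(x + 4) % 5] ^ _rol64(c[(x + 1) % 5], 1) for x in range(5)]
        lanes = [[lanes[x][y] ^ d[x] for y in range(5)] for x in range(5)]
        # rho and pi: rotate lanes and move them to new positions
        x, y = 1, 0
        current = lanes[x][y]
        for t in range(24):
            x, y = y, (2 * x + 3 * y) % 5
            current, lanes[x][y] = lanes[x][y], _rol64(current, (t + 1) * (t + 2) // 2)
        # chi: non-linear step along each row
        for y in range(5):
            row = [lanes[x][y] for x in range(5)]
            for x in range(5):
                lanes[x][y] = row[x] ^ (~row[(x + 1) % 5] & row[(x + 2) % 5])
        # iota: add the round constant, generated by a small LFSR
        for j in range(7):
            lfsr = ((lfsr << 1) ^ ((lfsr >> 7) * 0x71)) % 256
            if lfsr & 2:
                lanes[0][0] ^= 1 << ((1 << j) - 1)
    return lanes


def keccak256(data: bytes) -> bytes:
    """Keccak-256 with the original 0x01 padding (not NIST SHA3-256)."""
    padded = bytearray(data)
    padded.append(0x01)
    while len(padded) % RATE_BYTES:
        padded.append(0x00)
    padded[-1] |= 0x80
    lanes = [[0] * 5 for _ in range(5)]
    for offset in range(0, len(padded), RATE_BYTES):
        block = padded[offset:offset + RATE_BYTES]
        for i in range(RATE_BYTES // 8):
            lanes[i % 5][i // 5] ^= int.from_bytes(block[8 * i:8 * i + 8], "little")
        lanes = _keccak_f1600(lanes)
    return b"".join(lanes[i % 5][i // 5].to_bytes(8, "little") for i in range(4))


def account_hash(address: str) -> str:
    """Hash a 0x-prefixed hex address the way the state trie keys accounts."""
    return "0x" + keccak256(bytes.fromhex(address[2:])).hex()


print(account_hash("0x2AB0e9e4eE70FFf1fB9D67031E44F6410170d00e"))
```

If the identification in the table above is correct, hashing each listed address this way should reproduce the corresponding entry in the address hash table.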
Future Development
Blockchains are heavily I/O-bound. Higher transaction throughput requires more disk bandwidth, and a larger database degrades both database performance and overall system performance.
Sound data storage design and efficient use of disk bandwidth are key to improving overall system throughput. Below are some proposals and research directions based on the analysis in this report:
- Separate databases for block data and state data
Block data is stored sequentially, while state data is stored randomly in the database. Splitting the database by data pattern would make disk bandwidth usage more efficient and improve overall performance.
- Segmented history data maintenance
This would address the ever-growing historical block storage on BSC for validators and full nodes, which would only need to maintain a limited range of blocks.
- State expiry at the contract level to reduce the current world state size
The current world state keeps growing, which will impact the network’s performance, so we need a strategy to keep it under control. Some storage tries are rarely or never used anymore. Their state data can expire to reduce the overall state size.
- Build a high-performance state database
Currently, the state data is built on the MPT and stored in a generic store such as LevelDB. The index performance is not good enough, and our team is working on a new solution.
- Integrate the state snapshot into the trie database
The state snapshot improves execution performance, but its persistent data overlaps with the trie database. In addition, the state snapshot and the trie database both rely on similarly complicated recovery mechanisms to ensure recoverability after a panic. Integrating the state snapshot into the trie database would therefore improve robustness and simplicity.
- Improve the performance of storage tries with huge numbers of KV pairs
A storage trie with a huge number of KV pairs produces an MPT with too many levels, which degrades access performance.
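To make the contract-level state expiry idea more concrete, here is a toy sketch. It is purely hypothetical and does not describe any agreed design: the notion of an "epoch", the class, and the expiry window are assumptions introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyExpiryTracker:
    """Tracks when each contract's storage trie was last accessed."""

    expiry_epochs: int = 2  # hypothetical window before a trie expires
    last_access: Dict[str, int] = field(default_factory=dict)

    def touch(self, contract: str, epoch: int) -> None:
        """Record that a contract's storage was read or written in `epoch`."""
        self.last_access[contract] = epoch

    def expired(self, current_epoch: int) -> List[str]:
        """Contracts whose storage has been idle for the whole window."""
        return sorted(
            contract
            for contract, epoch in self.last_access.items()
            if current_epoch - epoch >= self.expiry_epochs
        )


tracker = ToyExpiryTracker(expiry_epochs=2)
tracker.touch("contract-a", epoch=0)
tracker.touch("contract-b", epoch=3)
print(tracker.expired(current_epoch=3))
```

A real design would also need a way to revive expired state when a contract is used again, which this sketch does not cover.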
Looking Forward
In 2023, BSC adopted PBSS and PebbleDB to make blockchain state storage more efficient. As we move into 2024, the continuous and rapid growth of blockchain data remains a significant challenge for maintaining BSC’s performance. It is crucial for all stakeholders to collaborate on innovative solutions that improve BSC’s efficiency and cost-effectiveness. Together, let’s commit to making BSC more robust and sustainable.