BNB Smart Chain Annual Storage Report 2024




Introduction

In 2023, BNB Smart Chain (BSC) maintained consistent traffic volumes, with a notable increase in market activity in December driven by inscriptions. These developments significantly influenced BSC’s storage demands over the year. This report addresses three questions:

  1. How do the storage statistics differ from the previous year?
  2. What phenomena caused the difference in storage?
  3. What challenges does BSC face, and what directions are proposed to resolve them?

Storage Overview

All storage statistics are obtained by setting up a full node with Path-based Storage Scheme (PBSS) and PebbleDB synced to block 34840595, and were generated on 31st December 2023.

The following table shows an overview of the storage result:

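Database | Category | Size | Count
Key-Value store | Headers | 72.28MiB | 90009
Key-Value store | Bodies | 12.40GiB | 90009
Key-Value store | Receipt lists | 7.73GiB | 90009
Key-Value store | Difficulties | 4.03MiB | 90009
Key-Value store | Block number -> hash | 3.61MiB | 90007
Key-Value store | Block hash -> number | 1.33GiB | 34840598
Key-Value store | Transaction index | 176.04GiB | 5183543985
Key-Value store | Bloombit index | 8.12GiB | 17426746
Key-Value store | Contract codes | 20.23GiB | 2590028
Key-Value store | Hash trie nodes | 0.00B | 0
Key-Value store | Path trie state lookups | 3.52MiB | 90001
Key-Value store | Path trie account nodes | 40.34GiB | 349647355
Key-Value store | Path trie storage nodes | 473.95GiB | 4718092104
Key-Value store | Trie preimages | 819.00B | 13
Key-Value store | Account snapshot | 13.17GiB | 257244258
Key-Value store | Storage snapshot | 246.98GiB | 3468787109
Key-Value store | Clique snapshots | 0.00B | 0
Key-Value store | Parlia snapshots | 100.79MiB | 34105
Key-Value store | Singleton metadata | 401.62MiB | 17
Light client | CHT trie nodes | 3.39GiB | 33630011
Light client | Bloom trie nodes | 8.65GiB | 9334268
Ancient store (Chain) | Bodies | 797.23GiB | 34750596
Ancient store (Chain) | Receipts | 664.62GiB | 34750596
Ancient store (Chain) | Diffs | 356.49MiB | 34750596
Ancient store (Chain) | Headers | 20.21GiB | 34750596
Ancient store (Chain) | Hashes | 1.23GiB | 34750596
Ancient store (State) | Account Data | 1.52GiB | 90000
Ancient store (State) | Storage Data | 1.63GiB | 90000
Ancient store (State) | History Meta | 248.81MiB | 90000
Ancient store (State) | Account Index | 2.03GiB | 90000
Ancient store (State) | Storage Index | 3.65GiB | 90000
Total | | 2.45TiB |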

The following visualization shows the storage distribution of each major component:

As shown, block data takes up the majority of the storage, followed by the world state and metadata. Compared with the storage layout in December 2022, which was published in the BNB Smart Chain Annual Storage Report 2023, the changes are as follows:

  • The total storage size increased from 1.73TB (a figure corrected to include the ~130GB transaction index) to 2.45TB, a growth rate of 41.6%.
  • The storage capacity of each major storage component is shown below, and the growth rates are 42.6%, 42.5%, 42.9%, and 34.4% respectively.
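The headline growth figure can be reproduced from the two totals quoted above. The following is a minimal sketch; the helper name `growth_rate` is our own and is not part of any BSC tooling.

```python
def growth_rate(previous: float, current: float) -> float:
    """Return year-over-year growth as a percentage of the previous value."""
    if previous <= 0:
        raise ValueError("previous value must be positive")
    return (current - previous) / previous * 100.0


# Totals quoted in this report: 1.73TB (Dec. 2022, corrected) and 2.45TB (Dec. 2023).
total_growth = growth_rate(1.73, 2.45)
print(f"Total storage growth: {total_growth:.1f}%")
```

The same helper can be applied to any Dec. 2022 / Dec. 2023 pair in the tables of this report to check the stated growth rates.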

Block Data

The following graph shows the year-over-year block data comparison:

In 2023, BSC saw a notable increase in its data storage requirements, particularly in block bodies, which expanded by 256GB, a 46.4% growth rate. Receipts, headers, and contract codes also grew significantly, by 185GB, 6.68GB, and 4.73GB respectively, with growth rates of 37.95%, 49.1%, and 30.5%. This pace is slower than in 2022, which is attributed to lower transactions per second (TPS) during the bear market.

The substantial block size presents several challenges. One key issue is the necessity to store all blocks from the Genesis block to the most recent, consuming extensive disk space that will only continue to grow. However, executing the most recent blocks does not require access to historical block data. This situation presents an opportunity to explore optimization techniques that could potentially reduce the storage needs of a node by excluding this historical data.

Furthermore, the size of each block increases with higher transaction throughput. According to the average block size and daily transaction count charts on BscScan, the average block size is around 40k-50k and the average TPS is around 44. In December, the block size at one point reached 250k and the TPS exceeded 1k, in line with the popularity of the crypto market as a whole at the time. Higher TPS means larger block data, which demands more disk bandwidth and more disk space.

Looking further at the database mechanics, recent blocks are initially stored in a key-value (KV) database. Once these blocks age beyond a certain point, termed the ancient threshold, they are transferred to the ancient database. This transfer process results in some disk bandwidth inefficiency. It is also important to note the implications of EIP-4844. With the adoption of EIP-4844 by BSC, block size is expected to increase due to the inclusion of blobs. Although the storage required for blobs may not grow over time, it will still impose an additional disk space demand on node operators.
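The migration from the KV database to the ancient database can be illustrated with a toy model. This is a sketch under our own assumptions, not the node's actual implementation: the class `BlockStore`, its field names, and the tiny threshold value are all hypothetical. For scale, the overview table suggests that roughly 90,000 recent blocks remain in the KV store.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class BlockStore:
    """Toy model: a hot KV store plus an append-only ancient store."""
    ancient_threshold: int  # hypothetical: how many recent blocks stay in the KV store
    kv: Dict[int, bytes] = field(default_factory=dict)
    ancient: List[bytes] = field(default_factory=list)  # list index == block number

    def insert(self, number: int, block: bytes) -> None:
        """Write a new block to the KV store, then migrate blocks that are old enough."""
        self.kv[number] = block
        self._freeze(head=number)

    def _freeze(self, head: int) -> None:
        # Blocks numbered <= head - ancient_threshold are moved, in order.
        limit = head - self.ancient_threshold
        next_frozen = len(self.ancient)
        while next_frozen <= limit and next_frozen in self.kv:
            # In this toy model a migrated block is written twice: KV first, ancient later.
            self.ancient.append(self.kv.pop(next_frozen))
            next_frozen += 1

    def get(self, number: int) -> Optional[bytes]:
        """Read from the ancient store if the block was migrated, else from the KV store."""
        if 0 <= number < len(self.ancient):
            return self.ancient[number]
        return self.kv.get(number)


store = BlockStore(ancient_threshold=3)
for n in range(10):
    store.insert(n, f"block-{n}".encode())
print(len(store.ancient), sorted(store.kv))
```

In this model every block is written once to the KV store and once more to the ancient store, which is one way to picture the extra disk bandwidth the transfer consumes.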

World State

Trie

Metric | Dec. 2022 | Dec. 2023 | Growth rate
EOA accounts | 87,190,393 | 152,436,001 | 74.8%
Contract accounts | 47,329,085 | 104,809,811 | 121.4%
Total KV pairs | 3,449,013,209 | 5,068,274,292 | 46.9%
Total Size | 360.68GB | 514.29GB | 42.6%

The table above shows a huge surge in the number of accounts, particularly contract accounts, which increased by 121.4%. This indicates healthy growth and activity within the BNB Chain ecosystem even during the bear market. However, it also increases the trie storage size, which grew by 42.6%.

Looking deeper at the composition of the Merkle Patricia Trie (MPT), the following diagram shows the proportion of trie nodes at each trie level:

The deeper a node sits in the trie, the longer the read latency, which may affect node performance. Most trie nodes are concentrated in the 7th and 8th levels of the trie, which is still considered normal.
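A back-of-the-envelope estimate gives one plausible reason for this concentration. The sketch below assumes that keys are uniformly distributed hashes and that each trie level branches 16 ways (one hex nibble per level), so keys become unique at a depth of roughly log16(N). It ignores extension nodes and other MPT details, and the function name is our own.

```python
import math


def expected_leaf_depth(num_keys: int, branching: int = 16) -> float:
    """Rough depth at which uniformly distributed keys become unique in a radix trie."""
    if num_keys < 1:
        raise ValueError("num_keys must be at least 1")
    return math.log(num_keys, branching)


# Account counts taken from the trie table in this report (Dec. 2023).
accounts = 152_436_001 + 104_809_811
depth = expected_leaf_depth(accounts)
print(f"{accounts:,} accounts -> about {depth:.1f} levels")
```

Under these assumptions the account trie alone lands at around seven levels, which is in the same range as the distribution described above.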

Snapshot

Metric | Dec. 2022 | Dec. 2023 | Growth rate
Account snapshot size | 6.71GB | 13.17GB | 96.3%
Storage snapshot size | 174.86GB | 246.98GB | 41.2%
Total KV pairs | 2,577,621,332 | 3,726,031,367 | 44.6%

The snapshot is a flat key-value representation of the trie. Hence, the increase in the number of accounts in the trie also increases the account snapshot size.
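The difference between a trie lookup and a flat snapshot lookup can be pictured with a toy example. This is an illustrative sketch only: `NibbleTrie` is a hypothetical, uncompressed radix-16 trie rather than a real MPT, and a plain dictionary stands in for the snapshot.

```python
from typing import Dict, Optional, Tuple


class NibbleTrie:
    """Toy radix-16 trie; each traversed node counts as one database read."""

    def __init__(self) -> None:
        self.root: dict = {}

    def put(self, key_hex: str, value: bytes) -> None:
        node = self.root
        for nibble in key_hex:
            node = node.setdefault(nibble, {})
        node["value"] = value

    def get(self, key_hex: str) -> Tuple[Optional[bytes], int]:
        """Return (value, number of nodes read), starting with the root node."""
        node, reads = self.root, 1
        for nibble in key_hex:
            if nibble not in node:
                return None, reads
            node = node[nibble]
            reads += 1
        return node.get("value"), reads


trie = NibbleTrie()
snapshot: Dict[str, bytes] = {}  # flat representation: one entry per key

key = "a1b2c3d4"  # made-up, shortened key for illustration
trie.put(key, b"account-data")
snapshot[key] = b"account-data"

value, reads = trie.get(key)
print(f"trie: {reads} node reads, snapshot: 1 lookup")
```

Both structures hold the same data, which is why the snapshot size tracks the number of accounts, but the flat form answers a read with a single lookup.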

Big contract accounts

Because contract storage is unbounded, a single contract can potentially grow as large as, or even larger than, the entire account trie. We therefore analyzed “big contract accounts,” that is, contracts with very large storage, as measured by the number of KV pairs they have written.

These contracts, with their significant storage demands and complex, multi-layered MPT structures, can cause storage amplification and hurt node performance. The table below lists the number and proportion of trie nodes for the top 20 contracts:

Contract Address Hash | Total Trie Nodes | Percentage
0xe9dae3d797a6bf53395810df9d7048f18ac98f1bd211dc87dfad3532aa88d237 | 292687327 | 6.203%
0xe3ee5c338fb03ba97621fbf6b62c153a7a9b3c4dc567d43368d31a1ae9a2d6b5 | 127974389 | 2.712%
0xbe09a843e96d820323ffaac74f0f119734db1f158ac0d0d5b627ac7f3bcc82c2 | 97475866 | 2.066%
0x9944875b9e5ab4adbba2b96063da62b3027becaed0108d94caa199e447f3899b | 89336533 | 1.893%
0xcbfc208cdd69e775207d3575299a371560c11e9896b0a4163c2b845a7d9700ff | 81506522 | 1.727%
0xa2aea0f231dc891cdb73930caa95a9cc139c3a15aa82bdd058ed70f340639f03 | 64950309 | 1.376%
0xe9f236c88a4a8a733cdc8006ea8ea015b72d5af7ce2349c63fbf18d8e8caf967 | 51406538 | 1.089%
0xd97dd5b88bb7ee807775844477cb799dbe99670ce8b2c117353e135807c96749 | 50664326 | 1.074%
0xc874e65ccffb133d9db4ff637e62532ef6ecef3223845d02f522c55786782911 | 50360139 | 1.067%
0xd463275379920234d812dc6067bd870fd827f413d7522b5ea4fa1344b0f67e98 | 49206262 | 1.043%
0x4f0461659e231d1a2414365e75f957f73cf742123e96266b388f745e748e5cb5 | 46347263 | 0.982%
0x6d6171b4266182a5688e6c28a1b19b90ef55d7c9477b203ac2efc5c767268a21 | 42535827 | 0.901%
0x056c4f19188880933e0d07f50b427ecd7f0e76a51114ebe3009810fab290f238 | 42060518 | 0.891%
0x659dd7cc4344b94968d04d592683ceb1d3cf2c537d3a70f6008bbbcd9257ee91 | 38665970 | 0.819%
0xfe1c2c3bf003e59420de2a964984544a947ac6de636a2dedb89b689ab278b65e | 36522794 | 0.774%
0xb391b79f572b5a9730880e7ce4da4a9f128b595f4ba8cc8c74cd195b50f6912e | 32918172 | 0.698%
0xb23ca34dfccaab5e20e02f61e2d9f76422f560e5407906b35398e774c27b40ae | 30919935 | 0.655%
0xf7c451c1298c0a97d0dfbe0a4bec252fd1544432b7f968ec6dabe904165d3f69 | 30332874 | 0.643%
0xca7707f73fe46dcd03ecacc1ba26184f023fd3281fdfecb67a08d576d101af9a | 30243859 | 0.641%
Total | | 27.356%
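The percentages in this table closely match each contract's node count divided by the "Path trie storage nodes" count from the storage overview (4,718,092,104). That choice of denominator is our inference from the numbers, not something the report states. A short sketch for the five largest contracts:

```python
# Values copied from this report. The denominator is an assumption: the
# "Path trie storage nodes" count from the storage overview table.
STORAGE_TRIE_NODES = 4_718_092_104

# Trie node counts of the five largest contracts, in table order.
top_five_nodes = [292_687_327, 127_974_389, 97_475_866, 89_336_533, 81_506_522]

shares = [nodes / STORAGE_TRIE_NODES * 100 for nodes in top_five_nodes]
for nodes, share in zip(top_five_nodes, shares):
    print(f"{nodes:>11,} nodes -> {share:.2f}% of storage trie nodes")
print(f"Top five combined: {sum(shares):.2f}%")
```

The computed shares agree with the published figures to within about a thousandth of a percentage point, so small differences in the last digit are likely due to rounding or a slightly different total.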

Since the database only stores the hash of the account address, it is not easy to obtain the original account address directly. We attempted to identify the original addresses of these large players and have listed the top 5 below:

Contract Address | Total Trie Nodes | Percentage
XEN Crypto: bXEN Token (0x2AB0e9e4eE70FFf1fB9D67031E44F6410170d00e) | 292687327 | 6.203%
CryptoMines Worker (0x6053b8FC837Dc98C54F7692606d632AC5e760488) | 127974389 | 2.712%
PancakeSwap: Prediction V2 (0x18B2A687610328590Bc8F2e5fEdDe3b582A49cdA) | 97475866 | 2.066%
Shido – Shido Network (0xE71A487706A065aE0947576F8E591732360d39fb) | 89336533 | 1.893%
Bomb Crypto: BHERO (0x30Cc0553F6Fa1fAF6d7847891b9b36eb559dC618) | 81506522 | 1.727%

Future Development

Blockchains are highly I/O-bound. Higher transaction throughput requires more disk bandwidth, and a larger database also degrades both database performance and overall system performance.

Sound data storage solutions and efficient use of disk bandwidth are key to improving overall system throughput. Below are some proposals and directions for further research, based on the analysis in this report:

  1. Separate databases for block data and state data
     Block data is stored sequentially, while state data is stored randomly in the database. Splitting the database by data pattern will make disk bandwidth usage more efficient and improve overall performance.
  2. Segmented history data maintenance
     This can help resolve the problem of ever-growing historical block storage on BSC for validators and full nodes, which would only need to maintain a limited range of blocks.
  3. Contract-level state expiry to reduce the current world state size
     The current world state is continuously growing, which will affect the network’s performance, so a strategy is needed to keep it under control. Some storage tries are rarely or no longer used, and their state data can expire to reduce the overall state size.
  4. Build a high-performance state database
     Currently, the state data is built on the MPT and stored in a generic store such as LevelDB. The index performance is not good enough, and our team is working on a new solution.
  5. Integrate the state snapshot into the trie database
     The state snapshot is used to improve execution performance, and its persistent data overlaps with the trie database. In addition, both the state snapshot and the trie database have similarly complicated recovery mechanisms to ensure recoverability after a panic. Integrating the state snapshot into the trie database would therefore improve robustness and simplicity.
  6. Improve the performance of storage tries with huge numbers of KV pairs
     A storage trie with a huge number of KV pairs produces too many MPT levels, which hurts access performance.
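To make the contract-level state expiry idea more concrete, here is a minimal sketch of one possible selection rule: a contract's storage becomes an expiry candidate when it has not been accessed within a fixed window of blocks. Everything here is hypothetical, including the function name, the `last_access_block` bookkeeping, the window size, and the sample addresses. The report does not specify how expiry would be decided.

```python
from typing import Dict, List


def expired_contracts(
    last_access_block: Dict[str, int], head: int, expiry_window: int
) -> List[str]:
    """Return contracts whose storage was last touched more than `expiry_window` blocks ago."""
    cutoff = head - expiry_window
    return sorted(
        address for address, last in last_access_block.items() if last < cutoff
    )


# Made-up sample data: contract label -> block number of the last storage access.
last_access = {"0xaaa": 100, "0xbbb": 950, "0xccc": 400}
candidates = expired_contracts(last_access, head=1000, expiry_window=500)
print(candidates)
```

A real design would also need a way to revive expired state when a contract is used again, which this sketch does not cover.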

Looking Forward

In 2023, BSC adopted PBSS and PebbleDB to make blockchain state storage more efficient. As we move into 2024, the continuous and rapid growth of blockchain data presents a significant challenge for maintaining BSC’s performance. It is crucial for all stakeholders to collaborate on innovative solutions that improve BSC’s efficiency and cost-effectiveness. Together, let’s commit to making BSC more robust and sustainable.