
In this blog post, we clarify the storage model of the Internet Computer (IC) and share some insights into the roadmap toward more storage.
We first outline the types of storage that blockchains typically provide, then detail the unique trade-offs made by the Internet Computer architecture, and finally describe the next milestones on the roadmap to more storage.
In the context of blockchain storage, broadly speaking, two types of storage can be distinguished: fully replicated storage and distributed storage. The Internet Computer relies on fully replicated storage: as part of the protocol, it ensures that all participating nodes store a complete copy of the data, commonly referred to as the replicated state. This type of storage therefore supports direct read/write/update/delete operations as part of any action that the participating nodes agree upon in a replicated manner via a consensus protocol.
From the perspective of smart contract developers, this type of storage feels much like RAM that is permanently available to traditional computer programs.
In distributed storage, by contrast, the consensus protocol merely acts as a coordinator, deciding which subset of nodes stores which portion of the previously agreed-upon data. Not all participating nodes have to store all data, which reduces the replication factor.
It is crucial to note, however, that this also makes directly reading the data during replicated execution impractical, which is why this type of storage is primarily used for static blobs.
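The difference in replication factor can be made concrete with a small illustrative calculation. The node counts and the replication factor below are hypothetical examples, not actual IC parameters:

```python
# Total raw bytes stored across a network for the same logical data,
# under the two storage models (hypothetical numbers, not IC parameters).

def fully_replicated_bytes(logical_bytes: int, num_nodes: int) -> int:
    """Every participating node stores a complete copy of the state."""
    return logical_bytes * num_nodes

def distributed_bytes(logical_bytes: int, replication_factor: int) -> int:
    """Each piece of data is stored on only a subset of the nodes."""
    return logical_bytes * replication_factor

state = 2**40   # 1 TiB of logical data
nodes = 13      # example subnet size

# Full replication stores one copy per node; distributed storage stores
# only as many copies as the chosen replication factor.
assert fully_replicated_bytes(state, nodes) == 13 * 2**40
assert distributed_bytes(state, 3) == 3 * 2**40
```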
Therefore, while the fully replicated data model is clearly more powerful for building applications upon than the distributed storage model, it also faces scalability challenges.
The architecture of the Internet Computer features three concepts that uniquely address these scalability challenges and provide significant fully replicated storage capacity: deterministic decentralization, high-performance storage layer implementation, and the ability to scale by adding subnets.
We will now briefly discuss how they facilitate highly scalable fully replicated storage:
Deterministic Decentralization: The Network Nervous System (NNS) DAO makes informed decisions about which nodes join the network and which nodes become part of each subnet. Compared to a setting in which any node can join a network or subnet, the number of nodes per subnet can thus be kept much smaller while still meeting diversity and decentralization objectives.
High-Performance Storage Layer Implementation: Recently, as part of the Stellarator milestone, the entire storage layer of the IC was redesigned. Among other things, the new storage architecture is an important step toward more fully replicated storage capacity per subnet. With the launch of the Stellarator milestone, the maximum storage capacity of each subnet was increased to 1 TiB, and crucially, the new architecture also supports the follow-up projects that will increase the fully replicated storage capacity of each subnet further.
Scaling by adding subnets: The NNS can launch new subnets as needed, allowing for new storage capacity to be added based on demand.
Due to these architectural characteristics of the Internet Computer, the focus has been on optimizing the capacity for fully replicated storage. In the remainder of this article, we will provide a more specific overview of the next steps for the Internet Computer's fully replicated storage capacity.
As of the time of writing, a single subnet can store 1 TiB (approximately 1.1 TB) of fully replicated storage. The Internet Computer currently has 34 subnets hosting dapps, which means the total replicated storage capacity is currently 34 TiB.
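The unit conversion and the total above can be checked with a quick calculation (TiB is a binary unit of 2^40 bytes, while TB is a decimal unit of 10^12 bytes):

```python
# 1 TiB (2**40 bytes) expressed in decimal terabytes (10**12 bytes),
# and the total replicated capacity across the subnets hosting dapps.
TIB = 2**40
TB = 10**12

tib_in_tb = TIB / TB        # about 1.0995, i.e. roughly 1.1 TB
subnets = 34
total_tib = subnets * 1     # 1 TiB of replicated storage per subnet

assert round(tib_in_tb, 2) == 1.1
assert total_tib == 34
```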
From 1 TiB subnet storage to 2 TiB
The design and implementation of the IC's new storage layer allow it to avoid time-consuming operations that grow linearly with the size of the replicated state; its operations depend only on the amount of data that changes relative to the previous state. From this perspective, the Internet Computer is well prepared to increase the maximum replicated state size of its subnets.
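The key property, that persistence cost scales with the delta rather than the total state size, can be sketched as follows. This is a simplified model for illustration, not the actual IC implementation:

```python
# Minimal sketch of delta-based state persistence (an assumed model, not
# the actual IC storage layer): only pages modified since the previous
# checkpoint are written, so checkpoint cost scales with the amount of
# changed data, not with the total state size.

def checkpoint(state: dict, dirty_pages: set, storage: dict) -> int:
    """Write only the dirty pages to storage; return pages written."""
    written = 0
    for page in dirty_pages:
        storage[page] = state[page]
        written += 1
    dirty_pages.clear()
    return written

state = {i: b"x" * 4096 for i in range(1000)}  # 1000 pages of state
storage = dict(state)                          # previous checkpoint
state[42] = b"y" * 4096                        # mutate a single page
dirty = {42}

# Only one page is written, even though the state holds 1000 pages.
assert checkpoint(state, dirty, storage) == 1
assert storage[42] == b"y" * 4096
```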
To smoothly increase capacity to 2 TiB, a few known factors need to be investigated (and further, currently unknown, factors may surface during implementation):
Nodes that join a subnet, or that fall behind the other nodes in their subnet, use a protocol called state synchronization to catch up with the latest state of the subnet. Subnet recovery may likewise require certain nodes to synchronize the entire state. Benchmarking is needed to understand how state synchronization performs with a 2 TiB state, whether that performance is acceptable in all cases, and whether optimizations are needed.
Nodes participating in a subnet sometimes need to hash the replicated state. Although this is done incrementally (only the differences from the previous state are hashed), there are corner cases in which the entire state must be hashed. Testing is needed to determine whether these cases remain acceptable.
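The contrast between incremental and full hashing can be sketched with per-chunk hashes that are combined into a single digest. This is an illustrative scheme, not the IC's actual hash tree:

```python
import hashlib

# Hedged sketch of incremental state hashing (illustrative only, not the
# IC's actual certification scheme): keep one hash per state chunk and
# rehash only the chunks that changed, then combine. A full rehash must
# touch every chunk; an incremental one touches only the delta.

def chunk_hash(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def combine(hashes: list) -> bytes:
    h = hashlib.sha256()
    for digest in hashes:
        h.update(digest)
    return h.digest()

chunks = [b"a" * 1024, b"b" * 1024, b"c" * 1024]
leaf = [chunk_hash(c) for c in chunks]
root = combine(leaf)

# Incremental update: only chunk 1 changed, so only one leaf is rehashed.
chunks[1] = b"B" * 1024
leaf[1] = chunk_hash(chunks[1])
new_root = combine(leaf)

assert new_root != root
# A full rehash of every chunk yields the same result as the incremental one.
full = combine([chunk_hash(c) for c in chunks])
assert full == new_root
```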
Optimistically, with extensive testing and some potential optimizations, the path to 2 TiB is navigable.
Subnet storage exceeding 2 TiB
A natural question is whether capacity can grow further still. Exceeding 2 TiB is somewhat more complex than reaching it, mainly because under certain worst-case usage scenarios, scaling up could fill the physical disks of the nodes.
In particular, the way the new storage layer lays out files on disk, together with the fact that the protocol is quite conservative about retaining old states, means there is considerable overhead in disk usage.
Therefore, going beyond 2 TiB, for example to 4 TiB or more, requires some protocol modifications. First, storage layer parameters would have to be changed to reduce storage overhead, which in turn affects execution performance. Second, the protocol would need to be more proactive about deleting old states.
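Why retaining fewer old states lowers worst-case disk usage can be illustrated with a toy accounting model. The retention policy and delta sizes below are assumptions for illustration, not actual protocol parameters:

```python
# Toy disk-usage model (assumed policy, not the actual protocol): the node
# keeps the last `retained` checkpoints of the state, where each retained
# checkpoint shares unchanged data with its predecessor and differs by a
# delta of `delta_bytes`.

def disk_usage(state_bytes: int, retained: int, delta_bytes: int) -> int:
    """One full copy plus one delta per additionally retained checkpoint."""
    return state_bytes + (retained - 1) * delta_bytes

TIB = 2**40

# Retaining fewer old states directly lowers worst-case disk usage:
# dropping from 3 to 2 retained checkpoints saves one delta of disk space.
assert disk_usage(4 * TIB, retained=3, delta_bytes=TIB) == 6 * TIB
assert disk_usage(4 * TIB, retained=2, delta_bytes=TIB) == 5 * TIB
```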
Clearly, both measures require extra caution in design and implementation, along with extensive testing. Achieving this goal will also require revisiting all the points mentioned in the 2 TiB step and possibly making further improvements.
Therefore, we are still some way from this step, but we remain very confident that we will ultimately achieve this goal.
Internet Computer Distributed Storage Space
Finally, it is worth noting that while the Internet Computer has so far focused on providing vast amounts of replicated storage, there is no fundamental obstacle to extending the protocol to additionally support distributed storage (e.g., blob storage); this will be discussed in a later article.
Conclusion
In recent years, the Internet Computer has continuously pushed the limits of blockchain replicated storage capacity. This is crucial for many Web3 use cases that were deemed impossible just a few years ago; one example is artificial intelligence, with large language models running entirely on-chain.
The Internet Computer has unique architectural characteristics that allow it to go beyond the currently supported range of replicated storage, and as outlined above, specific follow-up work is already planned to further increase its storage capacity.
Additionally, efforts to provide a second type of storage capable of storing static blobs on the Internet Computer will soon begin.
