Introducing Storage Hub: A system parachain optimised for storage

For a long time now, the blockchain ecosystem has been aware of the need for Web3-native storage solutions. Blockchain technology is unmatched at keeping a decentralised record of valuable information, like transactions and balances, but there are applications in the space with much higher storage needs. Enter projects like Filecoin, Arweave and BNB Greenfield, among others.

However successful these projects have been, they tend to struggle in one particular aspect: they are disconnected from the rest of the blockchain ecosystem, where the processing happens. DApp developers have to deal with two different solutions and make the connection themselves at the outer layer of the tech stack. That is where Polkadot shines: interconnection. A storage parachain on Polkadot can offer the unique value proposition of exposing its services to other parachains, which can in turn hide the existence of the dual solution from the dApp devs building on top of them.

At Moonsong Labs, we have dedicated considerable effort to conceptualising a parachain design that would deliver maximal value to the ecosystem. We have some strong opinions, and we want to share them with the community to get valuable feedback. After all, the use cases for Storage Hub are as wide as the imagination of projects running on Polkadot.

Feel free to go down this rabbit hole, where we document our design proposal in depth, going over topics like the architecture of the system, game theory, incentives and storage proofs.


With Crust, Polkadot already has a storage parachain.
How does the system parachain relate to Crust?


My thought as well. Although it wouldn't hurt to have multiple approaches, at least in the beginning, to see which approach works best.

Related: Polkadot Native Storage

What does this solution provide that Crust Network does not?

Indeed, and thank you for bringing that up! There are a few key differences:

  • For starters, this would be a System Parachain. Therefore, DOT is used as the native token and there is no CRU or similar. In fact, one of the ideas here is to provide another demand driver and use case for DOT.
  • It is my understanding that Crust provides a layer of incentives on top of IPFS. This solution aims to be completely independent, i.e. it would provide its own network of Storage Providers. In fact, it provides two kinds of Storage Providers with different business models, incentives and roles in the system.
  • One of our value propositions is to abstract away from users and dApp developers the fact that they are dealing with two different blockchains when they need storage. Normally it is the dApp (the topmost layer of the stack) that connects the storage chain (like Filecoin or Arweave) to the smart contracts chain. We want to shift that paradigm, leveraging Polkadot’s unique value proposition of interoperability. Storage Hub exposes its services to other parachains, and these, in turn, expose them to their users and developers. So, as far as users are concerned, they’re interacting only with the parachain they’re interested in, which under the hood uses Storage Hub for storing big files (see the sketch after this list). We believe this highlights one of the key advantages of the Polkadot ecosystem.
  • This design is focused on providing the structure and incentives for storage solutions to serve a wide variety of use cases, even those we don’t know of yet. That is the reason behind the Main Storage Providers approach: if all Storage Providers in the system are required to provide the same retrieval service, the system as a whole can, at most, serve the use cases that the weakest Storage Provider can. That means we either lower the bar on requirements for those use cases, or increase the hardware requirements for all Storage Providers, essentially risking centralisation.
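
To make that abstraction concrete, here is a minimal, purely illustrative Rust sketch. Every name in it is hypothetical, not an actual Storage Hub API: the parachain exposes a plain storage interface to its dApp developers and fulfils it by delegating to Storage Hub behind the scenes. The hub is mocked in memory here; in reality only metadata and payment would cross chains via XCM, with the file bytes going to Storage Providers off-chain.

```rust
use std::collections::HashMap;

/// What a dApp developer sees: a plain storage API on *their* parachain.
trait FileStorage {
    fn store(&mut self, data: Vec<u8>) -> u64;
    fn fetch(&self, file_id: u64) -> Option<&Vec<u8>>;
}

/// In-memory stand-in for Storage Hub reached over XCM. In reality only
/// metadata (fingerprint, size, payment) crosses chains, not the file bytes.
struct MockStorageHub {
    files: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

/// The parachain side: it delegates to Storage Hub under the hood, so the
/// dual-chain setup stays invisible to users and dApp developers.
struct ParachainStorage {
    hub: MockStorageHub,
}

impl FileStorage for ParachainStorage {
    fn store(&mut self, data: Vec<u8>) -> u64 {
        let id = self.hub.next_id;
        self.hub.next_id += 1;
        self.hub.files.insert(id, data);
        id
    }

    fn fetch(&self, file_id: u64) -> Option<&Vec<u8>> {
        self.hub.files.get(&file_id)
    }
}

fn main() {
    let mut storage = ParachainStorage {
        hub: MockStorageHub { files: HashMap::new(), next_id: 0 },
    };
    // The user only ever talks to their own parachain's API.
    let id = storage.store(b"big file contents".to_vec());
    assert!(storage.fetch(id).is_some());
    println!("stored and fetched file {id} through the parachain facade");
}
```

The point is simply the layering: the `FileStorage` trait is all a dApp developer ever sees, while the parachain handles the second chain.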

I hope this sheds some light on the topic. Very keen on everyone’s thoughts and happy to discuss further.


Hey, thanks for the question! I explain the key differences in this comment. Feel free to reply!

According to the W3F, two teams were given grants to come up with designs. The other team’s proposal looks incredibly thorough and we are excited to see the progress they make. Our approaches seem to have diverged, which we believe is good for decentralisation of development in the ecosystem!


Appears availability is not really a concern here?

@burdges you raise a good topic. If your question is whether or not the system cares about data availability, the short answer is yes, it cares. But it is a bit more complicated than that.

The complication stems from the fact that, since files are off-chain, it is extremely cumbersome to have trustless retrievals. By trustless retrievals I mean that the Storage Hub parachain should be able to know if a Storage Provider sent the data through an off-chain channel to the user that requested it, without trusting either of the two. We can require a Storage Provider to prove that it still stores a file in hot storage (and in fact we do that periodically, just like Filecoin, Arweave or any other storage network), but that doesn’t mean it is actually sending the data to users. And I say “extremely cumbersome” rather than “impossible” because we’ve actually come up with a very complicated way to do it.
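
As an aside, the periodic “prove you still store it” check mentioned above (which is distinct from the trustless-retrieval algorithm) is commonly implemented as a Merkle challenge: the chain holds a file’s Merkle root, picks a random chunk index, and the provider must answer with that chunk plus its Merkle path. Here is a toy, self-contained sketch of that idea; it is not our actual proof scheme, and it uses a non-cryptographic hash purely for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Toy hashes; a real system would use a cryptographic hash like blake2.
fn h(data: &[u8]) -> u64 {
    let mut s = DefaultHasher::new();
    data.hash(&mut s);
    s.finish()
}

fn h2(a: u64, b: u64) -> u64 {
    let mut s = DefaultHasher::new();
    a.hash(&mut s);
    b.hash(&mut s);
    s.finish()
}

/// Build a Merkle tree over file chunks; returns all levels, leaves first.
fn merkle_levels(chunks: &[&[u8]]) -> Vec<Vec<u64>> {
    let mut levels = vec![chunks.iter().map(|c| h(c)).collect::<Vec<_>>()];
    while levels.last().unwrap().len() > 1 {
        let prev = levels.last().unwrap();
        let next = prev
            .chunks(2)
            .map(|p| if p.len() == 2 { h2(p[0], p[1]) } else { p[0] })
            .collect();
        levels.push(next);
    }
    levels
}

/// Provider's answer to a challenge: the sibling hashes along the path,
/// each flagged with whether the sibling sits on the left.
fn prove(levels: &[Vec<u64>], mut index: usize) -> Vec<(u64, bool)> {
    let mut path = Vec::new();
    for level in &levels[..levels.len() - 1] {
        let sibling = index ^ 1;
        if sibling < level.len() {
            path.push((level[sibling], sibling < index));
        }
        index /= 2;
    }
    path
}

/// On-chain verification: recompute the root from the challenged chunk.
fn verify(root: u64, chunk: &[u8], path: &[(u64, bool)]) -> bool {
    let mut acc = h(chunk);
    for &(sib, sib_left) in path {
        acc = if sib_left { h2(sib, acc) } else { h2(acc, sib) };
    }
    acc == root
}

fn main() {
    let chunks: [&[u8]; 4] = [b"chunk0", b"chunk1", b"chunk2", b"chunk3"];
    let levels = merkle_levels(&chunks);
    let root = levels.last().unwrap()[0];
    // The chain picks a random index; the provider answers with chunk + path.
    let challenged = 2;
    let path = prove(&levels, challenged);
    assert!(verify(root, chunks[challenged], &path));
    println!("storage proof for chunk {challenged} verified");
}
```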

But instead of severely damaging retrieval usability with that algorithm, we’ve opted to separate the responsibilities of unstoppability and usability.

Main Storage Providers bring in usability by offering good value propositions for various retrieval use cases (one value proposition of a given Main Storage Provider could in fact be high data availability, with low latency and high bandwidth, at a higher cost). They do not need to prove that they behaved correctly and served the data to users off-chain, but they are incentivised to do so, because the moment a user detects that a provider is not adhering to its advertised service, the user can permissionlessly switch to another.

Backup Storage Providers bring in unstoppability by storing the data with redundancy, but they are not burdened with having good infrastructure for retrieval, high traffic or multiple protocols. They are only required to prove they store the data, and to send it through peer-to-peer channels to another Main Storage Provider if there is a switch. That way, their implementation can be more lightweight and therefore more decentralised, because the barrier to entry is lower.
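
To summarise that split, here is a tiny illustrative Rust model (the names are ours, not actual Storage Hub types) of which obligations are enforced on-chain for each role:

```rust
/// Purely illustrative model of the split of responsibilities described above.
#[derive(Debug)]
enum ProviderRole {
    /// Competes on retrieval quality (latency, bandwidth, protocols) and can
    /// be permissionlessly swapped out by the user if it underdelivers.
    Main,
    /// Stores redundantly with minimal hardware requirements, keeping the
    /// barrier to entry low; only hands data over on a provider switch.
    Backup,
}

fn must_prove_storage(_role: &ProviderRole) -> bool {
    true // both roles periodically prove on-chain that they still hold the data
}

fn must_serve_end_users(role: &ProviderRole) -> bool {
    // Enforced only by incentives for Main providers, never for Backup ones.
    matches!(role, ProviderRole::Main)
}

fn must_transfer_on_switch(role: &ProviderRole) -> bool {
    matches!(role, ProviderRole::Backup)
}

fn main() {
    for role in [ProviderRole::Main, ProviderRole::Backup] {
        println!(
            "{:?}: proves storage = {}, serves users = {}, transfers on switch = {}",
            role,
            must_prove_storage(&role),
            must_serve_end_users(&role),
            must_transfer_on_switch(&role),
        );
    }
}
```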

So in short, the system does care about data availability; it just assigns different responsibilities and incentives to Main and Backup Storage Providers. I hope I got your question right and answered it accordingly. Let me know if you have more questions or want me to elaborate further on any aspect.


Here is a discussion on storage using IPFS/Iroh: Error while installing on substrate-node-template · Issue #1804 · n0-computer/iroh · GitHub

More about Iroh:
At the technical level, Iroh is an experimental re-implementation of IPFS object storage focused on improving performance, reliability, scalability, and network efficiency.

Asked because I saw no mention of erasure coding. IPFS never worried about availability.

Afaik stronger availability always comes from the interaction between erasure coding and distribution, so erasure coding cannot live in too separate a layer from distribution choices.

At the same time, there should be good strong-ish availability regimes based upon fountain codes instead of Reed-Solomon, which wind up much weaker than the expensive strict 2/3rds-honest one used by Polkadot, or the aggressive sampling used by Celestia, but still much stronger than simple replication like IPFS protocols, BitTorrent, etc. Yet afaik nobody has formulated one. It does matter to have the right storage abstraction, as doing so may help to formulate this.
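
To put rough numbers on why coding beats plain replication at equal overhead, here is a back-of-the-envelope sketch assuming independent node failures (the parameters are arbitrary, chosen only for illustration):

```rust
/// Probability the file survives with r full replicas: lost only if all fail.
fn survives_replication(p: f64, r: u32) -> f64 {
    1.0 - p.powi(r as i32)
}

/// Binomial coefficient C(n, k) as a float.
fn binom(n: u64, k: u64) -> f64 {
    (0..k).fold(1.0, |acc, i| acc * (n - i) as f64 / (i + 1) as f64)
}

/// Probability the file survives under an (n, k) erasure code such as
/// Reed-Solomon: it survives if at least k of the n shards remain available.
fn survives_erasure(p: f64, n: u64, k: u64) -> f64 {
    (k..=n)
        .map(|i| binom(n, i) * (1.0 - p).powi(i as i32) * p.powi((n - i) as i32))
        .sum()
}

fn main() {
    let p = 0.1; // assume each node is independently unavailable 10% of the time
    // 3 full replicas vs. a (30,10) code: identical 3x storage overhead, but
    // the code survives any 20 shard losses, while replication is lost as
    // soon as its 3 specific holders fail.
    println!("P(loss), 3 replicas (3x overhead):   {:e}", 1.0 - survives_replication(p, 3));
    println!("P(loss), (30,10) code (3x overhead): {:e}", 1.0 - survives_erasure(p, 30, 10));
    // Even at half the overhead, coding stays in the same ballpark.
    println!("P(loss), (15,10) code (1.5x):        {:e}", 1.0 - survives_erasure(p, 15, 10));
}
```

At the same 3x overhead, the coded layout is many orders of magnitude more durable than replication, which is the gap the replication-based protocols above leave on the table.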

I’m curious how Iroh and/or ipfs compare with libtorrent (uTP) at raw transfer speed under different network conditions. I’d expect libtorrent blows them away under most conditions, but not under all conditions due to LEDBAT.