Proposal: Dynamic storage pallet

davidk-pt · December 7, 2024, 11:28am

I looked through all the pallets in FRAME and didn’t find anything similar, so I’m offering to build a dynamic blob storage pallet that would allow low-level byte modifications.

Motivation:

Allow more useful user-facing applications to be built using the blockchain as storage
Offload the complex low-level storage management details to third-party libraries (e.g. SQLite) and to app developers
Bring in more developers who are familiar with traditional Web2 technologies to build on Polkadot
Minimize the application developer’s interaction with the pallet: only two calls, one to change the data and one to delete the blob

Other proposals I’ve seen suggest storing binary blobs off-chain so as not to bloat the blockchain.
The approach presented here is simple and could be implemented quickly with what we already have, in a month or two.
Assuming there will not be many users at the beginning, developers could already start building a lot of useful applications and show that “blockchain is not just for DeFi, money operations and memecoins”, while a more sophisticated storage solution is eventually built.

The pallet would have one storage map:

    #[pallet::storage]
    pub type Storage<T: Config> = StorageDoubleMap<
        _,
        Twox64Concat,
        String, // - user account + blob name hash
        Twox64Concat,
        u64, // - page id
        (Consideration, BoundedVec<u8, T::PageSize>),
    >;

There could also be a storage map for permissions, listing other accounts that are permitted to modify the blob state.
Each page would carry its own Consideration, so the parachain token has utility.

We could provide these pallet calls for the user:

    pub fn mutate_blob(
        origin: OriginFor<T>,
        blob_name: String,
        encoded_instructions: Vec<UpdateInstruction<T>>
    ) -> DispatchResult

The mutate blob call would receive a list of instructions describing how to update the state. The update instruction enum would be:

    // Generic over the pallet Config so that T::PageSize and T::MaxPageUpdates resolve.
    enum UpdateInstruction<T: Config> {
        WritePages {
            start_page: u64,
            pages: BoundedVec<
                BoundedVec<u8, T::PageSize>,
                T::MaxPageUpdates
            >,
        },
        DeletePages {
            start_page: u64,
            end_page: u64,
        },
    }
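To make the intended semantics concrete, here is a minimal sketch in plain Rust (no FRAME types) of how a batch of instructions might be applied to one blob. The names `PAGE_SIZE`, `Blob` and `apply_instructions` are assumptions for illustration, as is the choice to treat `end_page` as exclusive and to reject the whole batch when any page is oversized; the proposal does not specify either.

```rust
use std::collections::BTreeMap;

/// Hypothetical page size; the pallet would take it from `T::PageSize`.
pub const PAGE_SIZE: usize = 4096;

/// Plain-Rust mirror of the proposed enum, with `Vec` instead of `BoundedVec`.
#[derive(Debug, Clone, PartialEq)]
pub enum UpdateInstruction {
    WritePages { start_page: u64, pages: Vec<Vec<u8>> },
    DeletePages { start_page: u64, end_page: u64 },
}

/// One blob, modelled as page id -> page bytes.
pub type Blob = BTreeMap<u64, Vec<u8>>;

/// Applies the instructions in order. The batch is validated first, so an
/// oversized page leaves the blob untouched.
pub fn apply_instructions(
    blob: &mut Blob,
    instructions: &[UpdateInstruction],
) -> Result<(), String> {
    for ins in instructions {
        if let UpdateInstruction::WritePages { pages, .. } = ins {
            if pages.iter().any(|p| p.len() > PAGE_SIZE) {
                return Err("page exceeds PAGE_SIZE".to_string());
            }
        }
    }
    for ins in instructions {
        match ins {
            UpdateInstruction::WritePages { start_page, pages } => {
                // Pages are written to consecutive ids starting at `start_page`.
                for (offset, page) in pages.iter().enumerate() {
                    blob.insert(*start_page + offset as u64, page.clone());
                }
            }
            UpdateInstruction::DeletePages { start_page, end_page } => {
                // Assumption: `end_page` is exclusive. An empty or inverted
                // range deletes nothing.
                blob.retain(|id, _| !(*start_page..*end_page).contains(id));
            }
        }
    }
    Ok(())
}
```

In the real pallet the same checks would come for free from `BoundedVec`, and each insert or removal would also take or release the page’s consideration.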

The other call would delete the data, allowing the user to receive their consideration back for the storage used. If a blob becomes very large over time, this call might need to be made multiple times, with each call deleting up to some fixed number of pages.

    pub fn delete_blob(
        origin: OriginFor<T>,
        blob_name: String,
    ) -> DispatchResult
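The bounded, repeatable delete described above could look roughly like this in plain Rust. This is only a sketch: `delete_blob_step`, `DeleteOutcome` and the per-page `u128` deposit are hypothetical stand-ins for the pallet call and its Consideration type.

```rust
use std::collections::BTreeMap;

/// Result of one delete call (hypothetical type for this sketch).
#[derive(Debug, PartialEq)]
pub struct DeleteOutcome {
    /// Sum of the deposits released by this call.
    pub refunded: u128,
    /// True once no pages are left, i.e. the blob is fully deleted.
    pub finished: bool,
}

/// Removes at most `max_pages` pages, lowest page ids first, and returns
/// the deposit that was released. Each page is stored as (deposit, bytes).
pub fn delete_blob_step(
    pages: &mut BTreeMap<u64, (u128, Vec<u8>)>,
    max_pages: usize,
) -> DeleteOutcome {
    let doomed: Vec<u64> = pages.keys().take(max_pages).copied().collect();
    let mut refunded = 0u128;
    for id in doomed {
        if let Some((deposit, _bytes)) = pages.remove(&id) {
            refunded += deposit;
        }
    }
    DeleteOutcome { refunded, finished: pages.is_empty() }
}
```

A client would keep calling until `finished` is true, so the weight of a single call stays bounded no matter how large the blob has grown.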

From the frontend developer’s perspective, we could write a library that fetches the current SQLite state from the blockchain. The frontend developer can then do arbitrary updates on the database and migrate SQLite schemas; all the library would need to do is detect dirty pages and send a transaction that applies the diff on the blockchain.

If the change set is too big to apply in one transaction, the user could submit partial updates with as many transactions as needed and rely on SQLite’s built-in safety guarantees to avoid corrupting the database. As long as we write to our virtual file system in the same order SQLite does, SQLite would discard a partial, uncommitted state instead of treating it as valid, and the user could try again without worrying about database corruption.
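A sketch of that splitting step, assuming the library has captured page writes in the order SQLite issued them. `PageWrite`, `split_into_transactions` and `max_pages_per_tx` are hypothetical names; the limit would correspond to something like `T::MaxPageUpdates` in the pallet.

```rust
/// A page write captured from SQLite: (page id, page bytes).
pub type PageWrite = (u64, Vec<u8>);

/// Splits the captured writes into per-transaction batches of at most
/// `max_pages_per_tx` pages. The original write order is preserved both
/// inside each batch and across batches; nothing is sorted or merged.
pub fn split_into_transactions(
    writes: &[PageWrite],
    max_pages_per_tx: usize,
) -> Vec<Vec<PageWrite>> {
    assert!(max_pages_per_tx > 0, "a transaction must carry at least one page");
    writes
        .chunks(max_pages_per_tx)
        .map(|chunk| chunk.to_vec())
        .collect()
}
```

The batches would then be submitted one after another, each waiting for the previous one to be included, so the on-chain write order matches SQLite’s.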

To detect dirty pages we could hook into a shim of the SQLite VFS to see what writes it makes to the database (see “The SQLite OS Interface or "VFS"” in the SQLite documentation).
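The bookkeeping behind such a shim is small. The sketch below assumes the shim calls `record_write` with the byte offset and length of every file write it sees; `DirtyTracker` and its methods are hypothetical names, and the actual VFS glue is left out.

```rust
use std::collections::BTreeSet;

/// Tracks which pages have been touched since the last sync.
pub struct DirtyTracker {
    page_size: u64,
    dirty: BTreeSet<u64>,
}

impl DirtyTracker {
    pub fn new(page_size: u64) -> Self {
        assert!(page_size > 0, "page size must be non-zero");
        Self { page_size, dirty: BTreeSet::new() }
    }

    /// To be called from the write hook of the VFS shim.
    /// Marks every page overlapped by `[offset, offset + len)` as dirty.
    pub fn record_write(&mut self, offset: u64, len: u64) {
        if len == 0 {
            return;
        }
        let first = offset / self.page_size;
        let last = (offset + len - 1) / self.page_size;
        self.dirty.extend(first..=last);
    }

    /// Returns the dirty page ids in ascending order and resets the tracker.
    pub fn take_dirty(&mut self) -> Vec<u64> {
        std::mem::take(&mut self.dirty).into_iter().collect()
    }
}
```

Note that this set only says which pages changed. If write order matters for partial uploads, the shim would need to log the writes in sequence as well.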

Also, by implementing the VFS interface for reads, users running smoldot on the frontend could still retrieve data quickly even if the database becomes quite large on chain, because SQLite would work out the smallest set of pages that need to be read to satisfy each query.
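A sketch of the read side under the same assumptions: the VFS read hook asks for a byte range, and only the pages covering that range are fetched, each at most once. `PagedReader` is a hypothetical name, and the `fetch` closure stands in for a storage query through smoldot.

```rust
use std::collections::HashMap;

/// Serves byte-range reads from fixed-size pages, fetching pages lazily.
pub struct PagedReader<F: FnMut(u64) -> Vec<u8>> {
    page_size: usize,
    cache: HashMap<u64, Vec<u8>>,
    fetch: F,
    /// Number of pages actually fetched so far.
    pub fetches: usize,
}

impl<F: FnMut(u64) -> Vec<u8>> PagedReader<F> {
    pub fn new(page_size: usize, fetch: F) -> Self {
        assert!(page_size > 0, "page size must be non-zero");
        Self { page_size, cache: HashMap::new(), fetch, fetches: 0 }
    }

    /// Returns `len` bytes starting at `offset`, touching only the pages
    /// that overlap the requested range.
    pub fn read(&mut self, offset: usize, len: usize) -> Vec<u8> {
        let mut out = Vec::with_capacity(len);
        let end = offset + len;
        let mut pos = offset;
        while pos < end {
            let page_id = (pos / self.page_size) as u64;
            let in_page = pos % self.page_size;
            let take = (self.page_size - in_page).min(end - pos);
            if !self.cache.contains_key(&page_id) {
                let mut page = (self.fetch)(page_id);
                // Assumption: missing or short pages read as zeros.
                page.resize(self.page_size, 0);
                self.cache.insert(page_id, page);
                self.fetches += 1;
            }
            let page = &self.cache[&page_id];
            out.extend_from_slice(&page[in_page..in_page + take]);
            pos += take;
        }
        out
    }
}
```

A real implementation would also need cache invalidation when the on-chain blob changes, which this sketch ignores.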

The user would pay only for the change in storage size, and SQLite would manage the complex logic of where to store and how to mutate the data. We would not need to provide domain-specific storage items in the pallet (say, one storage item for user todos, another for user calendars, and so on), and we would not need to worry about migrating those items, since that would be offloaded to the dapp developer.

Of course, the library could have an option to encrypt the stored data with a symmetric key that is itself encrypted with the user’s public key. The user could also choose with whom to share the data by sharing the symmetric key for the blob.

This would bring in developers who mainly know Web2 technologies and let them easily develop powerful user applications, mostly without touching the blockchain or making the user aware that the app is running on one.

With SQLite, the app developer could make drastic changes to the database state, while we would not need to implement many different complex calls to alter user data or take care of their migrations.

Eventually we could also build a deterministic SQLite layer implementing the VFS so that WebAssembly code could run queries against a database, but for now, even if SQLite runs only on the frontend side, it already makes a lot of everyday apps easy to implement.

The only pieces that need to be built are:

Implement the dynamic storage pallet
Implement a library for using a blockchain-based SQLite database

We could have a common good parachain dedicated to this, with its own storage token.
With a storage token we could:

Limit the entire state storage size: if we release n tokens and they are all locked as storage consideration, the blockchain state would never take more than a fixed cap, for example 100GB
Give the token utility directly related to storage. Anyone only reading the database wouldn’t even need to use the token.
Gradually release more of the token if we see that users need more storage overall, expanding the total state size of the blockchain
Grant a small amount of storage to each user once proof of personhood is implemented, so they could use apps for free within a reasonable data limit
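The relation between token supply and the state cap in the first point is simple arithmetic. The sketch below assumes a flat deposit per page and ignores storage key overhead; `deposit_per_page`, the 4096-byte page size and reading 100GB as 100 GiB are illustrative assumptions, not values from the proposal.

```rust
/// Largest state size (in bytes of page data) that `total_supply` tokens
/// can back when every page locks `deposit_per_page` tokens.
pub fn max_state_bytes(total_supply: u128, deposit_per_page: u128, page_size: u128) -> u128 {
    assert!(deposit_per_page > 0 && page_size > 0);
    (total_supply / deposit_per_page) * page_size
}

/// Token supply needed so that the state can reach, but never exceed,
/// `cap_bytes` rounded up to a whole number of pages.
pub fn supply_for_cap(cap_bytes: u128, deposit_per_page: u128, page_size: u128) -> u128 {
    assert!(deposit_per_page > 0 && page_size > 0);
    let pages = (cap_bytes + page_size - 1) / page_size; // ceiling division
    pages * deposit_per_page
}
```

For example, with 4096-byte pages and a deposit of one token per page, a cap of 100 GiB (107,374,182,400 bytes) corresponds to 26,214,400 pages and therefore 26,214,400 tokens.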

Examples of non-financial everyday applications, off the top of my head, that this would allow developers to build:

Todo lists - database will be small
Calendars - database will be small
Weight loss, calorie counting apps - database will be small
Search engines - SQLite has the FTS5 full-text search capability, and with appropriate indexes it would figure out the minimal set of pages to fetch
Soundcloud equivalent - users could upload their own library of music readable by anyone, so someone running smoldot on their phone could fetch mp3 files stored in a SQLite database and listen to music in their car for free

Developers wouldn’t need to worry about running their own databases and servers to store data.

Other options and their drawbacks for user facing apps:

EVM smart contracts: they can only be deployed once, so if you build a todo list in a smart contract you can’t change it, or you need to juggle proxy contracts and so on. Plus, users must send a transaction for every operation (add todo, remove todo) unless complex batching is implemented. Here we’d get batching for free on the frontend: the developer would just need to upload the database changes back to the blockchain.
Parachains/pallets: it is not realistic to build a parachain or a pallet dedicated to a todo list. The learning curve is steep, and someone who could easily build the app with Web2 technologies would never go to such lengths as building and deploying a runtime for something as simple as a todo list. Plus, if the developer wants to change the schema of the todo list app, they’d need to know how to write FRAME migrations, which also have a steep learning curve.
WebAssembly smart contracts: I imagine the drawbacks are similar to those of EVM smart contracts. I’m not familiar with these yet, so someone could elaborate more.

wil · December 9, 2024, 4:22pm

Might be worth looking at how the Stateful Storage pallet is set up on Frequency (GitHub: frequency-chain/frequency).

It, along with the Schemas pallet, holds structured data for Frequency users.

bkchr · December 9, 2024, 8:59pm

Blockchains are not made for storing data. This would be too expensive. Remember that every node in the network holds the entire state. Blockchains are built for reaching consensus. You can store, for example, the hashes of your off-chain data on chain, but not the data itself.

xlc · December 10, 2024, 3:11am

That is basically using the chain as a DA layer for a non-chain rollup. All the same issues apply (centralized sequencer, how to handle disputes, etc).

And guess what, JAM will solve your problem. You can literally run a server in CoreVM running SQLite in PVM, although we need to measure whether SQLite is the right choice (compared to just in-memory objects).

davidk-pt · December 10, 2024, 8:33am

Well, I believe blockchains store accounts and their balances at a minimum, so they do store data. However, today this is mostly done in a format that most software developers need to learn from scratch.

IMO the purpose of the blockchain is whatever the end user needs, satisfied without a centralized authority, because at the end of the day, if a blockchain doesn’t serve the purpose of the end users it will never be widely adopted to begin with.

And yes, I thought about not having too much state stored in the blockchain. That’s why I said it can be capped with consideration and a limited total token supply, so it doesn’t get too big in total state size.

davidk-pt · December 10, 2024, 8:34am

Sounds good if that can be easily done with JAM once it comes out.

Although I don’t agree with the centralized sequencer issue: if the sequencer is the user and he’s marking his own todos as done, then he doesn’t affect anyone else, and only he can modify his own state.

Even if there are conflicts, with several parties modifying the same state at once, SQLite in this case would handle partial writes from any party to prevent corruption.

davidk-pt · December 10, 2024, 8:35am

Nice, someone already built a pallet doing exactly what I spoke about here: frequency/pallets/stateful-storage/src/lib.rs at 2e2cc3f88b222ada83ee236995a25a7d741867e7 (frequency-chain/frequency on GitHub), so surely someone already needed this.
